I worked with Scott to formalize some of his earlier blog post here: https://arxiv.org/abs/1803.04585 - and wrote a bit more about AI-specific concerns relating to the first three forms in this new LessWrong post: https://www.lesserwrong.com/posts/iK2F9QDZvwWinsBYB/non-adversarial-goodhart-and-ai-risks
The blog post discussion was not included in the paper, both because agreement on these points proved difficult and because I wanted the paper to be relevant more broadly than AI risk alone. The paper was intended to expand thinking about Goodhart-like phenomena, to address what I initially saw as a confusion between causal and adversarial Goodhart, and to set up a further paper on adversarial cases that I have been contemplating for a couple of years and am actively working on again. I was hoping to finish that second paper, on Adversarial Goodhart and sufficient metrics, in time for the prize; since I did not, I will nominate the arXiv paper and the blog post, and I will try to get the sequel blog post, and perhaps the paper, done in time for round three, if there is one.
Accepted! Can you give me your email address?
My username @ gmail