Even on the view you advocate here (where some kind of perfection is required), “perfectly align part of the motivations” seems substantially easier than “perfectly align all of the AI’s optimization so it isn’t optimizing for anything you don’t want.”
If we try to weaken it to e.g. a bunch of shards which each imperfectly capture different aspects of human values, with different imperfections, then there may be changes which Goodhart all of the shards simultaneously. Indeed, I'd expect that to be a pretty strong default outcome.
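To make that worry concrete, here's a toy sketch. It's my own illustrative construction, not anything established: a made-up "true" utility that only cares about coordinate `x[0]`, and `k` shard scores `S_i(x) = sqrt(x[0]) + x[i]`, each of which tracks the true utility imperfectly via its own loophole coordinate `x[i]`. Optimizing the shard aggregate under a resource budget exploits all of the loopholes at once.

```python
import numpy as np

# Toy illustration (my own construction): the "true" utility cares only about
# x[0], while each of k shards S_i(x) = sqrt(x[0]) + x[i] captures it
# imperfectly, each with its own exploitable loophole coordinate x[i].

k = 5            # number of imperfect shards
budget = 100.0   # total resources to spread across the 1 + k coordinates

def true_utility(x):
    return np.sqrt(x[0])

def shard_scores(x):
    # each shard values the true coordinate (with diminishing returns)
    # plus its own loophole coordinate, linearly
    return np.sqrt(x[0]) + x[1:1 + k]

def proxy(x):
    # what actually gets optimized: the aggregate of the shards
    return shard_scores(x).sum()

# Allocation A: spend the whole budget on the coordinate the true utility cares about.
x_honest = np.zeros(1 + k)
x_honest[0] = budget

# Allocation B: the proxy optimum under the budget constraint. Because sqrt has
# diminishing returns, the marginal proxy value of x[0] drops below 1 once
# x[0] > k**2 / 4, so everything past that point flows into the loopholes.
x_proxy = np.zeros(1 + k)
x_proxy[0] = min(budget, k**2 / 4)
x_proxy[1:] = (budget - x_proxy[0]) / k

for name, x in [("honest", x_honest), ("proxy-optimal", x_proxy)]:
    print(f"{name:13s}  proxy={proxy(x):7.2f}  worst shard={shard_scores(x).min():6.2f}"
          f"  true utility={true_utility(x):5.2f}")
```

In this toy, every individual shard scores higher at the proxy optimum than at the honest allocation (worst shard 21.25 vs 10), so no single shard "objects", yet the true utility drops from 10 to 2.5. That's the sense in which differently-imperfect shards don't automatically protect against Goodhart.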
I feel significantly less confident about this, and am still working out the degree to which Goodhart seems hard, and in what contours, on my current view.