This post is one of the best available explanations of what has been wrong with the approach used by Eliezer and people associated with him.
I had a pretty favorable recollection of the post from when I first read it. Rereading it convinced me that I had still managed to underestimate it.
In my first pass at reviewing posts from 2022, I had some trouble deciding which post best explained shard theory. Now that I’ve reread this post during my second pass, I’ve decided this is the most important shard theory post. Not because it explains shard theory best, but because it explains what important implications shard theory has for alignment research.
I keep being tempted to think that the first human-level AGIs will be utility maximizers. This post reminds me that maximization is perilous. So we ought to wait until we’ve brought greater-than-human wisdom to bear on deciding what to maximize before attempting to implement an entity that maximizes a utility function.