That was in one of the links: whatever’s decided after thinking carefully for a very long time, less evilly, by a living civilization rather than an individual person. But my point is that this doesn’t answer the question of what values one should directly align an AGI with, since that is not a tractable optimization target. And any other optimization target, or any tractable approximation of it, is even worse if handed to hard optimization. So the role of the values an AGI should be aligned with is played by the things people want, the current approximations to that eventual target, optimized for softly, in a way that avoids Goodhart’s curse but keeps an eye on that eventual target.
Got it, thanks.