‘Doing it well’ seems to be very load-bearing there. I think you’re sneaking in an ‘all’ in the background? Like, in order to be defined as superintelligent it must do better than X in all domains, or something?
My current answer is something hand-wavy about the process just trying to un-Goodhart itself (assuming that the self and world models as given start off Goodharted), and the chips fall where they may.
It’s not really about doing well/better in all domains; it’s about being able to explain how you can do well at all of the things you do, even if that is far from everything. It’s also about making that explanation complete enough to be convincing, as an argument about the real world assessed using your usual standards, while still keeping it limited enough to avoid self-reference problems.