Yes, it is already the part we suspect will be difficult. The other part may or may not be difficult once we solve that part.
I’m surprised by this—I can imagine where I might start on the “stable under self-modification” problem, but I have a very hard time thinking where I might start on the “actually specifying the supergoal” problem.
To talk about “stable under self-modification”, you need a notion of what it is that must remain stable: the kind of data that specifies a decision problem. Once we have that notion, it could turn out to be relatively straightforward to extract an instance of it from human minds (though probably not). Conversely, while we lack that notion, there is little point in attacking the problem of extracting humanity's decision problem.