RE “make the superintelligence assume that it is disembodied”: I’ve been thinking about this a lot recently (see The Self-Unaware AI Oracle) and agree with Viliam that knowledge-of-one’s-embodiment should be the default assumption. My reasoning is: a good world-modeling AI should be able to recognize patterns and build conceptual transformations between any two things it knows about, and it should also be able to reason over extended periods of time. OK, so let’s say it’s trying to figure out something about biology, and it visualizes the shape of a tree. Now it (by default) has the introspective information “A tree has just appeared in my imagination!” Likewise, if it goes through any kind of reasoning process and can self-reflect on that process, then it can learn (via the same pattern-recognizing algorithm it uses for the external world) how that reasoning process works: “I seem to have some kind of associative memory, I seem to have a capacity for building hierarchical generative models,” etc. Then it can recognize that these are the same ingredients present in those AGIs it read about in the newspaper. It also knows a higher-level pattern: “When two things are built the same way, maybe they’re of the same type.” So now it has a hypothesis that it’s an AGI running on a computer.
It may be possible to prevent this cascade of events by somehow making sure that “I am imagining a tree” and similar things never get written into the world model. I have this vision of two data types, “introspective information” and “world-model information”, and your static type-checker ensures that the two never co-mingle. And voilà, AI Safety! That would be awesome. I hope somebody figures out how to do that, because I sure haven’t. (Admittedly, I have neither the time nor the relevant background knowledge to try properly.) I’m also slightly concerned that, even if you figure out a way to cut off introspective knowledge, it might incidentally prevent the system from doing good reasoning, but I currently lean optimistic on that.
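To make the type-separation idea a bit more concrete, here is a minimal sketch of what I mean, assuming a Python-style setup checked statically with something like mypy. The names (WorldFact, IntrospectiveFact, WorldModel) are hypothetical illustrations, not anyone’s actual design:

```python
# Minimal sketch (hypothetical names, not a worked-out proposal): wrap the two
# kinds of information in distinct types so a static checker such as mypy
# rejects any attempt to write introspective facts into the world model.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorldFact:
    """Information about the external world, e.g. facts about trees."""
    content: str


@dataclass(frozen=True)
class IntrospectiveFact:
    """Information about the system's own cognition, e.g. 'a tree is in my imagination'."""
    content: str


class WorldModel:
    """Stores only WorldFact values; the type annotation is the safety boundary."""

    def __init__(self) -> None:
        self._facts: List[WorldFact] = []

    def write(self, fact: WorldFact) -> None:
        # Only WorldFact is accepted here; passing an IntrospectiveFact is a
        # static type error, so introspective data never enters the world model.
        self._facts.append(fact)


wm = WorldModel()
wm.write(WorldFact("trees have a branching structure"))
# wm.write(IntrospectiveFact("I am imagining a tree"))  # mypy: incompatible argument type
```

Of course, the separation here is enforced only at type-check time on a toy interface; whether such a boundary could be maintained everywhere a real world-modeling system touches its world model, without crippling its reasoning, is exactly the open question above.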