I know these things. Nothing you have said contradicts my point, as far as I can see. The point I am making here is one of conceptual clarification, with the intent of enabling clearer thinking and reasoning.
You seem to be talking about a system that outputs “plans that, if implemented, would achieve X” (roughly), and your point seems to be that such a system would be likely to be or behave like an optimizer_2. I find this claim quite plausible (and fully compatible with the point I’m making).
“It would require a selective blindness to make the superintelligence assume that it is disembodied, and that its computations will continue and produce effects in real world even if its body is destroyed.”
Unclear; if anything, it seems like it might be easier to make a Cartesian AI than a non-Cartesian one. But that’s a side note.
RE “make the superintelligence assume that it is disembodied”: I’ve been thinking about this a lot recently (see The Self-Unaware AI Oracle) and agree with Viliam that knowledge-of-one’s-embodiment should be the default assumption. My reasoning is: A good world-modeling AI should be able to recognize patterns and build conceptual transformations between any two things it knows about, and should also be able to reason over extended periods of time. OK, so let’s say it’s trying to figure out something about biology, and it visualizes the shape of a tree. Now it (by default) has the introspective information “A tree has just appeared in my imagination!”. Likewise, if it goes through any kind of reasoning process, and can self-reflect on that reasoning process, then it can learn (via the same pattern-recognizing algorithm it uses for the external world) how that reasoning process works, like “I seem to have some kind of associative memory, I seem to have a capacity for building hierarchical generative models, etc.” Then it can recognize that these are the same ingredients present in those AGIs it read about in the newspaper. It also knows a higher-level pattern “When two things are built the same way, maybe they’re of the same type.” So now it has a hypothesis that it’s an AGI running on a computer.
It may be possible to prevent this cascade of events by somehow making sure that “I am imagining a tree” and similar things never get written into the world model. I have this vision of two data-types, “introspective information” and “world-model information”, and your static type-checker ensures that the two never co-mingle. And voilà, AI Safety! That would be awesome. I hope somebody figures out how to do that, because I sure haven’t. (Admittedly, I have neither the time nor the relevant background knowledge to try properly.) I’m also slightly concerned that, even if you figure out a way to cut off introspective knowledge, it might incidentally prevent the system from doing good reasoning, but I currently lean optimistic on that.
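To make the two-data-types picture a bit more concrete, here is a minimal Python sketch of just the bookkeeping part. Everything in it is hypothetical and invented for illustration: the names `WorldFact`, `IntrospectiveFact`, and `WorldModel` are not from any real system. It assumes, unrealistically, that each piece of information already arrives labeled with the right type. A static checker such as mypy would reject the bad call from the annotation alone; the `isinstance` check is only a runtime backstop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class WorldFact:
    """A claim about the external world (hypothetical type)."""
    text: str


@dataclass(frozen=True)
class IntrospectiveFact:
    """A claim about the system's own cognition (hypothetical type)."""
    text: str


@dataclass
class WorldModel:
    """Toy store that is only supposed to hold WorldFact values."""
    facts: List[WorldFact] = field(default_factory=list)

    def write(self, fact: WorldFact) -> None:
        # The annotation lets a static checker flag any call that passes an
        # IntrospectiveFact; this isinstance check is a runtime backstop.
        if not isinstance(fact, WorldFact):
            raise TypeError("only WorldFact may enter the world model")
        self.facts.append(fact)


model = WorldModel()
model.write(WorldFact("Trees have branching structure."))

leak_blocked = False
try:
    # Deliberately wrong call, silenced for the static checker so that the
    # runtime backstop can be demonstrated.
    model.write(IntrospectiveFact("A tree just appeared in my imagination."))  # type: ignore[arg-type]
except TypeError:
    leak_blocked = True
```

This only shows what “never co-mingle” could mean at the level of types. It says nothing about the hard part, which is getting a learning system to label its own introspective information correctly in the first place, or stopping it from re-deriving that information from world-model facts alone.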