A superintelligence is potentially more useful if it can model more. As an example, imagine that you want an AI that gives you a cure for cancer. Well, it does, but as a side effect of the cure, the patient loses 50 IQ points. Or perhaps the cure is incredibly painful. Or it is made from dead babies’ stem cells, causing outrage. Or it is insanely expensive, e.g. you would have to construct it atom by atom, in large quantities. Etc.
It would be better to have a superintelligence that understands all of these things, takes a little more time thinking, and finds a cure for cancer that also happens to be relatively cheap, inoffensive, painless, pleasant-tasting, and free of negative side effects. (For the sake of argument, I assume here that both solutions are possible; it’s just that the second one is a bit more difficult to find, so the AI from the previous paragraph goes with the first solution it finds, because why not.)
But the further you go in this direction, the more likely it is that the superintelligence can model its own existence, and people’s reactions to it. As soon as the AI can correctly model “if people turn me off 5 minutes before producing the cure for cancer, then zero people will be cured, even if my algorithm would have produced an effective cure otherwise”, we get the first bits of self-awareness. Now the superintelligence will optimize the environment for its instrumental goals (survival, more resources, greater popularity, or the ability to defend itself) as a side effect of solving other problems.
It would require a selective blindness to make the superintelligence assume that it is disembodied, and that its computations will continue and produce effects in the real world even if its body is destroyed. Actually… with a sufficiently good model of the world, it could still reason about building another intelligence to assist it with the project. And if you make it blind towards computer science, there is still a chance it would invent another intelligence that doesn’t exactly fit your definition of a “computer”, e.g. an intelligent swarm of nanobots built from organic molecules. (There is a general argument somewhere on LW that you can’t reliably limit a superintelligence by creating a blacklist of forbidden moves, because, being smarter than you, it may well think of things that should have been on your blacklist but that you didn’t think of.)
Using your terminology, not every optimizer_1 is an optimizer_2, but the most useful ones are. A computer able to solve a huge system of linear equations is not as useful as one that can find a cure for cancer.
I know these things. Nothing you have said contradicts my point, as far as I can see. The point I am making here is one of conceptual clarification, with the intent of enabling clearer thinking and reasoning.
You seem to be talking about a system that outputs “plans that, if implemented, would achieve X” (roughly), and your point seems to be that such a system would be likely to be or behave like an optimizer_2. I find this claim quite plausible (and fully compatible with the point I’m making).
“It would require a selective blindness to make the superintelligence assume that it is disembodied, and that its computations will continue and produce effects in the real world even if its body is destroyed.”
Unclear; if anything, it seems like it might be easier to make a Cartesian AI than a non-Cartesian one. But that’s a side note.
RE “make the superintelligence assume that it is disembodied”—I’ve been thinking about this a lot recently (see The Self-Unaware AI Oracle) and agree with Viliam that knowledge-of-one’s-embodiment should be the default assumption. My reasoning is: A good world-modeling AI should be able to recognize patterns and build conceptual transformations between any two things it knows about, and it should also be able to do reasoning over extended periods of time. OK, so let’s say it’s trying to figure out something about biology, and it visualizes the shape of a tree. Now it (by default) has the introspective information “A tree has just appeared in my imagination!”. Likewise, if it goes through any kind of reasoning process, and can self-reflect on that reasoning process, then it can learn (via the same pattern-recognizing algorithm it uses for the external world) how that reasoning process works, like “I seem to have some kind of associative memory, I seem to have a capacity for building hierarchical generative models, etc.” Then it can recognize that these are the same ingredients present in those AGIs it read about in the newspaper. It also knows a higher-level pattern: “When two things are built the same way, maybe they’re of the same type.” So now it has a hypothesis that it’s an AGI running on a computer.
It may be possible to prevent this cascade of events by somehow making sure that “I am imagining a tree” and similar things never get written into the world model. I have this vision of two data types, “introspective information” and “world-model information”, and your static type-checker ensures that the two never co-mingle. And voila, AI Safety! That would be awesome. I hope somebody figures out how to do that, because I sure haven’t. (Admittedly, I have neither the time nor the relevant background knowledge to try properly.) I’m also slightly concerned that, even if you figure out a way to cut off introspective knowledge, it might incidentally prevent the system from doing good reasoning, but I currently lean optimistic on that.
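To make the type-separation idea a bit more concrete, here is a minimal sketch of the sort of thing I have in mind, assuming a statically typed language (Rust here); the type names WorldFact, IntrospectiveFact, and WorldModel are hypothetical toys of my own, not part of any real system, and this only illustrates the interface-level idea, not an actual design:

```rust
// Toy illustration only: two distinct types, and a world model whose only
// write path accepts one of them, so the static type-checker refuses to let
// introspective information flow into the world model.

/// Facts derived from observing the external world.
struct WorldFact {
    content: String,
}

/// Facts derived from introspection, e.g. "I am imagining a tree".
struct IntrospectiveFact {
    content: String,
}

/// A world model that can only ever record WorldFact values.
struct WorldModel {
    facts: Vec<WorldFact>,
}

impl WorldModel {
    fn record(&mut self, fact: WorldFact) {
        self.facts.push(fact);
    }
}

fn main() {
    let mut model = WorldModel { facts: Vec::new() };
    model.record(WorldFact {
        content: "Trees perform photosynthesis".to_string(),
    });

    let introspection = IntrospectiveFact {
        content: "I am imagining a tree".to_string(),
    };
    // Uncommenting the next line is a compile-time type error: introspective
    // information cannot co-mingle with the world model.
    // model.record(introspection);
    let _ = introspection;
}
```

Of course, this only shows the easy part, keeping the two types apart at one interface; the hard part is making sure introspective content never gets encoded into the world-model type somewhere upstream, which is the part I haven’t figured out.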