I don’t think that I’m assuming the existence of some sort of Cartesian boundary, and the distinction between these two senses of “optimizer” does not seem to disappear if you think of a computer as an embedded, causal structure. Could you state more precisely why you think that this is a Cartesian distinction?
Sure, let’s be super specific about it.
Let’s say we have something you consider an optimizer_1, a SAT solver. It operates over a set of variables V arranged in predicates P using an algorithm A. Since this is a real SAT solver that is computed, rather than a purely mathematical one we only think about, it runs on some computer C, and thus for each of V, P, and A there is some C(V), C(P), and C(A) that is the manifestation of each on the computer. We can conceptualize what C does to V, P, and A in different ways: it turns them into bytes, it turns A into instructions, it uses C(A) to operate on C(V) and C(P) to produce a solution for V and P.
Now the intention is that the algorithm A is an optimizer_1 that only operates on V and P, but properly speaking A is never run; C(A) is, and we can only say A is run to the extent that C(A) does something to reality that we can put in isomorphism with A. So C(A) is only an optimizer_1 to the extent the isomorphism holds and it is, as you defined optimizer_1, “solving a computational optimization problem”. But properly speaking C(A) doesn’t “know” it’s an algorithm: it’s just matter arranged in a way that is isomorphic, via some transformation, to A.
So what is C(A) doing then to produce a solution? Well, I’d say it “optimizes its environment”, that is literally the matter and its configuration that it is in contact with, so it’s an optimizer_2.
You might object that there’s something special going on here such that C(A) is still an optimizer_1 because it was set up in a way that isolates it from the broader environment so it stays within the isomorphism, but that’s not a matter of classification, that’s an engineering problem of making an optimizer_2 behave as if it were an optimizer_1. And a large chunk of AI safety (mostly boxing) is dealing with ways in which, even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it “breaks” the isomorphism and does something that might still keep the isomorphism intact but also does other things you didn’t think it would do if the isomorphism were strict.
Put pithily, there’s no free lunch when it comes to isomorphisms that allow you to manifest your algorithms to compute them, so you have to worry about the way they are computed.
I have already (sort of) addressed this point at the bottom of the post. There is a perspective from which any optimizer_1 can (kind of) be thought of as an optimizer_2, but it’s unclear how informative this is. It is certainly at least misleading in many cases. Whether or not the distinction is “leaky” in a given case is something that should be carefully examined, not something that should be glossed over.
I also agree with what ofer said.
“even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it ‘breaks’ the isomorphism and does something that might still keep the isomorphism intact but also does other things you didn’t think it would do if the isomorphism were strict”
I agree. Part of the reason why it’s valuable to make the distinction is to enable clearer thinking about these sorts of issues.
I think there is only a question of how leaky, but it is always non-zero amounts of leaky, which is the reason Bostrom and others are concerned about it for all optimizers and don’t bother to make this distinction.
It seems useful to have a quick way of saying:
“The quarks in this box implement a Turing Machine that [performs well on the formal optimization problem P and does not do any other interesting stuff]. And the quarks do not do any other interesting stuff.”
(which of course does not imply that the box is safe)
Sure. Not making the distinction seems important, though, because this post seems to be leaning towards rejecting arguments that depend on noticing that the distinction is leaky. Making it is okay so long as you understand it as “optimizer_1 is a way of looking at things that screens off many messy details of the world so I can focus on only the details I care about right now”, but if it becomes conflated with “and if something is an optimizer_1 I don’t have to worry about the way it is also an optimizer_2” then that’s dangerous.
The author of the post suggests it’s a problem that “some arguments related to AI safety that seem to conflate these two concepts”. I’d say they don’t conflate them, but understand that every optimizer_1 is an optimizer_2.
Maybe we’re just not using the same definitions, but according to the definitions in the OP as I understand them, a box might indeed contain an arbitrarily strong optimizer_1 while not containing an optimizer_2.
For example, suppose the box contains an arbitrarily large computer that runs a brute-force search for some formal optimization problem. [EDIT: for some optimization problems, the evaluation of a solution might result in the execution of an optimizer_2]
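To make the brute-force case concrete, here is a minimal sketch of what such a search could look like on a tiny MAX-SAT instance. Everything in it (the clause encoding, the function names, the example instance) is an illustrative assumption and not something from the OP; the only point is that the score is computed purely from the formal problem description.

```python
from itertools import product

# Hypothetical encoding: a clause is a list of non-zero ints,
# where k means "variable k is true" and -k means "variable k is false".
def count_satisfied(clauses, assignment):
    """Number of clauses that contain at least one true literal."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_max_sat(num_vars, clauses):
    """Try every assignment and keep the first one with the highest score."""
    best_assignment, best_score = None, -1
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        score = count_satisfied(clauses, assignment)
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
assignment, score = brute_force_max_sat(3, clauses)
```

Nothing in the objective refers to anything outside `clauses`, which is the sense in which this reads as an optimizer_1. Whether the physical machine running it also counts as an optimizer_2 is exactly what the thread is arguing about.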
Yes, and I’m saying that’s not possible. Every optimizer_1 is an optimizer_2.
I think only if it gets its feedback from the real world. If you have gradient descent, then the true answers for its samples are stored somewhere “outside” the intended demarcation, and it might try to reach them. But how is a hillclimber that is given a graph and solves the traveling salesman problem for it an optimizer_2?
I would answer this the same way I did earlier in this thread, simply substituting in whatever problem you like for SAT in that example.
All feedback is feedback about the real world because the real world is the only place you can instantiate the computation to reason about “math land”.
Yes, obviously it’s going to change the world in some way to be run. But not just any change anywhere makes it an optimizer_2. As defined, an optimizer_2 optimizes its environment. Changing something inside the computer does not make it an optimizer_2, and changing something outside without optimizing it doesn’t either. Yes, the computer will inevitably cause some changes in its environment just by running, but what makes something an optimizer_2 is that it systematically brings about a certain state of the environment. And this offers a potential solution other than hardware engineering: if the feedback is coming from inside the computer, then what is incentivized are only states within the computer, and so only the computer is getting optimized.
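As a rough illustration of feedback “coming from inside the computer”, here is a sketch of a hillclimber for the traveling salesman problem whose only feedback signal is a tour length computed from the distance matrix it was given. The 2-opt style move, the fixed seed, and the four-city instance are assumptions made for the example.

```python
import random

def tour_length(dist, tour):
    """Length of the closed tour, read entirely from the given matrix."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def hill_climb_tsp(dist, steps=1000, seed=0):
    """Accept a segment reversal only if it strictly shortens the tour."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    tour = list(range(len(dist)))
    best = tour_length(dist, tour)
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        length = tour_length(dist, candidate)
        if length < best:
            tour, best = candidate, length
    return tour, best

# Hypothetical symmetric instance: the starting tour 0-1-2-3 costs 20,
# while the tour 0-2-1-3 costs 4.
dist = [
    [0, 9, 1, 1],
    [9, 0, 1, 1],
    [1, 1, 0, 9],
    [1, 1, 9, 0],
]
tour, best = hill_climb_tsp(dist)
```

The acceptance test `length < best` never looks at a clock, a sensor, or any other state outside `dist`, so the only thing being selected for is the stored tour. This does not settle whether side effects of running it matter; it only shows where the feedback comes from.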
Well, one thing a powerful optimizer might do at some point is ask itself “what program should I run that will figure out such and such for me?”. This is what Bostrom is describing in the quote: an optimizer optimizing its own search process. Now, if the AI then searches through the space of possible programs, predicts which one will give it the answer quickest, and then implements it, here’s a thing that might happen: there might be a program that, when run, affects the outside world in such a way as to speed up the process of answering.
For example, it might lead electricity to run through the computer in such a way as to cause it to emit electromagnetic waves, through which it sends a message to a nearby WLAN router. It then uses the internet to hack a bank account, buys extra hardware, and has it delivered to and plugged into itself. Then it runs a program calculating the answer on this much more powerful hardware, and in this way ends up having the answer faster than if it had just started calculating away on the weaker hardware.
And if the optimizer works as described above, it will implement that program, and thereby optimize its environment. Notably, it will optimize for solving the original optimisation problem faster/better, not try to implement the solution it has found.
I don’t think this makes your distinction useless, as there are genuine optimizer_1s, even relatively powerful ones, but the Cartesian boundary is an issue once we talk about self-improving AI.
The fact that a superintelligent AI contains an optimization algorithm does not necessarily mean that this optimization algorithm is itself superintelligent (or that it has access to the world model of the overall system, etc). It might, it might not – it depends on the design of the system.
“the Cartesian boundary is an issue once we talk about self-improving AI.”
This presumably depends on a lot of specific facts about how the system is designed.
(or that it has access to the world model of the overall system, etc)
It doesn’t need to. The “inner” program could also use its hardware as quasi-sense organs and figure out a world model of its own.
Of course this does depend on the design of the system. In the example described, you could, rather than optimize for speed itself, have a fixed function that estimates speed (like what we do in complexity theory) and then optimize for *that*, and that would get rid of the leak in question.
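A toy sketch of that idea, under the assumption that each candidate program comes with a fixed cost formula in the input size n (the candidate names and formulas here are invented for illustration):

```python
import math

# Hypothetical candidates, each described only by a fixed step-count
# estimate as a function of the input size n.
CANDIDATES = {
    "linear_scan": lambda n: n,
    "binary_search": lambda n: math.ceil(math.log2(n + 1)),
    "pairwise_compare": lambda n: n * n,
}

def pick_program(n):
    """Choose the candidate with the lowest *estimated* cost.

    The score depends only on n and the fixed formulas. It is never
    measured with a clock, so running a candidate on faster hardware
    would not change which one is selected.
    """
    return min(CANDIDATES, key=lambda name: CANDIDATES[name](n))

choice = pick_program(1000)
```

Compare this with scoring candidates by measured wall-clock time: there, anything that makes the physical run faster, including acquiring more hardware, would improve the score. With the fixed estimate, that particular incentive is absent, though this says nothing about other possible leaks.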
The point I think Bostrom is making is that, contrary to intuition, just building the epistemic part of an AI and not telling it to enact the solution it found doesn’t guarantee you don’t get an optimizer_2.