Sure, let’s be super specific about it.
Let’s say we have something you consider an optimizer_1: a SAT solver. It operates over a set of variables V arranged in predicates P using an algorithm A. Since this is a real SAT solver that actually gets computed, rather than a purely mathematical one we merely think about, it runs on some computer C, and thus for each of V, P, and A there is some C(V), C(P), and C(A) that is its manifestation on the computer. We can conceptualize what C does to V, P, and A in different ways: it turns them into bytes, it turns A into instructions, and it uses C(A) to operate on C(V) and C(P) to produce a solution for V and P.
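(To make the setup concrete, here’s a minimal sketch of the kind of thing A could be: a brute-force solver over the variables V and the clauses of P. The names and representation are just my illustration, not anything from the post.)

```python
from itertools import product

def solve_sat(variables, clauses):
    """Brute-force SAT solver: try every assignment of the variables V
    against the clauses P (in CNF) and return a satisfying assignment,
    or None if there isn't one. Purely illustrative."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[name] == wanted for name, wanted in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 or not x2) and (x2 or x3)
print(solve_sat(["x1", "x2", "x3"],
                [[("x1", True), ("x2", False)], [("x2", True), ("x3", True)]]))
```

Nothing in this description refers to anything outside the variables and clauses it is handed; the question is what happens when it is physically realized as C(A).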
Now the intention is that the algorithm A is an optimizer_1 that operates only on V and P, but in fact A is never run, properly speaking; C(A) is, and we can only say A is run to the extent that C(A) does something to reality that we can set up an isomorphism to A with. So C(A) is only an optimizer_1 to the extent that the isomorphism holds and it is, as you defined optimizer_1, “solving a computational optimization problem”. But properly speaking, C(A) doesn’t “know” it’s an algorithm: it’s just matter arranged in a way that is isomorphic, via some transformation, to A.
So what is C(A) doing, then, to produce a solution? Well, I’d say it “optimizes its environment”, that is, literally the matter and its configuration that it is in contact with, so it’s an optimizer_2.
You might object that there’s something special going on here such that C(A) is still an optimizer_1, because it was set up in a way that isolates it from the broader environment so that it stays within the isomorphism. But that’s not a matter of classification; that’s an engineering problem of making an optimizer_2 behave as if it were an optimizer_1. And a large chunk of AI safety (mostly boxing) is dealing with ways in which, even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it “breaks” the isomorphism and does something that might still keep the isomorphism intact but also does other things you didn’t think it would do if the isomorphism were strict.
Put pithily: there’s no free lunch when it comes to the isomorphisms that let you manifest your algorithms so they can be computed, so you have to worry about the way they are computed.
I have already (sort of) addressed this point at the bottom of the post. There is a perspective from which any optimizer_1 can (kind of) be thought of as an optimizer_2, but it’s unclear how informative this is. It is certainly at least misleading in many cases. Whether or not the distinction is “leaky” in a given case is something that should be carefully examined, not something that should be glossed over.
I also agree with what ofer said.
“even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it “breaks” the isomorphism and does something that might still keep the isomorphism intact but also does other things you didn’t think it would do if the isomorphism were strict”
I agree. Part of the reason why it’s valuable to make the distinction is to enable clearer thinking about these sorts of issues.
I think the only question is how leaky it is, but it is always a non-zero amount of leaky, which is the reason Bostrom and others are concerned about it for all optimizers and don’t bother to make this distinction.
It seems useful to have a quick way of saying:
“The quarks in this box implement a Turing Machine that [performs well on the formal optimization problem P and does not do any other interesting stuff]. And the quarks do not do any other interesting stuff.”
(which of course does not imply that the box is safe)
Sure. Not making the distinction seems important, though, because this post seems to lean towards rejecting arguments that depend on noticing that the distinction is leaky. Making it is okay so long as you understand it as “optimizer_1 is a way of looking at things that screens off many messy details of the world so I can focus on only the details I care about right now”, but if it becomes conflated with “and if something is an optimizer_1, I don’t have to worry about the way it is also an optimizer_2”, then that’s dangerous.
The author of the post suggests it’s a problem that there are “some arguments related to AI safety that seem to conflate these two concepts”. I’d say they don’t conflate them, but rather understand that every optimizer_1 is an optimizer_2.
Maybe we’re just not using the same definitions, but according to the definitions in the OP as I understand them, a box might indeed contain an arbitrarily strong optimizer_1 while not containing an optimizer_2.
For example, suppose the box contains an arbitrarily large computer that runs a brute-force search for some formal optimization problem. [EDIT: for some optimization problems, the evaluation of a solution might result in the execution of an optimizer_2]
Yes, and I’m saying that’s not possible. Every optimizer_1 is an optimizer_2.
I think only if it gets its feedback from the real world. If you have gradient descent, then the true answers for its samples are stored somewhere “outside” the intended demarcation, and it might try to reach them. But how is a hillclimber that is given a graph and solves the traveling salesman problem for it an optimizer_2?
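(To be concrete about the kind of hillclimber I mean, here’s a toy sketch; the code and names are my own illustration, not from the post. Note that all the “feedback” it uses is the distance matrix it was handed.)

```python
import random

def tsp_hillclimb(dist, iters=10000):
    """Hill climbing on tours: start from a random tour and keep any
    2-opt style reversal that makes the tour shorter. `dist` is an
    n x n distance matrix. Purely illustrative."""
    n = len(dist)
    tour = list(range(n))
    random.shuffle(tour)

    def length(t):
        return sum(dist[t[i]][t[(i + 1) % n]] for i in range(n))

    best = length(tour)
    for _ in range(iters):
        i, j = sorted(random.sample(range(n), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = length(candidate)
        # the only "feedback" is the stored matrix, not anything outside the program
        if cand_len < best:
            tour, best = candidate, cand_len
    return tour, best

# a toy 4-city instance
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
print(tsp_hillclimb(dist))
```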
I would answer this the same way I did earlier in this thread, simply substituting in whatever problem you like for SAT in that example.
All feedback is feedback about the real world because the real world is the only place you can instantiate the computation to reason about “math land”.
Yes, obviously it’s going to change the world in some way in order to be run. But not just any change anywhere makes it an optimizer_2. As defined, an optimizer_2 optimizes its environment. Changing something inside the computer does not make it an optimizer_2, and changing something outside without optimizing it doesn’t either. Yes, the computer will inevitably cause some changes in its environment just by running, but what makes something an optimizer_2 is to systematically bring about a certain state of the environment. And this offers a potential solution other than by hardware engineering: if the feedback is coming from inside the computer, then what is incentivized are only states within the computer, and so only the computer is getting optimized.