It seems useful to have a quick way of saying:
“The quarks in this box implement a Turing Machine that [performs well on the formal optimization problem P and does not do any other interesting stuff]. And the quarks do not do any other interesting stuff.”
(which of course does not imply that the box is safe)
Sure. Not making the distinction seems important, though, because this post seems to be leaning towards rejecting arguments that depend on noticing that the distinction is leaky. Making it is okay so long as you understand it as “optimizer_1 is a way of looking at things that screens off many messy details of the world so I can focus on only the details I care about right now”, but if it becomes conflated with “and if something is an optimizer_1 I don’t have to worry about the way it is also an optimizer_2”, then that’s dangerous.
The author of the post suggests it’s a problem that there are “some arguments related to AI safety that seem to conflate these two concepts”. I’d say they don’t conflate them, but understand that every optimizer_1 is an optimizer_2.
Maybe we’re just not using the same definitions, but according to the definitions in the OP as I understand them, a box might indeed contain an arbitrarily strong optimizer_1 while not containing an optimizer_2.
For example, suppose the box contains an arbitrarily large computer that runs a brute-force search for some formal optimization problem. [EDIT: for some optimization problems, the evaluation of a solution might result in the execution of an optimizer_2]
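For instance, something like the following (a minimal sketch of my own; the clause encoding and the tiny instance are arbitrary choices, not anything from the OP): the only “feedback” the search ever uses is the in-memory evaluation of candidate assignments.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Exhaustively try every assignment for a CNF formula.

    Each clause is a list of signed variable indices: 3 means x3,
    -3 means NOT x3. Returns a satisfying assignment or None.
    """
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 OR NOT x2) AND (x2 OR x3)
print(brute_force_sat([[1, -2], [2, 3]], num_vars=3))
```

Scale the instance up as far as you like; nothing about the search loop changes, it just burns more compute evaluating assignments.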
Yes, and I’m saying that’s not possible. Every optimizer_1 is an optimizer_2.
I think only if it gets its feedback from the real world. If you have gradient descent, then the true answers for its samples are stored somewhere “outside” the intended demarcation, and it might try to reach them. But how is a hill climber that is given a graph and solves the traveling salesman problem for it an optimizer_2?
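Concretely, the kind of hill climber I have in mind is something like this (a minimal sketch; the 2-swap move and the 4-city distance matrix are my own arbitrary choices): every bit of feedback it uses is a tour length computed from the graph it was handed, and nothing else.

```python
import random

def tour_length(tour, dist):
    """Total length of the closed tour under the given distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def hill_climb_tsp(dist, iterations=10_000, seed=0):
    """Repeatedly try swapping two cities; keep the swap only if it
    shortens the tour. The only feedback used is tour_length, computed
    entirely from the in-memory matrix `dist`."""
    rng = random.Random(seed)
    n = len(dist)
    tour = list(range(n))
    rng.shuffle(tour)
    best = tour_length(tour, dist)
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        tour[i], tour[j] = tour[j], tour[i]
        new = tour_length(tour, dist)
        if new < best:
            best = new
        else:
            tour[i], tour[j] = tour[j], tour[i]  # revert the swap
    return tour, best

# A made-up symmetric 4-city distance matrix.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]
print(hill_climb_tsp(dist))
```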
I would answer this the same way I did earlier in this thread, simply substituting in whatever problem you like for SAT in that example.
All feedback is feedback about the real world because the real world is the only place you can instantiate the computation to reason about “math land”.
Yes, obviously it’s going to change the world in some way in order to be run. But not just any change anywhere makes it an optimizer_2. As defined, an optimizer_2 optimizes its environment. Changing something inside the computer does not make it an optimizer_2, and changing something outside without optimizing it doesn’t either. Yes, the computer will inevitably cause some changes in its environment just by running, but what makes something an optimizer_2 is that it systematically brings about a certain state of the environment. And this offers a potential solution other than hardware engineering: if the feedback is coming from inside the computer, then what is incentivized are only states within the computer, and so only the computer is getting optimized.
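As a toy illustration of feedback coming from inside the computer (my own sketch, not anything from the OP; the data is made up): a gradient-descent loop whose targets sit in the same process as the learner, so the update rule never references anything outside these in-memory arrays.

```python
# Toy least-squares fit by gradient descent. Both the inputs `xs` and the
# targets `ys` are fixed arrays inside the program, so the only states the
# update rule "cares about" are the in-memory parameters w and b.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1

w, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    # Gradient of mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # approaches (2.0, 1.0)
```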