Belief: There is no amount of computing power which would make AlphaGo Zero (AGZ) turn the world into computronium in order to make the best possible Go moves (even if we assume there is some strategy which would let the system achieve this, like manipulating humans with cleverly chosen Go moves).
My reasoning is that AGZ is trained by recursively approximating a Monte Carlo Tree Search guided by its current model (very rough explanation which is probably missing something important). And it seems the “attractor” in this system is “perfect Go play”, not “whatever Go play leads to better Go play in the future”. There is no way for a system like this to learn that humans exist, or that it’s running on a computer of a certain type, or even to conceptualize that certain moves may alter certain parameters of the system, because these things aren’t captured in the MCTS, only the rules of Go.
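To make that picture concrete, here is a minimal sketch of an AGZ-style self-play loop (not DeepMind's actual code; `game`, `net`, and `run_mcts` are hypothetical interfaces standing in for the real components). The point it illustrates is that both the search and the training targets only ever range over legal Go positions:

```python
import random

def train_agz_style(game, net, run_mcts, iterations=10, games_per_iter=5):
    """Sketch of AlphaGo-Zero-style training via self-play.

    Assumed (hypothetical) interfaces:
      game.initial_state(), game.legal_moves(s), game.play(s, m),
      game.is_over(s), game.outcome(s)         -- the rules of Go, nothing else
      run_mcts(net, s) -> {move: visit_count}  -- search guided by the current net
      net.train(examples)                      -- fit policy to visit counts, value to outcomes
    """
    for _ in range(iterations):
        examples = []
        for _ in range(games_per_iter):
            state, history = game.initial_state(), []
            while not game.is_over(state):
                # The tree search only expands positions reachable by legal
                # Go moves; "humans" or "hardware" are not in its state space.
                visits = run_mcts(net, state)
                history.append((state, visits))
                move = random.choices(list(visits), weights=list(visits.values()))[0]
                state = game.play(state, move)
            z = game.outcome(state)  # final result, e.g. +1/-1 (per-player sign flipping omitted)
            examples.extend((s, v, z) for (s, v) in history)
        # The net is pulled toward the search's improved policy and the actual
        # game results, so the fixed point is "strong Go play", not
        # "whatever play makes future training easier".
        net.train(examples)
    return net
```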
This isn’t an argument against dangerous AGI in general; I’m trying to clarify my thinking about the whole “Tool AI vs. Agent AI” thing before I read Reframing Superintelligence.
Am I right? And is this a sound argument?
Sounds correct to me. As long as the AI has no model of the outside world and no model of itself (and perhaps a few extra assumptions), it should keep playing within the given constraints. It may produce results that are incomprehensible to us, but it would not do so on purpose.
It’s when the “tool AI” has a model of the world (including itself, the humans, how the rewards are generated, how it could generate better results by obtaining more resources, and how humans could interfere with its goals) that agent-ness emerges as a side effect of trying to solve the problem.
“Find the best Go move in this tree” is safe. “Find the best Go move, given that the guy in the next room hates computers and will try to turn you off, which would count as a failure to find the best move” is dangerous. “Find the best Go move, given that more computing power would likely let you make better moves, but humans will try to prevent you from acquiring too many resources” is an x-risk.
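To put the contrast in code terms (a toy sketch with hypothetical `game` and `evaluate` interfaces, not anyone's actual system): in the safe framing the argmax ranges only over legal moves in the game tree, so facts about the guy next door or about computing resources simply cannot appear in the answer.

```python
def best_move_in_tree(game, state, evaluate):
    # Safe framing: the only candidates are legal Go moves, and the only
    # thing scored is the resulting Go position.
    return max(game.legal_moves(state),
               key=lambda move: evaluate(game.play(state, move)))

# The dangerous framings would instead have to rank *world* actions
# ("disable the guy next door", "acquire more compute") alongside Go moves
# under one objective -- something this function has no vocabulary for.
```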
I recommend Two Senses of “Optimizer”. I agree with you, roughly. I think it will be relatively easy to build a tool AI with very strong capabilities, and much harder to build something with world-optimization capabilities. This implies that powerful tool AIs (or AI services) will come first, and for the most part they will be safe in a way that agentic AI isn’t.
However, two things could potentially complicate this analysis:
1. Tool AIs might become agentic AIs because of some emergent mesa-optimizer that favors world optimization. It’s hard to say how likely this is.
2. Gwern wrote about how tool AIs have an instrumental reason to become agentic. At one point I believed this argument, but I no longer think it is predictive of real AI systems. Practically speaking, even if there were an instrumental advantage to becoming agentic, AGZ just isn’t optimizing a utility function (or at least, it isn’t useful to describe it as such), and therefore arguments about instrumental convergence aren’t predictive of AGZ scaled up.