The only actual technical arguments I can make out are in this paragraph:
It also requires that this new more powerful system not only be far smarter in most all important areas, but also be extremely capable at managing its now-enormous internal coordination problems. And it requires that this system not be a mere tool, but a full “agent” with its own plans, goals, and actions.
It seems Hanson expects a superintelligence to be hampered by internal coordination problems like those of large firms, which would severely limit such an AI’s capabilities. I suppose viewing neural network training as solving a coordination problem between the different nodes inside the network is one way of framing it. The difference with humans coordinating in firms is that gradient descent can directly modify the “brains” of all the “internal agents” inside a network in order to optimize a single objective. I suspect Microsoft would be a hell of a lot more powerful if Nadella could directly link the stock price to neural changes in all of his employees...
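To make the contrast concrete, here is a minimal toy sketch (my own illustration, not something from the post): a single scalar loss whose gradient updates every parameter of a two-“module” network in the same step, which is the sense in which training “rewires” all the internal agents toward one shared objective.

```python
# Toy illustration (hypothetical example): one scalar loss whose gradient
# simultaneously updates every parameter of a two-"module" network, unlike a
# firm where each employee optimizes their own local incentives.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))            # inputs
y = X @ rng.normal(size=8)              # targets from some unknown linear map

# Two "internal agents": module A feeds module B; prediction = (X @ A) @ B
A = 0.1 * rng.normal(size=(8, 4))
B = 0.1 * rng.normal(size=4)
lr = 0.05

for step in range(1000):
    h = X @ A                           # module A's output
    pred = h @ B                        # module B's output
    err = pred - y
    loss = np.mean(err ** 2)            # ONE objective for the whole system

    # Backpropagating that single loss yields gradients for *both* modules,
    # so each optimization step adjusts every "agent" toward the shared goal.
    grad_B = 2 * h.T @ err / len(y)
    grad_A = 2 * X.T @ np.outer(err, B) / len(y)
    A -= lr * grad_A
    B -= lr * grad_B

print(f"mean squared error after training: {loss:.4f}")
```

No CEO can do the analogous thing to a firm: there is no differentiable path from the stock price back into each employee’s head.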
The comment about tool-AI vs. agent-AI is just ignorant (or incredibly dismissive) of mesa-optimizers and of the fact that being asked to predict what an agent would do immediately instantiates such an agent inside the tool-AI. It’s obvious that a tool-AI is safer than an explicitly agentic one, but not for arbitrary levels of intelligence.
In addition, the roughly decade duration predicted from prior trends for the length of the next transition period seems plenty of time for today’s standard big computer system testing practices to notice alignment issues.
So, this is trying to predict the gap between “alignment issues become obvious” and “humans can no longer control AI” by pattern-matching to three previous economic transitions in world history. There’s a bunch wrong here, but at the very least, if you have a sequence with three data points and are trying to predict the fourth one, your error bars ought to be massive (unless you have a model of the data-generating distribution with few degrees of freedom). We can probably be confident that the next transition will take less time than the previous ones, but three data points are just not enough information to meaningfully constrain the transition time.
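As a toy illustration of just how wide those error bars are, here’s a sketch with made-up placeholder durations (not Hanson’s actual figures): fit a line to log-duration versus transition index for three transitions and compute a textbook 95% prediction interval for the fourth. With a single residual degree of freedom, the interval spans many orders of magnitude.

```python
# Extrapolating a 4th transition duration from only 3 data points.
# The durations below are made-up placeholders, NOT Hanson's numbers.
import numpy as np

x = np.array([1.0, 2.0, 3.0])                   # transition index
durations = np.array([10000.0, 700.0, 120.0])   # hypothetical durations in years
y = np.log(durations)

# Ordinary least squares fit: log-duration ≈ a + b * index
design = np.column_stack([np.ones_like(x), x])
(a, b), *_ = np.linalg.lstsq(design, y, rcond=None)

n = len(x)
dof = n - 2                                     # a single residual degree of freedom
resid = y - (a + b * x)
s = np.sqrt(resid @ resid / dof)                # residual standard deviation

# Standard prediction interval for the next point in the sequence
x_new = 4.0
se_pred = s * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum())
t_crit = 12.706                                 # 97.5% quantile of Student's t with 1 dof
center = a + b * x_new
lo, hi = center - t_crit * se_pred, center + t_crit * se_pred

print(f"point estimate: {np.exp(center):.1f} years")
print(f"95% prediction interval: {np.exp(lo):.3f} to {np.exp(hi):.0f} years")
```

With these placeholder numbers the point estimate lands around a decade, but the 95% prediction interval runs from under a day to tens of thousands of years, which is roughly what “massive error bars” means here.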
The comment about tool-AI vs. agent-AI is just ignorant (or incredibly dismissive) of mesa-optimizers and of the fact that being asked to predict what an agent would do immediately instantiates such an agent inside the tool-AI. It’s obvious that a tool-AI is safer than an explicitly agentic one, but not for arbitrary levels of intelligence.
This seems way too confident to me given the level of generality of your statement. And to be clear, my view is that this could easily happen in LLMs based on transformers, but what about other architectures? If you just talk about how a generic “tool-AI” would or would not behave, it seems to me that you are operating at a level of abstraction far too high to make such specific claims with confidence.