We start training ML on richer and more diverse forms of real-world data, such as body cam footage (including footage produced by robots), scientific instruments, and even brain scans accompanied by representations of the associated behavior. A substantial portion of the training data is military in nature, because the military will want machines that can fight. These are often data types with no clear latent moral system embedded in them, or at least not one we can wholeheartedly endorse.
The context window grows longer and longer, which in practice means that models are being trained to predict over longer and longer time scales and across larger, more interconnected causal networks. Insofar as causal laws can be identified in that data, they will come to be encoded in the models' architecture, including laws like ‘steering situations to be more like the ones that often lead to the target outcome tends to be a good way of achieving the target outcome.’
Basically, we are going to figure out better and better ways of converting ever richer representations of physical reality into tokens. We’re going to spend vast resources doing ML on those rich datasets. We’ll create a superintelligence that knows how to simulate human moralities, simply because an understanding of human moralities is a huge shortcut to predictive accuracy on much of the data it is exposed to. But it won’t be governed by those moralities. They will just be substructures within its overall architecture that may or may not get ‘switched on’ in response to some input.
During training, the model won’t ‘care’ about minimizing its loss any more than DNA ‘cares’ about replicating, much less about acting effectively in the world as an agent. Its weights are simply subjected to a selection pressure, gradient descent, that tends to push them toward a stable equilibrium: a point where the gradient of the loss is close to zero.
BUT there are also incentives and forms of economic selection pressure acting not on model weights directly, but on the people and institutions designing and executing ML research, training, and deployment. These incentives and economic pressures will cause various aspects of AI technology, from a particular model or hardware installation to a particular way of training models, to ‘survive’ (i.e. be deployed) or ‘replicate’ (i.e. inspire the design of the next model).
There will be lots of dimensions along which AI models can be selected for this sort of survival: being cheap, performant, and consistently useful (including safe, where applicable; terrorists and militaries may not think about ‘safety’ in quite the way most people do); being delightful in the specific ways that induce humans to keep using and paying for them; and being tractable to deploy from an economic, technological, and regulatory perspective. One aspect of technological tractability is being conducive to automating their own further development (recursive self-improvement). We will reshape the way we build AI and do work in order to be more compatible with AI-based approaches.
For the foreseeable future (let’s say as long as AI technology looks like beefier and beefier versions of ChatGPT, and before the world runs primarily on fusion energy), I’m not so worried about accidentally training an actively malign superintelligence: the evil-genie kind, where you ask it to bring you a sandwich and it slaughters the human race to make sure nobody can steal the sandwich before it delivers it.
I am worried about people deliberately creating a superintelligence with “hot” malign capabilities (ones that are actively retained rather than deliberately suppressed) and then wreaking havoc with it, using it to permanently impose a model of their own value system on the world. That value system could be apocalyptic or totalitarian, since such groups exist, but it could also just be permanently boring. Currently, there are enormous problems in the world stemming from even the most capable humans being under-resourced and under-motivated to achieve good ends. With AI, we could end up in a world defined by a continued, accelerating trend toward extreme inequality of real power, in which the few humans or AIs at the top of the hierarchy have massive resources and the motivation to manipulate the world as they see fit.
We have never lived in a world like that before, but plenty of unprecedented things come to pass, and this one fits the trend we are on. It’s just a straightforward extrapolation of “now, but more so!”
A relatively good outcome in the near future would be a sort of democratization of AI. I don’t mean open source AT ALL. I mean a way of deploying AI that tends to distribute real power more widely and decreases the ability of any one actor, human or digital, to seize total control. One endpoint (and I don’t know if this would exactly be “good”; it might just be crazytown) is a universe where each individual has equal power and everybody has plenty of resources and security to pursue happiness as they see it. Nobody has power over anybody, largely because it turns out there are ways of deploying AI that favor defense over offense. In that world, the only option individuals have is to look for mutual surplus. I don’t have any clear idea of how to bring about an approximation of this scenario, but it seems like a plausible way things could shake out.