Loss of control of AI is not a likely source of AI x-risk
This post attempts to set out my views on why I don’t find technical alignment to be a pressing issue, compared to more mundane AI risks like human misuse. It represents my personal views only, as someone who is not in AI research. I would appreciate any and all feedback.
Summary of views
A profound economic transformation through AI is far more likely to happen through various implementations of narrow AI than through artificial general intelligence (AGI).
An alternative way to frame this view: if both are possible, then a narrow-AI transformation happens earlier in the timeline, with AGI in the distant long term. In that scenario, risks like robustness failures, bias, and human safety problems are more likely and more immediate sources of AI x-risk than misalignment, and should accordingly receive more focus.
I believe development of AGI will take longer than most people in this community estimate. Many of the AGI arguments I've read conflate cognition with execution, a step that I don't think is obviously justified. Developing an AI that can carry out any action at least as well as a human is a hard problem that will require a breakthrough in the way we architect AIs.
Cutting-edge AI research that advances capabilities (rather than simply implementing existing techniques) is the domain of wealthy organizations and governments, which allocate time and resources to it insofar as it helps them achieve their goals. If developing general AI is hard, then it is much more time- and cost-effective to develop multiple 'narrow' AIs, each with a bounded, defined set of actions, that are still capable of achieving those goals.
Consequently, the default and most likely x-risk scenario is one where AI has a bounded set of actions but still leads to catastrophe through intentional misuse or through unintended consequences of deployment.
1. There is no prototype for an AI that can execute general human tasks.
In many of the arguments I've read about the AI alignment problem, I see an implicit assumption that a general AI with an intelligent and accurate model of the world, good enough to generate plans and predictions, would naturally be capable of executing those plans through an unbounded set of actions.
That doesn't strike me as true. Take an advanced 'generalist' model like DeepMind's Gato as an example. From my understanding, it is described as generalist because it uses a single model to process multiple input modalities and generate actions. But its output can hardly be described as general. For example, it can understand from context that I want an image caption and output that caption in the terminal, but it can't send me the same caption by email.
Why is that so? The model outputs a set of values (in Gato's case, a distribution over output tokens) corresponding to 1) its best guess of the context, e.g. 'generate a text caption', and 2) the desired output, i.e. the word choice and sequence of the text. Those values are then decoded in a deterministic way, and the result is written to the terminal or to whatever other interface the model is connected to. AI models deployed today all work this way: their results are translated into a defined set of output functions. That is, the default is for an AI to be bound to the 'box' of its explicitly programmed outputs.
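To make the 'box' concrete, here is a minimal sketch of that dispatch step. It is my own illustration, not Gato's actual interface, and every function and handler name in it is hypothetical.

```python
# Minimal sketch of how a deployed model's output is routed to a fixed,
# explicitly programmed set of output functions. Hypothetical names only;
# this is not Gato's real interface, just an illustration of the 'box'.

def print_caption(text: str) -> None:
    print(text)  # write the caption to the terminal

def move_robot_arm(command: str) -> None:
    print(f"[robot] executing: {command}")  # stand-in for a hardware call

# The complete action space of the deployed system is whatever is in this table.
OUTPUT_HANDLERS = {
    "caption": print_caption,
    "arm": move_robot_arm,
}

def decode_and_dispatch(task: str, model_output: str) -> None:
    """Deterministically route decoded model output to a programmed handler.
    There is no handler for 'send an email', so the system cannot send one,
    however capable the underlying model is."""
    handler = OUTPUT_HANDLERS.get(task)
    if handler is None:
        raise ValueError(f"no programmed output function for task: {task}")
    handler(model_output)

decode_and_dispatch("caption", "A dog catching a frisbee in the park.")
```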
I'm not an AI researcher, and I cannot tell you whether extending current deep learning techniques can create something like 'intelligence' in a model. But no matter how novel the ideas a model generates, they are necessarily decoded into a defined set of output actions. I believe a breakthrough in how we architect AI systems is required before we see the kind of powerful AI that is the subject of so many alignment thought experiments.
2. Narrow AI that we can implement today, if sufficiently scaled, can lead to impacts as significant as the industrial or agricultural revolution.
Not only is general AI hard; creating narrow AI for a variety of purposes is comparatively easy and fast. Known techniques for creating and optimizing narrow AI, such as generative adversarial networks (GANs) or stacking an ensemble of machine learning models, can deliver better-than-human results (1, 2). Furthermore, these techniques generalize to a wide variety of tasks.
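As a concrete illustration of how little novel machinery one of these known techniques requires, here is a minimal stacking example with scikit-learn on a built-in toy dataset. The choice of base learners is arbitrary and only meant to show the pattern, not to describe any particular deployed system.

```python
# Minimal sketch of a stacked ensemble with scikit-learn on a toy dataset.
# The specific base learners are an arbitrary choice for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner combines base predictions
)
stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.3f}")
```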
We can already see clear pathways for narrow AI systems to be applied to a number of commercial purposes in the near to medium term. I believe such application of existing deep learning techniques is the fastest way we will reach a scenario of AI-led economic transformation.
Some examples:
An advanced iteration of DALL-E may be able to take over the market for stock photos (and potentially stock videos too) (3).
Autonomous vehicles can replace cab drivers and the increasingly large number of delivery workers (4) in the gig economy. Frameworks for driving automation are already being developed to ease the transition (5).
Search technologies already contribute an estimated 0.5-1.2% of GDP (6). Taking the low estimate of 0.5% and multiplying it by 2022 worldwide GDP (7) gives a forecast impact of 506 billion USD from search alone in 2022 (see the back-of-envelope check below). This is likely an underestimate, as it doesn't account for factors such as rising Internet penetration rates, which in turn increase access to search engines.
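A back-of-envelope check of that figure. The world GDP number below is my own assumption (roughly 101 trillion USD for 2022); the cited source may use a slightly different value.

```python
# Back-of-envelope check of the search estimate above.
# World GDP figure is an assumption (~101 trillion USD for 2022);
# the cited source may use a slightly different number.
world_gdp_2022 = 101.2e12              # USD, assumed
low_share, high_share = 0.005, 0.012   # 0.5% to 1.2% of GDP

low_estimate = low_share * world_gdp_2022
high_estimate = high_share * world_gdp_2022
print(f"search contribution: {low_estimate/1e9:.0f} to {high_estimate/1e9:.0f} billion USD")
# -> roughly 506 to 1214 billion USD
```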
Loss of control of AI is not a particularly important consideration for these existing techniques, which create 'narrow' AI. While the intermediate steps DALL-E takes to decide on individual pixels are opaque to me, I am certain the resulting output will be an image. The key AI safety considerations in this scenario are questions like robustness, bias, and securing AI systems against human misuse.
3. Unexpected interactions between narrow AI systems are an under-discussed topic in AI safety
This point elaborates on some thoughts about potential unintended consequences of AI, which I consider a likely source of AI risk. It is not essential to the central argument asserting a low P(misalignment x-risk).
Another area of AI safety research that I believe deserves more consideration is interactions between AI systems. Research usually focuses on a single AI entity, but in AI applications today we already see negative consequences arising from interactions between systems. This is a blind spot that is rarely assessed before a new AI model is deployed, and we would benefit from better frameworks for assessing it.
An example that's often brought up is the 2010 flash crash (8), widely attributed in part to interactions between high-frequency trading algorithms that send predefined buy/sell orders when certain patterns are detected.
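To make the mechanism concrete, here is a toy sketch (not a reconstruction of the actual 2010 event) of two threshold-based sell rules, each reasonable in isolation, amplifying a small dip into a crash:

```python
# Toy illustration (not a model of the actual 2010 flash crash): two independent
# threshold-based trading bots, each sensible in isolation, interact to
# amplify a small dip into a crash.

price = 100.0
history = [price]

def bot_a(prices):
    """Sell if the price dropped more than 1% since the last tick."""
    return "sell" if prices[-1] < prices[-2] * 0.99 else "hold"

def bot_b(prices):
    """Sell if the price is more than 3% below its recent peak."""
    return "sell" if prices[-1] < max(prices) * 0.97 else "hold"

price *= 0.985            # an ordinary 1.5% dip starts the cascade
history.append(price)

for tick in range(10):
    sellers = [bot for bot in (bot_a, bot_b) if bot(history) == "sell"]
    price *= 1.0 - 0.02 * len(sellers)   # each sell order pushes the price down further
    history.append(price)
    print(f"tick {tick}: price = {price:.2f}, sellers = {len(sellers)}")
```

Each rule is defensible on its own; the crash comes from the feedback loop between them.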
Another example I would suggest is social media. Facebook's News Feed, YouTube's video recommendations, and TikTok's content recommendation algorithms are separate, non-aligned AI systems that each work to maximize time spent on their own platform. Collectively, however, they normalize behaviors like mindless scrolling and can contribute to mental health issues such as negative body image, anxiety, and depression (9, 10). Some studies suggest that the risks increase with the number of platforms used (11). Unfortunately, it has been difficult to pin down a good quantitative estimate of their collective negative externalities.
4. Contra point 1: An AI that can write and deploy code automatically without human oversight may be sufficiently close to 'generalist' for the purposes of AI alignment discussions.
In point 1, I discussed my view that even a hypothetical AI with general intelligence is essentially a brain in a jar: the way that its thoughts are converted to actions is a hurdle to achieving a generally-capable AGI.
Point 1 is also partly a rebuttal of the less rigorous arguments I sometimes see, which hand-wave away the important question of how a hypothetical AGI would actually achieve its goals ("A misaligned paperclip maximiser can take over all the factories and make them create only paperclips!" ... "How does it do that?" ... "Well, AGI is still decades away and it's superintelligent, I'm sure it can figure it out.")
But there is one hypothetical situation I can think of where an AI's output is well-defined and bounded, yet still capable of something resembling general action: the case where the AI's output is code that it is allowed to build and deploy automatically, without human oversight.
Such an AI cannot output images by itself, but it can write software that generates images. Or it can sign up for an email address, which lets it create a NightCafe account, which lets it generate original images. This is an important hypothetical, because it gives us a concrete starting point from which to think meaningfully about preventing the dangers of a rogue AI.
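To see why this output channel is qualitatively different from the dispatch table in point 1, here is a hedged sketch. It is hypothetical and deliberately minimal; nothing like it should run outside a sandbox.

```python
# Hypothetical sketch, contrasting with the bounded dispatch table in point 1.
# Here the only programmed "output function" is: run the code the model wrote.
# That single handler makes the effective action space as broad as whatever
# code can reach on the machine and network, which is why this case is the
# exception to the argument in point 1. (Illustration only; do not execute
# untrusted model output outside a sandbox.)
import subprocess
import tempfile

def deploy_model_output(generated_code: str) -> None:
    """The one 'bounded' action: write the model's code to disk and execute it."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    # Anything reachable from the spawned process is now in scope.
    subprocess.run(["python", path], check=False)

# A benign example of model-generated code; it could just as easily register
# an email address or call any other API available to this machine.
deploy_model_output('print("hello from model-generated code")')
```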