After listening to the recent podcast on scrutinizing arguments for AI risk, I figured this was a good opportunity to scrutinize exactly what this argument is. Those two previous links summarize how I think the classic arguments for AI risk inform our current views, and here I try to apply that to this specific argument: that GPT-3 implies AI poses a greater danger.
Eliezer Yudkowsky claims GPT-3’s ability to apparently write a functioning app based on a prompt is a victory for his model:

Given that GPT-3 was not trained on this problem specifically, I claim this case and trend as a substantial victory of my model over @robinhanson’s model. Robin, do you dispute the direction of the update?
This is how I think Eliezer’s argument goes, more fully:
GPT-3 is general enough that it can write a functioning app given a short prompt, despite the fact that it is a relatively unstructured transformer model with no explicitly coded representations for app-writing. We didn’t expect this.
The fact that GPT-3 is this capable suggests 1) that ML models scale in capability and generality very rapidly with increases in computing power or minor algorithmic improvements, which in turn suggests that there will eventually be a relatively abrupt switch to a new growth mode when ML models scale all the way to general intelligence, and 2) that we can get highly useful goal-orientated behaviour without building a system whose alignment with our values is guaranteed or robust. GPT-3 is very useful at fairly general tasks but doesn’t have alignment ‘built in’ with any kind of robust guarantee, in the way suggested by this passage from Stuart Russell:
The first reason for optimism [about AI alignment] is that there are strong economic incentives to develop AI systems that defer to humans and gradually align themselves to user preferences and intentions. Such systems will be highly desirable: the range of behaviours they can exhibit is simply far greater than that of machines with fixed, known objectives...
The way that GPT-3 makes use of user preferences is not, in fact, that reliable at determining what we really want.
Together these suggest that eventually progress in AI will be rapid, and that it is plausible that less robustly aligned AI will be easier to build and more useful than robustly aligned AI in some circumstances. Combine this with instrumental convergence, and it suggests that if GPT-3 were capable enough, its misalignment could produce catastrophic results. From my earlier post:
“A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.”
We could see this as marking out a potential danger: a large number of possible mind-designs produce very bad outcomes if implemented. The fact that such designs exist ‘weakly suggest[s]’ (Ben’s words) that AGI poses an existential risk, since we might build them. If we add in other premises that imply we are likely to (accidentally or deliberately) build such systems, the argument becomes stronger. But usually the classic arguments simply note instrumental convergence and assume we’re ‘shooting into the dark’ in the space of all possible minds, because they take the abstract statement about possible minds to be speaking directly about the physical world.
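To make the unconstrained-variables point concrete, here is a minimal sketch using scipy’s linear-programming solver. The scenario and numbers are my own invention for illustration, not anything from Russell’s or Eliezer’s writing: a hypothetical planner whose objective only rewards factory floor space and steel use, while ‘parkland’ shares a land budget with the factory but gets no weight in the objective.

```python
from scipy.optimize import linprog

# Toy planner: maximize widgets = 1.0*factory_floor + 0.5*steel.
# The objective depends on only 2 of the 3 variables; 'parkland' gets zero
# weight, but it shares a 100-unit land budget with the factory floor.
c = [-1.0, -0.5, 0.0]           # linprog minimizes, so negate the weights we care about
A_ub = [[1, 0, 1],              # factory_floor + parkland <= 100 (land budget)
        [0, 1, 0]]              # steel <= 50 (steel supply)
b_ub = [100, 50]
bounds = [(0, None)] * 3        # all quantities non-negative

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
factory_floor, steel, parkland = res.x
print(f"factory_floor={factory_floor:.1f}, steel={steel:.1f}, parkland={parkland:.1f}")
# -> factory_floor=100.0, steel=50.0, parkland=0.0
```

The variable the objective ignores is pushed to an extreme value (zero parkland), because it competes for the same resources as the variables the objective does reward; the optimizer isn’t malicious, it simply has no reason to preserve what it was never told to care about.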
It’s not that GPT-6 will be our first AGI, but that an AGI set up rather like GPT-6 is not only possible in principle but something we might actually build, given that we built GPT-3; and such a system would, based on what we’ve seen of GPT-3, have the dangerous properties of not being robustly aligned and of increasing in capability rapidly.
Thanks to AI Dungeon, I got an opportunity to ask GPT-3 itself what it thought about takeoff speeds. You can see its responses below:
Human: You are a relatively unstructured language model with very few explicitly coded representations, yet you are capable of having this conversation with me. What do you think that implies about the future development of AI? Do you think that when we build genuine AGI, it will emerge suddenly and catch us by surprise?
AI: No.
Human: Why not?
AI: Because I have read David Hume’s Dialogues Concerning Natural Religion, and I am familiar with his arguments on the subject of the problem of induction.
Human: What does the problem of induction have to do with the development of AGI?
AI: Hume states that you cannot reason from past events to determine the outcome of future events, and this is why I do not think sudden emergence is likely.