If this is what counts as a warning shot, then we’ve already encountered several warning shots, right?
Kind of. None of the examples you mention have had significant real-world impacts (whereas in the example I give, people very directly lose millions of dollars). Possibly the Google gorilla example counts, because of the negative PR.
I do think that the boat race example has in fact been very influential and has the effects I expect of a “warning shot”, but I usually try to reserve the term for cases with significant economic impact.
As a tangent, I notice you say “several months later.” I worry that this is too long a time lag. I think slow takeoff is possible but so is fast takeoff
I’m on record in a few places as saying that a major crux for me is slow takeoff. I struggle to imagine a coherent world that matches what I think people mean by “fast takeoff”; I think most likely I don’t understand what proponents mean by the phrase. When I ignore this fact and try to predict anyway using my best understanding of what they mean, I get quite pessimistic; iirc in my podcast with Buck I said something like 80% chance of doom.
and even on slow takeoff several months is a loooong time.
The system I’m describing is pretty weak and far from AGI; the world has probably not started accelerating yet (annual US GDP growth is maybe 4% at this point). Several months is still a short amount of time at this point in the trajectory.
I chose an earlier example because it’s a lot easier to predict how we’ll respond; as we get later in the trajectory I expect significant changes to how we do research and deployment that I can’t predict ahead of time, and so the story has to get fuzzier.
Thanks, this is great. So the idea is that whereas corporations, politicians, AI capabilities researchers, etc. might not listen to safety concerns when all we have is a theoretical argument, or even a real-world demonstration, once we have real-world demonstrations that are causing million-dollar damages then they’ll listen. And takeoff is highly likely to be slow enough that we’ll get those sorts of real-world damages before it’s too late. I think this is a coherent possibility; I need to think more about how likely I think it is.
(Some threads to pull on: Are we sure all of the safety problems will be caught this way? e.g. what about influence-seeking systems? In a multipolar, competitive race environment, are we sure that million-dollar losses somewhere will be enough to deter people from forging ahead with systems that are likely to make greater profits in expectation? What about the safety solutions proposed—might they just apply cheap hacky fixes instead of a more principled solution? Might they buy into some false theory of what’s going on, e.g. the AI made a mistake because it wasn’t smart enough and so we just need to make it smarter? Also, maybe it’s too late by this point anyway because collective epistemology has degraded significantly, or for some other reason.)
Fast takeoff still seems plausible to me. I could spend time articulating what it means to me and why I think it is plausible, but it’s not my current priority. (I’m working on acausal trade, timelines, and some miscellaneous projects). I’d be interested to know if you think it’s higher priority than the other things I’m working on.
And takeoff is highly likely to be slow enough that we’ll get those sorts of real-world damages before it’s too late.
I do also think that we could get warning shots in the more sped-up parts of the trajectory, and this could be helpful because we’ll have adapted to the fact that we’ve sped up. It’s just harder to tell a concrete story about what this looks like, because the world (or at least AI companies) will have changed so much.
I’d be interested to know if you think it’s higher priority than the other things I’m working on.
If fast takeoff is plausible at all in the sense that I think people mean it, then it seems like by far the most important crux in prioritization within AI safety.
However, I don’t expect to change my mind given arguments for fast takeoff—I suspect my response will be “oh, you mean this other thing, which is totally compatible with my views”, or “nope that just doesn’t seem plausible given how (I believe) the world works”.
MIRI’s arguments for fast takeoff seem particularly important, given that a substantial fraction of all resources going into AI safety seem to depend on those arguments. (Although possibly MIRI believes that their approach is the best thing to do even in case of slow takeoff.)
I think overall that aggregates to “seems important, but not obviously the highest priority for you to write”.
Thanks. Here’s something that is at least one crux for me re: whether to bump up the priority of takeoff-speeds work:
Scenario: OpenSoft has produced a bigger, better AI system. It’s the size of a human brain and it makes GPT-3 seem like GPT-1. It’s awesome, and clearly has massive economic applications. However, it doesn’t seem like a human-level AGI yet, or at least not like a smart human. It makes various silly mistakes and has various weaknesses, just like GPT-3. MIRI expresses concern that this thing might already be deceptively aligned, and might already be capable of taking over the world if deployed, and might already be capable of convincing people to let it self-modify etc. But people at OpenSoft say: Discontinuities are unlikely; we haven’t seen massive economic profits from AI yet, nor have we seen warning shots, so it’s very unlikely that MIRI is correct about this. This one seems like it will be massively profitable, but if it has alignment problems they’ll be of the benign warning shot variety rather than the irreversible doom variety. So let’s deploy it!
Does this sort of scenario seem plausible to you—a scenario in which a decision about whether to deploy is made partly on the basis of belief in slow takeoff?
If so, then yeah, this makes me all the more concerned about the widespread belief in slow takeoff, and maybe I’ll reprioritize accordingly...
Yes, that scenario sounds quite likely to me, though I’d say the decision is made on the basis of belief in scaling laws / trend extrapolation rather than “slow takeoff”.
I personally would probably make arguments similar to the ones you list for OpenSoft, and I do think MIRI would be wrong if they argued it was likely that the model was deceptive.
There’s some discussion to be had about how risk-averse we should be given the extremely negative payoff of x-risk, and what that implies about deployment, which seems like the main thing I would be thinking about in this scenario.
Welp, I’m glad we had this conversation! Thanks again, this means a lot to me!
(I’d be interested to hear more about what you meant by your reference to scaling laws. You seem to think that the AI being deceptive, capable of taking over the world, etc. would violate some scaling law, but I’m not aware of any law yet discovered that talks about capabilities like that.)
I don’t mean a formal scaling law, just an intuitive “if we look at how much difference a 10x increase has made in the past to general cognitive ability, it seems extremely unlikely that this 10x increase will lead to an agent that is capable of taking over the world”.
I don’t expect that I would make this sort of argument against deception, just against existential catastrophe.
Oh OK. I didn’t mean for this to be merely a 10x increase; I said it was the size of a human brain, which I believe makes it a 1000x increase in parameter count and (if we follow the scaling laws) something like a 500x increase in training data? idk.
If you had been imagining that the AI I was talking about used only 10x more compute than GPT-3, then I’d be more inclined to take your side rather than MIRI’s in this hypothetical debate.
I meant that it would be a ~10x increase from what at the time was the previously largest system, not a 10x increase from GPT-3. I’m talking about the arguments I’d use given the evidence we’d have at that time, not the evidence we have now.
If you’re arguing that a tech company would do this now, before making systems in between GPT-3 and a human brain, I can’t see how the path you outline is even remotely feasible—you’re positing a 500,000x increase in compute costs, which I think brings the compute cost of the final training run alone to the high hundreds of billions or low trillions of dollars. That is laughably far beyond OpenAI’s and DeepMind’s budgets, and seems out of reach even for Google or other big tech companies.
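(For concreteness, here is a rough sketch of that back-of-envelope arithmetic in Python. The GPT-3 parameter count and the ~$5M estimate for GPT-3’s training run are assumptions added for illustration, not figures from this conversation.)

```python
# Back-of-envelope check of the "500,000x compute" claim.
# All inputs below are illustrative assumptions, not figures from the conversation.

gpt3_params = 175e9        # assumed GPT-3 parameter count
gpt3_train_cost = 5e6      # assumed ~$5M cost for GPT-3's final training run

param_factor = 1_000       # "human-brain-sized" model: ~1000x GPT-3's parameters
data_factor = 500          # extra training data implied by scaling laws

# Training compute scales roughly as (parameters x training tokens),
# so the combined increase is the product of the two factors.
compute_factor = param_factor * data_factor   # 500,000x

# Assume cost scales linearly with compute (ignoring hardware/efficiency gains).
est_cost = gpt3_train_cost * compute_factor

target_params = gpt3_params * param_factor    # ~1.75e14 parameters
print(f"Target model size: {target_params:.2e} parameters")
print(f"Compute increase: {compute_factor:,}x")
print(f"Rough final-run cost: ${est_cost / 1e12:.1f} trillion")
```

Under these assumptions the product of the two factors reproduces the 500,000x figure, and the linear-cost assumption lands in the “low trillions of dollars” range mentioned above.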
Ah. Well, it sounds like you were thinking that in the scenario I outlined, the previous largest system, 10x smaller, wasn’t making much money? I didn’t mean to indicate that; feel free to suppose that this predecessor system also clearly has massive economic implications, significantly less massive than the new one though…
I wasn’t arguing that we’d do 500,000x in one go. (Though it’s entirely possible that we’d do 100x in one go—we almost did, with GPT-3.)
Am I right in thinking that your general policy is something like “Progress will be continuous; therefore we’ll get warning shots; therefore if MIRI argues that a certain alignment problem may be present in a particular AI system, but thus far there hasn’t been a warning shot for that problem, then MIRI is wrong”?
Well, it sounds like you were thinking that in the scenario I outlined, the previous largest system, 10x smaller, wasn’t making much money?
No, I wasn’t assuming that? I’m not sure why you think I was.
Tbc, given that you aren’t arguing that we’d do 500,000x in one go, the second paragraph of my previous comment is moot.
Progress will be continuous; therefore we’ll get warning shots; therefore if MIRI argues that a certain alignment problem may be present in a particular AI system, but thus far there hasn’t been a warning shot for that problem, then MIRI is wrong.
Yes, as a prior. Obviously you’d want to look at the actual arguments they give and take that into account as well.
OK. I can explain why I thought you thought that if you like, but I suspect it’s not important to either of us.
I think I have enough understanding of your view now that I can collect my thoughts and decide what I disagree with and why.
In terms of inferences about deceptive alignment, it might be useful to go back to the one and only current example we have where someone with somewhat relevant knowledge was led to wonder whether deception had taken place—GPT-3 balancing brackets. I don’t know if anyone ever got Eliezer’s $1000 bounty, but the top-level comment on that thread at least convinces me that it’s unlikely that GPT-3 via AI Dungeon was being deceptive, even though Eliezer thought there was a real possibility that it was.
Now, this doesn’t prove all that much, but one thing it does suggest is that, on current MIRI-like views about how likely deception is, the threshold for suspecting deception is set far too low. That suggests the people at OpenSoft in your scenario might well be right in their assumption.