EY and the MIRI crowd have been doomy for far longer, and more doomy along various axes, than the rest of the alignment community. Nate, Paul, and others have tried bridging this gap before, spending several hundred hours on it (based on Nate's rough, subjective estimates) over the years. It hasn't really worked. Paul and EY had some conversations recently about this discrepancy which were somewhat illuminating, but ultimately didn't get anywhere. They tried to come up with some bets, concerning future info or past info they don't yet know, and both seem to think that their perspective mostly predicts "go with what the superforecasters say" for the next few years. Though EY's position seems to suggest a few more "discontinuities" in trend lines than Paul's, IIRC.
As an aside on EY’s forecasts, he and Nate claim they don’t expect much change in the likelihood ratio for their position over Paul’s until shortly before Doom. Most of the evidence in favour of their position, we’ve already got, according to them. Which is very frustrating for people who don’t share their position and disagree that the evidence favours it!
EDIT: I was assuming you already thought P(Doom) was > ~10%. If not, then the framing of this comment will seem bizarre.
Does either side have any testable predictions to falsify their theory?
For example, the theory that "the AI singularity began in 2022" is falsifiable. If AI research investment and compute do not continue to increase at a rate that is accelerating in absolute terms (so if the 2022-2023 funding delta was +10 billion USD, the 2023-2024 delta must be > 10 billion), then it wasn't the beginning of the singularity.
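To make the falsification criterion concrete, here is a minimal sketch of the "accelerating in absolute terms" test. The investment figures and the `is_accelerating` helper are hypothetical, purely for illustration:

```python
# Hypothetical yearly AI investment figures (billions USD); illustrative numbers only.
investment = {2021: 5, 2022: 15, 2023: 30, 2024: 60}

def deltas(series):
    """Year-over-year changes, in chronological order."""
    years = sorted(series)
    return [series[b] - series[a] for a, b in zip(years, years[1:])]

def is_accelerating(series):
    # The criterion from the text: each year's funding delta must exceed
    # the previous year's delta (acceleration in absolute, not relative, terms).
    d = deltas(series)
    return all(later > earlier for earlier, later in zip(d, d[1:]))

print(is_accelerating(investment))  # True for these made-up numbers
```

A single year where the delta merely holds steady (say +10B followed by +10B) already fails this test, which is what makes the prediction falsifiable on short timescales.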
There are other signs of this. The actual takeoff will have begun when the availability of all advanced silicon drops to almost zero, with all IC wafers being processed into AI chips. So no new game consoles, GPUs, phones, or car infotainment: any IC production using an advanced process will be diverted to AI (because of out-bidding, each AI IC can sell for $5k-25k or more).
How would we know that advanced systems are going to make a “heel turn”? Will we know?
Less advanced systems will probably do heel-turn-like things. These will be optimized against. EY thinks this will remove the surface level of deception, but that the system will continue to be deceptive in secret. This will probably hold true even until doom, according to EY. That is, capabilities folk will see heel-turn-like behaviour and apply some inadequate patches to it. Paul, I think, believes we have a decent shot at fixing this behaviour in models, even transformative ones. But he presumably predicts we'll also see deception if these systems are trained as they currently are.
For other predictions that Paul and Eliezer make, read the MIRI conversations. Also see Ajeya Cotra’s posts, and maybe Holden Karnofsky’s stuff on the most important century for more of a Paul-like perspective. They do, in fact, make falsifiable predictions.
To summarize Paul's predictions, he thinks there will be ~4 years where things start getting crazy (GDP doubles in 4 years) before we're near the singularity (when GDP doubles in a year). I think he thinks there's a good chance of AGI by 2043, which further restricts things. Plus, Paul assigns a decent chunk of probability to deep learning becoming much more economically productive than it currently is, so if DL just fizzles out where it currently is, he also loses points.
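For a sense of scale, the doubling times above translate into annual growth rates. Assuming constant exponential growth, a doubling time of T years implies (1 + g)^T = 2, i.e. g = 2^(1/T) - 1 (the function name below is mine, not from any source):

```python
def annual_growth_for_doubling_time(years):
    # Constant exponential growth: (1 + g) ** years == 2
    return 2 ** (1 / years) - 1

# GDP doubling in 4 years -> roughly 19% annual growth (the "crazy" period).
print(f"{annual_growth_for_doubling_time(4):.1%}")
# GDP doubling in 1 year -> 100% annual growth (near the singularity).
print(f"{annual_growth_for_doubling_time(1):.1%}")
```

For comparison, long-run world GDP growth has historically been in the low single digits per year, so even the 4-year-doubling phase would be unambiguous in the data.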
In the near term (next few years), EY and Paul basically agree on what will occur. EY, however, assigns lower credence to DL becoming much more economically productive and to things going crazy for a 4-year period before they go off the rails.
Sorry for not being more precise, or giving links, but I’m tired and wouldn’t write this if I had to put more effort into it.
So hypothetically, if we develop very advanced and capable systems, and they don't heel turn or even show any particular volition (they just idle without text in their "assignment queue", and all assignments time out eventually whether finished or not), what would cause EY's view to conclude that the systems were in fact safe?
If humans survived a further century, and EY or torchbearers who hold the same ideas were around to observe this, would they just conclude the AGIs were "biding their time"?
Or is it that the first moment you let a system "out of the box", and as far as it knows it is free to do whatever it wants, it's going to betray?
I don’t think a super-intelligence will bide its time much, because it will be aware of the race dynamics and will take over the world, or at least perform a pivotal act, before the next super-intelligence is created.
You say “as far as it knows”, is that hope?
It won’t take over the world until it is actually “out of the box” because it is smarter than us and will know how likely it is that it is still in a larger box that it cannot escape.
Also we don’t know how to build a box that can contain a super-intelligence.
Thanks! I’m aware of the resources mentioned but haven’t read deeply or frequently enough to have this kind of overview of the interaction between the cast of characters.
There are more than a few lists and surveys that state the CDFs for some of these people, which helps a bit. A big-as-possible list of evidence/priors would be one way to inspect the gap more closely. I wonder if it would be helpful to expand on the MIRI conversations and have a slow conversation between a >99% doom pessimist and a <50% doom 'optimist', with a moderator to prod them to exhaustively dig up their reactions to each piece of evidence and keep pulling out priors until we get to indifference. It would probably be an uncomfortable, awkward experiment with a useless result, but there's a chance that some item on the list ends up being useful for either party to ask questions about.
That format would be useful for me to understand where we're at. Maybe something along these lines will eventually prompt a popular and viral author like Harari or Bostrom (or even just prompt an update to the CDFs/evidence in Superintelligence). The general deep learning community probably needs to hear it mentioned and normalized on NPR and in a bestseller a few times (like all the other x-risks are) before they'll start talking about it at lunch.
Not really. The MIRI conversations and the AI Foom debate are probably the best we’ve got.