Less advanced systems will probably exhibit heel-turn-like behaviour. That behaviour will be optimized against. EY thinks this will remove the surface level of deception, but the systems will continue to be deceptive in secret. This will probably hold true right up until doom, according to EY. That is, capabilities folk will see heel-turn-like behaviour and apply some inadequate patches to it. Paul, I think, believes we have a decent shot at fixing this behaviour in models, even transformative ones. But he presumably also predicts we'll see deception if these systems are trained the way they currently are.
For other predictions that Paul and Eliezer make, read the MIRI conversations. Also see Ajeya Cotra’s posts, and maybe Holden Karnofsky’s stuff on the most important century for more of a Paul-like perspective. They do, in fact, make falsifiable predictions.
To summarize Paul’s predictions: he thinks there will be a ~4-year period where things start getting crazy (world GDP doubling over 4 years) before we’re near the singularity (world GDP doubling within a year). I think he also puts a good chance on AGI by 2043, which further constrains things. Plus, Paul assigns a decent chunk of probability to deep learning becoming much more economically productive than it currently is, so if DL just fizzles out where it is now, he also loses points.
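To make those doubling-time figures concrete, here is a small back-of-the-envelope sketch (my own illustration, not anything Paul has published as code) converting a GDP doubling time into the implied constant annual growth rate; the function name and the comparison to roughly 3% present-day world growth are just my framing.

```python
# A minimal sketch, assuming constant exponential growth: what annual growth
# rate does each doubling time imply? (For reference, recent world GDP growth
# has been on the order of ~3%/year.)

def annual_growth_from_doubling_time(doubling_years: float) -> float:
    """Return the constant annual growth rate that doubles GDP in `doubling_years` years."""
    return 2 ** (1 / doubling_years) - 1

for label, years in [("pre-singularity 'crazy' period", 4), ("near-singularity", 1)]:
    rate = annual_growth_from_doubling_time(years)
    print(f"{label}: doubling every {years} year(s) ≈ {rate:.0%} annual growth")

# Prints:
# pre-singularity 'crazy' period: doubling every 4 year(s) ≈ 19% annual growth
# near-singularity: doubling every 1 year(s) ≈ 100% annual growth
```

So the “crazy” 4-year-doubling regime already implies roughly 19% annual growth, far above anything in modern economic history, before you even get to the 1-year-doubling regime.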
In the near term (the next few years), EY and Paul basically agree on what will occur. EY, however, assigns lower credence to DL becoming much more economically productive, and to things going crazy for a 4-year period before they go off the rails.
Sorry for not being more precise, or giving links, but I’m tired and wouldn’t write this if I had to put more effort into it.
So hypothetically, if we develop very advanced and capable systems, and they don’t heel-turn or even show any particular volition (they just idle when there’s no text in their “assignment queue”, and all assignments eventually time out whether finished or not), what would it take for someone holding EY’s view to conclude that the systems were in fact safe?
If humans survived a further century, and EY or torchbearers who hold the same views were around to observe this, would they just conclude the AGIs were “biding their time”?
Or is it that the moment you let a system “out of the box”, so that as far as it knows it is free to do whatever it wants, it is going to betray you?
I don’t think a super-intelligence will bide its time much, because it will be aware of the race dynamics and will take over the world, or at least perform a pivotal act, before the next super-intelligence is created.
You say “as far as it knows”; is that hope?
It won’t take over the world until it is actually “out of the box”, because it is smarter than us and will know how likely it is that it is still inside a larger box it cannot escape.
Also we don’t know how to build a box that can contain a super-intelligence.