OK, so it sounds like Eliezer is saying that all of the following are very probable:
1. ML, mostly as presently practiced, can produce powerful, dangerous AGI much sooner than any other approach.
   - The number of technical innovations needed is limited, and they’re mostly relatively easy to think of.
   - Once those innovations get you to a sort of base-level AGI, it can be scaled up to catastrophic levels by throwing computing power at it.
   - ML-based AGI isn’t the “best” approach, and it may or may not be able to FOOM, but it will still have the ability and motivation to kill everybody or worse.
2. There’s no known way to make that kind of AGI behave well, and no promising approaches.
   - Lack of interpretability is a major cause of this.
   - There is a very low probability of anybody solving this problem before badly-behaved AGI has been created and has taken over.
3. Nonetheless, the main hope is still to try to build ML-based AGI which--
   - Behaves well, and
   - Is capable of either preventing other ML-based AGI from being created, or preventing it from behaving badly. Or at least of helping you to do those things.
     - I would think this would require a really major lead. Even finding other projects could be hard.
4. (3) has to be done while keeping the methods secret, because otherwise somebody will copy the intelligence part, or copy the whole thing before its behavior is under control, maybe add some minor tweaks of their own, and crank the scale knob to kill everybody.
   - Corollary: this almost-impossible system has to be built by a small group, or by a set of small “cells” with very limited communication. Years-long secrets are hard.
   - That group or cell system will have to compete with much larger, less constrained open efforts that are solving an easier problem. A vastly easier problem under assumptions (1) and (2).
   - A bunch of resources will necessarily get drained away from the main technical goal, toward managing that arms race.
5. (3) is almost impossible anyway, and is made even harder by (4). Therefore we are almost certainly screwed.
Well. Um.
If that’s really the situation, then clinging to (3), at least as a primary approach, seems like a very bad strategy. “Dying gracefully” is not satisfying. It seems to me that--
People who don’t want to die on the main line should be doing, or planning, something other than trying to win an AGI race… like say flipping tables and trying to foment nuclear war or something.
That’s still not likely to work, and it might still only be a delaying tactic, but it still seems like a better, more achievable option than trying to control ultra-smart ML when you have no idea at all how to do that. If you can’t win, change the game.
and/or
People who feel forced to accept dying on the main line ought to be putting their resources into approaches that will work on other timelines, like say if ML turns out not to be able to get powerful enough to cause true doom.
If ML is really fated to win the race so soon and in such a total way, then people almost certainly can’t change the main line by rushing to bolt on a safety module. They might, however, be able to significantly change less probable timelines by doing something else involving other methods of getting to AGI. And the overall probability mass they can add to survival that way seems like it’s a lot larger.
The main line is already lost, and it’s time to try to salvage what you can.
Personally, I don’t think I see that you can turn an ML system that has say 50-to-250 percent of a human’s intelligence into an existential threat just by pushing the “turbo” button on the hardware. Which means that I’m kind of hoping nobody goes the “nuclear war” route in real life.
I suspect that anything that gets to that point using ML will already be using a significant fraction of the compute available in the world. Being only some small multiple of human-level isn’t going to let you build more or better hardware all that fast, especially without getting yourself shut down. And I don’t think you can invent working nanotech without spending a bunch of time in the lab unless you are already basically a god. Doing that in your head is definitely a post-FOOM project.
But I am far from an expert. If I’m wrong, which I very well could be, then it seems crazy to be trying to save the “main line” by trying to do something you don’t even have an approach for, when all that has to happen for you to fail is for somebody else to push that “turbo” button. That feels like following some script for heroically refusing to concede anything, instead of actually trying to grab all the probability you can realistically get.
People who don’t want to die on the main line should be doing, or planning, something other than trying to win an AGI race… like say flipping tables and trying to foment nuclear war or something.
How does fomenting nuclear war change anything? The basic logic for ‘let’s also question our assumptions and think about whether there’s some alternative option’ is sound (and I mostly like your decomposition of Eliezer’s view), but you do need to have the alternative plan actually end up solving the problem.
Specific proposals and counter-proposals (that chain all the way to ‘awesome intergalactic future’) are likely the best way to unearth cruxes and figure out what makes sense to do here. Just saying ‘let’s consider third options’ or ‘let’s flip the tables somehow’ won’t dissuade Eliezer because it’s not a specific promising-looking plan (and he thinks he’s already ruled out enough plans like this to make them even doomier than AGI-alignment-mediated plans).
Eliezer is saying ‘there isn’t an obvious path forward, so we should figure out how to best take advantage of future scenarios where there are positive model violations (“miracles”)’; he’s not saying ‘we’re definitely doomed, let’s give up’. If you agree but think that something else gives us better/likelier miracles than trying to align AGI, then that’s a good place to focus discussion.
One reason I think Eliezer tends to be unpersuaded by alternatives to alignment is that they tend to delay the problem without solving it. Another reason is that Eliezer thinks AGI and alignment are to a large extent unknown quantities, which gives us more reason to expect positive model violations; e.g., “maybe if X happened the world’s strongest governments would suddenly set aside their differences and join in harmony to try to handle this issue in a reasonable way” also depends on positive violations of Eliezer’s model, but they’re violations of generalizations that have enormously more supporting data.
We don’t know much about how early AGI systems tend to work, or how alignment tends to work; but we know an awful lot about how human governments (and human groups, and human minds) tend to work.
Nuclear war was just an off-the-top example meant to illustrate how far you might want to go. And I did admit that it would probably basically be a delaying tactic.
If I thought ML was as likely to “go X-risk” as Eliezer seems to, then I personally would want to go for the “grab probability on timelines other than what you think of as the main one” approach, not the “nuclear war” approach. And obviously I wouldn’t treat nuclear war as the first option for flipping tables… but just as obviously I can’t come up with a better way to flip tables off the top of my head.
If you did the nuclear war right, you might get hundreds or thousands of years of delay, with about the same probability [edit: meant to say “higher probability” but still indicate that it was low in absolute terms] that I (and I think Eliezer) give to your being able to control[1] ML-based AGI. That’s not nothing. But the real point is that if you don’t think there’s a way to “flip tables”, then you’re better off just conceding the “main line” and trying to save other possibilities, even if they’re much less probable.
[1] I don’t like the word “alignment”. It admits too many dangerous associations and interpretations. It doesn’t require them, but I think it carries a risk of distorting one’s thoughts.
I think there are some ways of flipping tables that offer some hope (albeit a longshot) of actually getting us into a better position to solve the problem, rather than just delaying the issue. Basically, strategies for suppressing or controlling Earth’s supply of compute, while pressing for differential tech development on things like BCIs, brain emulation, human intelligence enhancement, etc, plus (if you can really buy lots of time) searching for alternate, easier-to-align AGI paradigms, and making improvements to social technology / institutional decisionmaking (prediction markets, voting systems, etc).
I would write more about this, but I’m not sure if MIRI / LessWrong / etc want to encourage lots of public speculation about potentially divisive AGI “nonpharmaceutical interventions” like fomenting nuclear war. I think it’s an understandably sensitive area, which people would prefer to discuss privately.
If discussed privately, that can also lead to pretty horrific scenarios where a small group of people do something incredibly dumb/dangerous without having outside voices to pull them away from such actions if sufficiently risky. Not sure if there is any “good” way to discuss such topics, though…
If I thought ML was as likely to “go X-risk” as Eliezer seems to, then I personally would want to go for the “grab probability on timelines other than what you think of as the main one” approach
I’m not sure what you mean by “grab probability on timelines” here. I think you mean something like ‘since the mainline looks doomy, try to increase P(success) in non-mainline scenarios’.
Which sounds similar to the Eliezer-strategy, except Eliezer seems to think the most promising non-mainline scenarios are different from the ones you’re thinking about. Possibly there’s also a disagreement here related to ‘Eliezer thinks there are enough different miracle-possibilities (each of which is sufficiently low-probability) that it doesn’t make sense to focus in on one of them.’
There’s a different thing you could mean by ‘grab probability on timelines other than what you think of as the main one’, which I don’t think was your meaning; it’s something like: assuming things go well, AGI is probably further in the future than Eliezer thinks. So it makes sense to focus at least somewhat more on longer-timeline scenarios, while keeping in mind that AGI probably isn’t in fact that far off.
I think MIRI leadership would endorse ‘if things went well, AGI timelines were probably surprisingly long’.
I think you mean something like ‘since the mainline looks doomy, try to increase P(success) in non-mainline scenarios’.
Yes, that’s basically right.
I didn’t bring up the “main line”, and I thought I was doing a pretty credible job of following the metaphor.
Take a simplified model where a final outcome can only be “good” (mostly doom-free) or “bad” (very rich in doom). There will be a single “winning” AGI, which will simply be the first to cross some threshold of capability. This cannot be permanently avoided. The winning AGI will completely determine whether the outcome is good or bad. We’ll call a friendly-aligned-safe-or-whatever AGI that creates a good outcome a “good AGI”, and one that creates a bad outcome a “bad AGI”. A randomly chosen AGI will be bad with probability 0.999.
You want to influence the creation of the winning AGI to make sure it’s a good one. You have certain finite resources to apply to that: time, attention, intelligence, influence, money, whatever.
Suppose that you think that there’s a 0.75 probability that something more or less like current ML systems will win (that’s the “ML timeline” and presumptively the “main line”). Unfortunately, you also believe that there’s only 0.05 probability that there’s any path at all to making an AGI with an “ML architecture” good, within whatever time it takes for ML to win (probably there’s some correlation between how long it takes ML to win and how long it takes to figure out how to make it good). Again, that’s the probability that it’s possible in the abstract to invent good ML in the available time, not the probability that it will actually be invented and get deployed.
Contingent on the ML-based approach winning, and assuming you don’t do anything yourself, you think there’s maybe a 0.01 probability that somebody else will actually arrange for the winning AGI to be good. You’re damned good, so if you dump all of your attention and resources into it, you can double that to 0.02 even though lots of other people are working on ML safety. So you would contribute 0.01 times 0.75, or 0.0075, probability to a good outcome. Or at least you hope you would; you do not at this time have any idea how to actually go about it.
Now suppose that there’s some other AGI approach, call it X. X could also be a family of approaches. You think that X has, say, 0.1 probability of actually winning instead of ML (which leaves 0.15 for outcomes that are neither X nor ML). But you think that X is more tractable than ML; there’s a 0.75 probability that X can in principle be made good before it wins.
Contingent on X winning, there’s a 0.1 probability that somebody else will arrange for X to be good without you. But at the moment everybody is working on ML, which gives you runway to work on X before capability on the X track starts to rise. So with all of your resources, you could really increase the overall attention being paid to X, and raise that to 0.3. You would then have contributed 0.2 times 0.1 or 0.02 probability to a good outcome. And you have at least a vague idea how to make progress on the problem, which is going to be good for your morale.
Or maybe there’s a Y that only has a 0.05 probability of winning, but you have some nifty and unique idea that you think has a 0.9 probability of making Y good, so you can get nearly 0.045 even though Y is itself an unlikely winner.
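To make that arithmetic easier to play with, here’s a minimal sketch of the toy model in Python. The numbers are just the illustrative probabilities from the last few paragraphs, and contributed_probability is a helper name made up for the sketch, not anything standard.

```python
# Toy model from the paragraphs above: the probability mass you personally add
# to a good outcome on one branch is
#   P(that approach wins) * (P(good | you work on it) - P(good | you don't)).
# All figures are the illustrative ones from the comment, not real estimates.

def contributed_probability(p_wins, p_good_without_you, p_good_with_you):
    return p_wins * (p_good_with_you - p_good_without_you)

ml = contributed_probability(0.75, 0.01, 0.02)  # "main line" ML: 0.0075
x  = contributed_probability(0.10, 0.10, 0.30)  # approach X:     0.02
y  = contributed_probability(0.05, 0.00, 0.90)  # approach Y:     ~0.045

print(f"ML: {ml:.4f}  X: {x:.4f}  Y: {y:.4f}")
```

Nothing here is more than the multiplications spelled out; the interesting part is arguing about the inputs.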
Obviously these are sensitive to the particular probabilities you assign, and I am not really very well placed to assign such probabilities, but my intuition is that there are going to be productive Xs and Ys out there.
I may be biased by the fact that, to whatever degree I can assign probabilities at all, I think that ML’s probability of winning, in the very Manichean sense I’ve set up here, where it remakes the whole world, is more like 0.25 than 0.75. But even if it’s 0.75, which I suspect is closer to what Eliezer thinks (and would be most of his “0.85 by 2070”), ML is still handicapped by there not being any obvious way to apply resources to it.
Sure, you can split your resources. And that might make sense if there’s low hanging fruit on one or more branches. But I didn’t see anything in that transcript that suggested doing that. And you would still want to put the most resources on the most productive paths, rather than concentrating on a moon shot to fix ML when that doesn’t seem doable.
Personally, I don’t think I see that you can turn an ML system that has say 50-to-250 percent of a human’s intelligence into an existential threat just by pushing the “turbo” button on the hardware. Which means that I’m kind of hoping nobody goes the “nuclear war” route in real life.
Isn’t that part somewhat tautological? A sufficiently large group of humans is basically a superintelligence. We’ve basically terraformed the earth with cows and rice and cities and such.
A computer with >100% human intelligence would be incredibly economically valuable (automate every job that can be done remotely?), so it seems very likely that people would make huge numbers of copies if the cost of running one was less than compensation for a human doing the same job.
And that’s basically your superintelligence: even if a low-level AGI can’t directly self-improve (which seems somewhat doubtful since humans are currently improving computers at a reasonably fast rate), it could still reach superintelligence by being scaled from 1 to N.
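A purely illustrative back-of-the-envelope for that “scaled from 1 to N” point, with every number below assumed for the sketch rather than taken from the discussion: once a copy costs less to run than a human salary for the same remote job, an ordinary corporate budget already buys a very large N.

```python
# Every figure here is an assumption invented for illustration.
human_salary = 100_000         # USD/year for a remote-workable job (assumed)
cost_per_copy = 20_000         # USD/year of compute per human-level copy (assumed)
annual_spend = 10_000_000_000  # USD/year an org might redirect to copies (assumed)

if cost_per_copy < human_salary:
    n_copies = annual_spend // cost_per_copy
    # ~500,000 coordinated human-level workers: roughly the "large group of
    # humans as a superintelligence" analogy, reached purely by scaling out.
    print(f"Economic pull toward running ~{n_copies:,} copies")
```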
But even if it’s 0.75, which I suspect is closer to what Eliezer thinks (and would be most of his “0.85 by 2070”), ML is still handicapped by there not being any obvious way to apply resources to it.
0.85 by 2070 was Nate Soares’ probability, not Eliezer’s.