Eliezer’s response to claims about unfalsifiability, namely that “predicting endpoints is easier than predicting intermediate points”, seems like a cop-out to me, since this would seem to reverse the usual pattern in forecasting and prediction, without good reason.
It’s pretty standard? Like, we can make reasonable predictions about the climate in 2100, even if we can’t predict the weather two months ahead.
To be blunt, it’s not just that Eliezer lacks a positive track record in predicting the nature of AI progress, which might be forgivable if we thought he had really good intuitions about this domain. Empiricism isn’t everything; theoretical arguments are important too and shouldn’t be dismissed. But:
Eliezer thought AGI would be developed from a recursively self-improving seed AI coded up by a small group, “brain in a box in a basement” style. He dismissed and mocked connectionist approaches to building AI. His writings repeatedly downplayed the importance of compute, and he has straw-manned writers like Moravec who did a better job at predicting when AGI would be developed than he did.
Old MIRI intuition pumps about why alignment should be difficult, like the “Outcome Pump” and “Sorcerer’s apprentice”, are now forgotten; it came as a surprise that it would be easy to create helpful genies like LLMs that basically just do what we want. Remaining arguments for the difficulty of alignment are esoteric considerations about inductive biases, counting arguments, etc. So yes, let’s actually look at these arguments and not just dismiss them, but let’s not pretend that MIRI has a good track record.
I think the core concerns remain, and more importantly, other rather doom-y scenarios have opened up that involve AI systems more similar to the ones we have and aren’t the straight-up singleton ASI foom. The problem here is IMO not “this specific doom scenario will become a thing” but “we don’t have anything resembling a GOOD vision of the future with this tech that we are nevertheless developing at breakneck pace”. Yet the number of possible dystopian or apocalyptic scenarios is enormous. Part of this is “what if we lose control of the AIs” (singleton or multipolar), part of it is “what if we fail to structure our society around having AIs” (loss of control, mass wireheading, and a lot of other scenarios I’m not sure how to name). The only positive vision the “optimists” on this have to offer is “don’t worry, it’ll be fine, this clearly revolutionary and never-seen-before technology that puts in question our very role in the world will play out the same way every invention ever did”. And that’s not terribly convincing.
I’m not saying anything at the object level about MIRI’s models; my point is that “outcomes are more predictable than trajectories” is a pretty standard, epistemically non-suspicious statement about a wide range of phenomena. Moreover, in this particular circumstance (and many others) you can reduce it to an object-level claim, like “do observations of current AIs generalize to future AIs?”
How does the question of whether AI outcomes are more predictable than AI trajectories reduce to the (vague) question of whether observations on current AIs generalize to future AIs?
ChatGPT falsifies a prediction about future superintelligent recursively self-improving AI only if ChatGPT is a generalizable predictor of the design of future superintelligent AIs.
There will be future superintelligent AIs that improve themselves. But they will be neural networks; they will at the very least start out as a compute-intensive project; and in the infant stages of their self-improvement cycles they will understand and be motivated by human concepts, rather than being dumb specialized systems that are only good for bootstrapping themselves to superintelligence.
Edit: Retracted because some of my exegesis of the historical seed AI concept may not be accurate
True knowledge about later times doesn’t generally let you make arbitrary predictions about intermediate times. But it does usually imply that you can make some theory-specific predictions about intermediate times.
Thus, vis-a-vis your examples: Predictions about the climate in 2100 don’t involve predicting tomorrow’s weather. But they do almost always involve predictions about the climate in 2040 and 2070, and they’d be really sus if they didn’t.
Similarly:
If an astronomer thought that an asteroid was going to hit the earth, the astronomer generally could predict points it will be observed at in the future before hitting the earth. This is true even if they couldn’t, for instance, predict the color of the asteroid.
People who predicted that C19 would infect millions by T + 5 months also had predictions about how many people would be infected at T + 2. This is true even if they couldn’t predict how hard it would be to make a vaccine.
(Extending analogy to scale rather than time) The ability to predict that nuclear war would kill billions involves a pretty good explanation for how a single nuke would kill millions.
So I think that—entirely apart from specific claims about whether MIRI does this—it’s pretty reasonable to expect them to be able to make some theory-specific predictions about the before-end-times, although it’s unreasonable to expect them to make arbitrary theory-specific predictions.
I agree this is usually the case, but I think it’s not always true, and I don’t think it’s necessarily true here. E.g., people as early as Da Vinci guessed that we’d be able to fly long before we had planes (or even any flying apparatus which worked), because birds can fly, so we should be able to as well (at least, this was Da Vinci and the Wright brothers’ reasoning). That end point was not dependent on details (early flying designs had wings like a bird, a design which we did not keep :p), but was closer to a laws-of-physics claim (if birds can do it, there isn’t anything fundamentally holding us back from doing it either).
Superintelligence holds a similar place in my mind: intelligence is physically possible, because we exhibit it, and it seems quite arbitrary to assume that we’ve maxed it out. But also, intelligence is obviously powerful, and reality is obviously more manipulable than we currently have the means to manipulate it. E.g., we know that we should be capable of developing advanced nanotech, since cells can, and that space travel/terraforming/etc. is possible.
These two things together, “we can likely create something much smarter than ourselves” and “reality can be radically transformed”, are enough to make me feel nervous. At some point I expect most of the universe to be transformed by agents; whether this is us, or aligned AIs, or misaligned AIs or what, I don’t know. But looking ahead and noticing that I don’t know how to select the “aligned AI” option from the set “things which will likely be able to radically transform matter” seems enough cause, in my mind, for exercising caution.
There’s a pretty big difference between statements like “superintelligence is physically possible”, “superintelligence could be dangerous” and statements like “doom is >80% likely in the 21st century unless we globally pause”. I agree with (and am not objecting to) the former claims, but I don’t agree with the latter claim.
I also agree that it’s sometimes true that endpoints are easier to predict than intermediate points. I haven’t seen Eliezer give a reasonable defense of this thesis as it applies to his doom model. If all he means here is that superintelligence is possible, it will one day be developed, and we should be cautious when developing it, then I don’t disagree. But I think he’s saying a lot more than that.
Your general point is true, but it’s not necessarily true that (1) a correct model can predict the timing of AGI or (2) the predictable precursors to disaster occur before the practical c-risk (catastrophic-risk) point of no return. While I’m not as pessimistic as Eliezer, my mental model has these two limitations. My model does predict that a fairly safe, non-ASI AGI or pseudo-AGI (e.g. GPT6, a chatbot that can do a lot of office jobs and menial jobs pretty well) is likely to be invented before the really deadly one (if any[1]). But if I predicted right, it probably won’t make people take my c-risk concerns more seriously?
[1] Technically, I think AGI inevitably ends up deadly, but it could be deadly “in a good way”.
I think it’s more similar to saying that the climate in 2040 is less predictable than the climate in 2100, or saying that the weather 3 days from now is less predictable than the weather 10 days from now, which are both not true. By contrast, the weather vs. climate distinction is more of a difference between predicting point estimates vs. predicting averages.
It’s certainly not a simple question. Say, the Gulf Stream is projected to collapse somewhere between now and 2095, with a median date of 2050. So, slightly abusing the meaning of confidence intervals, we can say that in 2100 we won’t have the Gulf Stream with probability >95%, while in 2040 the Gulf Stream will still be here with probability ~60%, which is literally less predictable.
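One rough way to make “literally less predictable” concrete is to treat each date as a yes/no question and compare Shannon entropies. This is only a sketch: the 60% and 95% figures are the ones quoted above, while the entropy framing and the `binary_entropy` helper are assumptions added for illustration, not part of any climate model.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no event that happens with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Probabilities taken from the comment above (illustrative, not measured values).
h_2040 = binary_entropy(0.60)  # ~60% chance the circulation is still running in 2040
h_2100 = binary_entropy(0.95)  # >95% chance it has collapsed by 2100

print(f"2040: {h_2040:.2f} bits of uncertainty")
print(f"2100: {h_2100:.2f} bits of uncertainty")
```

Under these assumptions, the 2040 question sits close to the 1-bit maximum of a coin flip, while the 2100 question carries much less uncertainty.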
Chemists would give an example of chemical reactions, where final thermodynamically stable states are easy to predict, while unstable intermediate states are very hard to even observe.
Very dumb example: if you are observing a radioactive atom with a half-life of one minute, you can’t predict when the atom is going to decay, but you can be very certain that it will have decayed within an hour.
And why don’t you accept the classic MIRI example that even if it’s impossible for a human to predict the moves of Stockfish 16, you can be certain that Stockfish will win?
I agree there are examples where the end state is easier to predict than the intermediate states. Here, it’s because we have strong empirical and theoretical reasons to think that chemicals will settle into some equilibrium after a reaction. With AGI, I have yet to see a compelling argument for why we should expect a specific easy-to-predict equilibrium state after it’s developed, which somehow depends very little on how the technology is developed.
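The arithmetic behind the decay example can be sketched in a few lines, assuming plain exponential decay; the helper name `survival_probability` is made up for illustration.

```python
def survival_probability(elapsed_min: float, half_life_min: float = 1.0) -> float:
    """Probability that a single atom has NOT decayed after elapsed_min minutes."""
    return 0.5 ** (elapsed_min / half_life_min)


# After one half-life it is a coin flip whether the atom is still there.
p_one_minute = survival_probability(1)

# After an hour (60 half-lives) survival is 2**-60, i.e. astronomically unlikely.
p_one_hour = survival_probability(60)

print(f"still undecayed after 1 minute:   {p_one_minute}")
print(f"still undecayed after 60 minutes: {p_one_hour:.3e}")
```

So the timing of the decay within the first few minutes is close to unpredictable, while the statement “it has decayed by the one-hour mark” is nearly certain.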
It’s also important to note that, even if we know that there will be an equilibrium state after AGI, more evidence is generally needed to establish that the end equilibrium state will specifically be one in which all humans die.
I don’t accept this argument as a good reason to think doom is highly predictable partly because I think the argument is dramatically underspecified without shoehorning in assumptions about what AGI will look like to make the argument more comprehensible. I generally classify arguments like this under the category of “analogies that are hard to interpret because the assumptions are so unclear”.
To help explain my frustration at the argument’s ambiguity, I’ll just give a small yet certainly non-exhaustive set of questions I have about this argument:
Are we imagining that creating an AGI implies that we play a zero-sum game against it? Why?
Why is it a simple human vs. AGI game anyway? Does that mean we’re lumping together all the humans into a single agent, and all the AGIs into another agent, and then they play off against each other like a chess match? What is the justification for believing the battle will be binary like this?
Are we assuming the AGI wants to win? Maybe it’s not an agent at all. Or maybe it’s an agent but not the type of agent that wants this particular type of outcome.
What does “win” mean in the general case here? Does it mean the AGI merely gets more resources than us, or does it mean the AGI kills everyone? These seem like different yet legitimate ways that one can “win” in life, with dramatically different implications for the losing parties.
There’s a lot more I can say here, but the basic point I want to make is that once you start fleshing this argument out, and giving it details, I think it starts to look a lot weaker than the general heuristic that Stockfish 16 will reliably beat humans in chess, even if we can’t predict its exact moves.
See here
I don’t think the Gulf Stream can collapse as long as the Earth spins; I guess you mean the AMOC?
Yep, AMOC is what I mean
>Like, we can make reasonable prediction of climate in 2100, even if we can’t predict weather two month ahead.
This is a strange claim to make in a thread about AGI destroying the world. Obviously, if AGI destroys the world, we cannot predict the weather in 2100.
Predicting the weather in 2100 requires you to make a number of detailed claims about the years between now and 2100 (for example, the carbon emissions per year), and it is precisely the lack of these claims that @Matthew Barnett is talking about.
I strongly doubt we can predict the climate in 2100. An actual prediction would require a model that also incorporates the possibility of nuclear fusion, geoengineering, AGIs altering the atmosphere, etc.