Other questions I’d like to see better answered in clear, compact, intuition-pumping articles like the above:
a. Is enough work being done on FAI? Are AI researchers in general too dismissive or blasé about safety concerns?
b. Why should we expect hard takeoff? More generally, why can’t we wait until AGI is clearly about to be invented before working on safety?
c. Why is recursive self-improvement such an important threshold? How can we be confident humans have drastically suboptimal intelligence, and that massively superhuman intelligence optimization is possible without a massively superhuman initial hardware investment?
d. What’s the most plausible scenario for how an intelligence explosion will initially go? Why?
e. What are the biggest open problems for FAI research?
f. For present purposes, what is a preference? Why do we bracket the possibility of superintelligent irrationality? Why do AGIs tend toward temporally stable preferences?
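g. Why is AGI ‘hungry’, i.e., desirous of unlimited resources? Why do we expect FAI and UFAI to have galaxy- or universe-wide effects?
h. What’s a reasonable (as non-squick goes) fun interstellar growth scenario? Why is the future so important?
i. What is the Prisoner’s Dilemma, and why is it relevant to FAI? How does it generalize to non-sentient prisoners?
j. What would a meta-ethical discourse between Clippy and a human look like?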
k. Why is it hard to give AGI known or predictable goals?
l. What makes value fragile and complex? What are some plausible horror stories if we get things slightly wrong?
I’ve seen most of these addressed, but awkwardly, abstractly, or in the context of much longer texts. If you know of especially good stand-alone posts covering these issues, or would like to collaborate on making or synthesizing one, let me know!
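(Feel free to suggest borderline options. I’ve probably missed some great candidates, and they can always serve as templates or raw materials for future posts.)
I’d also add the question of “When can we expect GAI?” Some people I’ve talked to about this issue don’t think it’s possible to get GAI this century.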
I agree that should be on the list. It’s a hard question to answer without lots of time and technical detail, though, which is part of why I went with making the problem seem more vivid and immediate by indirect means. Short of internalizing Cognitive Biases Affecting Judgment of Global Risks or a lot of hard sci-fi, I’m not sure there’s any good way to short-circuit people’s intuitions that FAI doesn’t feel like an imminent risk.
‘We really don’t know, but it wouldn’t be a huge surprise if it happened this century, and it would be surprising if it doesn’t happen in the next 300 years’ is I think a solid mainstream position. For the purpose of the Core List it might be better handled as a quick 2-3 paragraph overview in a longer (say, 3-page) article answering ‘Why is FAI fiercely urgent?’; Luke’s When Will AI Be Created? is, I think, a good choice for the Further Reading section.
I’d caution about conflating FOOM with AGI risks, or even with hungry AGI. Hard takeoff is the most obviously risky situation from an AGI safety perspective, because it would be impossible to react to an unfriendly AGI without possible world-ending consequences, and it is very compelling in story-telling. But a soft takeoff is not particularly likely to be safe just because someone /could/ turn off its supplies, if no one would be able to tell that doing so is necessary. An AI that’s only several dozen times smarter than humans probably won’t crack the protein folding problem, but it’s probably smart enough to navigate corporate red tape or manipulate a team of researchers.
g. Why is AGI ‘hungry’, i.e., desirous of unlimited resources? Why do we expect FAI and UFAI to have galaxy- or universe-wide effects?
Comparative and Absolute Advantage in AI demonstrates the simple lies-to-children version rather easily. The strong case would be the Riemann Hypothesis Catastrophe, but I’ve not seen any particularly good writeups of that, even though it seems to be local jargon.
h. What’s a reasonable (as non-squick goes) fun interstellar growth scenario? Why is the future so important?
Just Another Day In Utopia is a little overly gamified and surreal, but generally pleasant. While intended as subtle horror, Iceman’s Friendship Is Optimal has about the right mix of ‘almost right’: it almost seems like the ponies were added just to explain why someone might resist the Singularity, and it’s only on deeper analysis that the value-of-your-values metric falls apart. The full text is too long for this sort of work, but excerpts could be informative enough.
j. What would a meta-ethical discourse between Clippy and a human look like?
Mu? Or moo, depending on severity.
k. Why is it hard to give AGI known or predictable goals?
Lost Purposes is a little shallow and not terribly well-researched, but it’s a compelling example of subgoal stomp and value drift that’s likely to be particularly applicable in an educational environment. And, of course, Magical Categories, which you’ve already linked, is rather illustrative on its own.
l. What makes value fragile and complex? What are some plausible horror stories if we get things slightly wrong?
The first few paragraphs of Interpersonal Entanglement are very illuminating.
I’d caution about conflating FOOM with AGI risks, or even with hungry AGI.
Which things I said made you worry I was conflating these? If hard and soft takeoff were equally likely I’d still think we should put most of our energy into worrying about hard takeoff; but hard seems significantly likelier to me.
Do you think non-hungry AGIs are likely? And if we build one, do you think it’s likely to be dangerous? When I imagine a non-hungry AGI, I imagine a simple behavior-executer like the blue-minimizing robot. It optimizes for something self-involving, like a deontologist rather than a consequentialist. It puts all its resources into, say, increasing the probability that it will pick up a hammer and then place it back down, at this exact moment; but it doesn’t try to tile the solar system with copies of itself picking up copies of the hammer, because it doesn’t think the copies would have value. It’s just stuck in a loop executing an action, and an action too feeble to add up to anything interstellar.
An AI that’s only several dozen times smarter than humans probably won’t crack the protein folding problem
Ever? For how long?
Iceman’s Friendship Is Optimal is about the right mix of ‘almost right’
I suspect non-MLP fans will miss most of what makes such a future inspiring or fun. And inspiring and fun is what I’m shooting for.
Lost Purposes
This is one of the articles I was closest to adding to the list, because I feel more poetry/rhetoric is needed to seal the deal. But it’s a bit too long relative to how transparent its points of relevance aren’t.
But for ‘Why is it hard to give AGI known or predictable goals?’ what I had in mind is a popularization of the problems with evolutionary algorithms, or the relevance of Löb’s Theorem to self-modifying AI, or some combination of these and other concerns.
Which things I said made you worry I was conflating these? If hard and soft takeoff were equally likely I’d still think we should put most of our energy into worrying about hard takeoff; but hard seems significantly likelier to me.
At least from my readings, points 11, 12, and 13 are the big focus points on AGI risks, and they default to genie-level capabilities: the only earlier machine is the purely instruction-set blue-minimizing robot.
Hard takeoff being significantly more likely means that your concerns are, naturally and reasonably, going to gravitate toward discussing AGI risks and hungry AGI in the context of FOOMing AGI. That makes sense for people who can jump the inferential distance into explosive recursive improvement. If you’re writing a work to help /others understand/ the concept of AGI risks, though, discussing how a FOOMing AGI could start taking apart Jupiter in order to make more smiley faces, due next Tuesday, requires that they accept a more complex scenario than that of a general machine intelligence to begin with. This makes sense from a risk analysis viewpoint, where Bayesian multiplication is vital for comparing relative risks (very important to the values of SIRI, targeting folk who know what a Singularity is). It’s unnecessary for the purpose of risk awareness, where showing the simplest threshold risk gets folk to pay attention (which is more important to the MIRI, targeting folk who want to know what machine intelligence could be, and who are narrative thinkers, with the resulting logical biases).
If the probability of strong AGI occurring is P1, the probability of strong AGI going FOOM is P2, and the probability of any strong AGI being destructive is P3, then the understanding needed to grasp P1×P2×P3 is unavoidably going to be greater than that needed for P1×P3, even if P2 is very close to 1. You can always introduce P2 later, in order to show why the results would be much worse than everyone’s already expecting, and that has a stronger effect on avoidance-heavy human neurology than letting people think that machine intelligence can be made safe by just preventing the AGI from reaching high levels of self-improvement.
If there are serious existential risks from soft-takeoff and even no-takeoff AGI, then discussing a general risk first not only appears more serious, but also makes later discussion of hard takeoff hit even harder.
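As a rough numerical illustration of the P1/P2/P3 point, here is a minimal Python sketch. The function names and every number in it are placeholder assumptions, not estimates from anyone in this thread. Because P2 can never exceed 1, the three-premise FOOM argument can never come out more probable than the two-premise general one, on top of being harder to explain.

```python
def general_risk(p1, p3):
    """Two-premise argument: strong AGI is built (P1) and is destructive (P3)."""
    return p1 * p3


def foom_risk(p1, p2, p3):
    """Three-premise argument: the same, plus the AGI goes FOOM (P2)."""
    return p1 * p2 * p3


# Placeholder values chosen only for illustration.
p1, p2, p3 = 0.5, 0.9, 0.8

print(f"P1 x P3      = {general_risk(p1, p3):.2f}")
print(f"P1 x P2 x P3 = {foom_risk(p1, p2, p3):.2f}")
```

With these made-up inputs the general case comes out at 0.40 and the FOOM case at 0.36; pushing P2 toward 1 closes the gap but never reverses it.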
Do you think non-hungry AGIs are likely? And if we build one, do you think it’s likely to be dangerous?
Hungry AGIs occur when the utility of additional resources exceeds the costs of additional resources, as amortized by whatever time discounting function you’re using. That’s very likely once the AGI is calculating over a sufficiently long-duration event, even with heavy time discounting, but that’s not the full set of possible minds. It’s quite easy to imagine a non-hungry AGI that causes civilization-level risks, or even a non-hungry non-FOOM AGI that causes continent-level risks. ((I don’t think it’s terribly likely, since barring exceptional information control or unlikely design constraints, it’d be bypassed by a copy turned intentionally-hungry AGI, but as above, this is a risk awareness matter rather than a risk analysis one.))
More importantly, you don’t need to FOOM to have a hungry AGI. A ‘stupid’ tool AI, even a ‘stupid’ tool AI that gets only small benefits from additional resources, could still go hungry with the wrong question or the wrong discount on future time, or even if it merely made a bad time estimation on a normal question. It’s bad to have a few kilotons of computronium pave over the galaxy with smiley faces; it’s /embarrassing/ to have the solar system paved over with inefficient transistors trying to find a short answer to Fermat’s Last Theorem. Or if I’m wrong, and a machine intelligence slightly smarter than the chess team at MIT can crack the protein folding problem in a year, a blue-minimizing AGI becomes /very/ frightening even with a small total intelligence.
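The hunger condition described above can be sketched as a toy calculation. This is only an illustration under stated assumptions: exponential discounting, a constant benefit per time step, and a one-off acquisition cost are all choices made here for simplicity, and the names `discounted_benefit` and `is_hungry` are hypothetical.

```python
def discounted_benefit(benefit_per_step, discount, horizon):
    """Sum of benefit_per_step * discount**t for t = 1..horizon."""
    return sum(benefit_per_step * discount ** t for t in range(1, horizon + 1))


def is_hungry(benefit_per_step, discount, horizon, cost):
    """True when the discounted utility of extra resources exceeds their cost."""
    return discounted_benefit(benefit_per_step, discount, horizon) > cost


# Heavy discounting: each later step is worth half as much as the one before.
print(is_hungry(benefit_per_step=1.0, discount=0.5, horizon=1, cost=0.9))   # False
print(is_hungry(benefit_per_step=1.0, discount=0.5, horizon=10, cost=0.9))  # True
```

Under these assumed numbers, the same agent with the same heavy discount declines the resources over a one-step task and grabs them over a ten-step task, which is the sense in which a long enough horizon or a badly chosen discount can tip an otherwise modest system into hunger.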
Ever? For how long?
The strict version of the protein folding prediction problem was defined about half a century ago, and it has been well-known and well-studied enough that I’m willing to wager we’ve had several dozen intelligent people working on it for most of that time period (and, recently, several dozen intelligent people working on just software implementations). An AGI built today has the advantage of their research, along with a different neurological design, but in turn it may have additional limitations. Predictions are hard, especially about the future, but for the purposes of a thought experiment it’s not obvious that another fifty years without an AGI would change the matter so dramatically. I suspect /That Alien Message/ discusses a boxed AI with the sum computational power of the entire planet across long periods of time precisely because I’m not the only one to give that estimate.
And, honestly, once you have an AGI in the field, fifty years is a very long event horizon for even the slow takeoff scenarios.
I suspect non-MLP fans will miss most of what makes such a future inspiring or fun. And inspiring and fun is what I’m shooting for.
Not as much as you’d expect. It’s more calling on the sort of things that get folk interested in The Sims or in World of Warcraft, and iceman seemed to intentionally write it to be accessible to a general audience in preference to pony fans. The big benefit of ponies is that they’re strange enough that it’s someone /else’s/ wish fulfillment. ((Conversely, it doesn’t really benefit from knowledge of the show, since it doesn’t use the main cast or default setting: Celest-AI shares very little overlap with the character Princess Celestia, excepting that they can control a sun.)) The full work is probably not useful for this, but chapter six alone might be a useful contrast to /Just Another Day in Utopia/.
But for ‘Why is it hard to give AGI known or predictable goals?’ what I had in mind is a popularization of the problems with evolutionary algorithms, or the relevance of Löb’s Theorem to self-modifying AI, or some combination of these and other concerns.
Hm… that would be a tricky requirement to fill: there are very few good layperson’s versions of Löb’s Problem as it is, and the question does not easily reduce from the mathematical analysis. (EDIT: Or rather, it goes from being formal logic Deep Magic to obvious truism in attempts to demonstrate it… still, space to improve on the matter after that cartoon.)
I don’t think anybody knows enough to answer this question with any certainty.
I agree, but there’s a certain standard story we tend to tell, not so much because we’re certain it’s the initial trajectory as because it helps make the risks more concrete and vivid. To cite the most recent instance of this meme:
The notion of a ‘superintelligence’ is not that it sits around in Goldman Sachs’s basement trading stocks for its corporate masters. The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then, rather than bothering with the digital counters that humans call money, the superintelligence solves the protein structure prediction problem, emails some DNA sequences to online peptide synthesis labs, and gets back a batch of proteins which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then… well, it doesn’t really matter from our perspective what comes after that, because from a human perspective any technology more advanced than molecular nanotech is just overkill. A superintelligence with molecular nanotech does not wait for you to buy things from it in order for it to acquire money. It just moves atoms around into whatever molecular structures or large-scale structures it wants. [...]
And then with respect to very advanced AI, the sort that might be produced by AI self-improving and going FOOM, asking about the effect of machine superintelligence on the conventional human labor market is like asking how US-Chinese trade patterns would be affected by the Moon crashing into the Earth. There would indeed be effects, but you’d be missing the point.
What I was looking for is just this standard story, or something similarly plausible and specific, fleshed out and briefly defended in its own post, as a way of using narrative to change people’s go-to envisioned scenario to something fast and brutal.
It’s a bit on the long side, but Why an Intelligence Explosion is Probable might work for this.