Related: The question "Where is a comprehensive, well-argued explanation of Eliezer's arguments for AI risk, explaining all lingo, spelling out each step, referencing each claim, and open to open peer review?" apparently has no answer? Because a lot of people have been asking. They see the podcast, and are concerned, but unconvinced, and want to read a solid paper or book chapter, or the equivalent in website form. Something you can read over the course of a few hours and actually have the questions answered. Something you can cite and criticise, and have Eliezer accept that this is a good version to attack. No handwaving, no vaguely referencing online texts without links, no concepts that are mentioned everywhere yet never properly explained, no pretending a step is trivial or obvious when it simply is not. Just all the arguments in it, all the data in it.
I think the closest thing to an explanation of Eliezer’s arguments formulated in a way that could plausibly pass standard ML peer review is my paper The alignment problem from a deep learning perspective (Richard Ngo, Lawrence Chan, Sören Mindermann)
Linking the post version, which some people may find easier to read:
The Alignment Problem from a Deep Learning Perspective (major rewrite)
Thanks for posting, it's well written and concise, but I fear it suffers from the same flaw that all such explanations share:
Weapons development: AGIs could design novel weapons which are more powerful than those under human control, gain access to facilities for manufacturing these weapons (e.g. via hacking or persuasion techniques), and deploy them to threaten or attack humans. An early example of AI weapons development capabilities comes from an AI used for drug development, which was repurposed to design chemical weapons [Urbina et al., 2022].
The most critical part, the "gain access to facilities for manufacturing these weapons (e.g. via hacking or persuasion techniques), and deploy them to threaten or attack humans", is simply never explained in detail. I get that there are many info-hazards in this line of inquiry, but in this case it's such a contrast to the well-elaborated prior 2⁄3 of the paper that it really stands out how hand-wavy this part of the argument is.
I’m working on a follow-up exploring threat models specifically; stay tuned.
Generally, you can’t explain in detail the steps that something smarter than you will take because it’s smarter and will be able to think up better steps.
If we take an intelligence that’s much smarter than humans, it could make a lot of money on the stock market, buy shares of the companies that produce the weapons, and then have those companies update their factories’ software with AI software that is sold to all the involved human decision-makers as increasing the factories’ efficiency.
Thinking that you can prevent a smart AGI from accessing factories is like thinking, a decade ago, that you could box it. Economic pressures mean that boxing an AI cuts off a lot of economic opportunities, which is why companies like OpenAI don’t box their AIs.
Giving a very smart AI power is a way to win economic competitions, because the AI can outcompete rivals. Just as people aren’t boxing their AIs, they are also not keeping them at a distance from power.
Even the first part of this scenario doesn’t make sense. It’s not possible to earn a lot of money anonymously on any major stock market, because of KYC rules and the very motivated groups that enforce those rules, which every country with a major stock market has.
It might be possible to use intermediary agents, but really competent people, who will definitely get involved by the dozens and cross-check each other if a significant amount of money is at stake, can tell whether someone is genuine or just a patsy.
Plus, beyond a certain threshold there are only a few thousand people on this planet who are actually part of the final decision-making process that authorizes moving that much money around, and they can all visit each other in person to verify.
The only real way would be to subvert several dozen core decision-makers in this group nearly simultaneously and have them vouch for and ‘check’ each other, assuming everything else goes smoothly.
But then the actually interesting part would be how this could be accomplished in the first place.
None of these are what you describe, but here are some places people can be pointed to:
Rob Miles’ channel
The Stampy FAQ (they are open for help/input)
This list of introductions to AI safety
People have been trying to write this for years, but it’s genuinely hard. Eliezer wrote a lot of it on Arbital, but it is too technical for this purpose. Richard Ngo has been writing distillations for a while, and I think they are pretty decent, but IMO they currently fail to really get the intuitions across and connect things on a more intuitive and emotional level. Many people have written books, but all of them had a spin on them that didn’t work for a lot of people.
There are really a ton of things you can send people if they ask for something like this. Tons of people have tried to make this. I don’t think we have anything perfect, but I really don’t think it’s for lack of trying.
Curious which intuitions you think most fail to come across?
I don’t have all the cognitive context booted up on exactly which essays are part of AI Safety Fundamentals, so please forgive me if something here does end up being covered and I just forgot about an important essay, but here is a quick list of things I vaguely remember being missing:
Having good intuitions for how smart a superintelligence could really be. Arguments for the lack of an upper limit on intelligence.
Having good intuitions for complexity of value. That even if you get an AI aligned with your urges and local desires, this doesn’t clearly get you that far towards an AGI you would feel comfortable letting optimize things on its own.
Somehow communicating the counterintuitiveness of optimization. Classic examples that have helped me are the cannibal bug examples from the Sequences, and the genetic algorithm that developed an antenna (the DeepMind specification-gaming post never really got this across for me).
Security mindset stuff
Something about the set of central intuitions I took away from Paul’s work, i.e. something in the space of “try to punt as much of the problem as possible to systems smarter than you”.
“Eternity in Six Hours”-style stuff. Trying to understand the scale of the future. This has been very influential on my models of what kinds of goals an AI might have.
Civilizational inadequacy stuff. A huge component of people’s differing views on what to do about AI risk seems to be sourced in disagreements about the degree to which humanity at large does crazy things when presented with challenges. I think that’s currently not covered in AGISF at all.
There are probably more things, and some things on this list are probably wrong since I only skimmed the curriculum again, but hopefully it gives a taste.
I totally agree that question should have an answer.
On a tangent: During my talks with numerous people, I have noticed that even agreeing on fundamentals like “what is AGI” and “current systems are not AGI” is furiously hard.
The best primer that I have found so far is Basics of AI Wiping Out All Value in the Universe by Zvi. It’s certainly not going to pass peer review, but it’s very accessible, compact, covers the breadth of the topics, and links to several other useful references. It has the downside of being buried in a very long article, though the link above should take you to the correct section.
What does that mean? I notice that it doesn’t actually prove that AI will definitely kill us all. I’ve never seen anything else that does, either. You can’t distill what never existed.
I feel like Robert Miles’ series of YouTube videos is the most accessible yet on-point explanation of this to be found right now. They’re good, clear, and easy to get. That said, they’re videos, which for some people might be a barrier (I myself prefer reading my heady stuff).
Honestly, would it be such a challenge to put something together? We could work on it, then put it up on a dedicated domain as a standalone web page. We could even include different levels of explanation (e.g. “basic” to “advanced”, depending on how deep you want to delve into the issues). Maybe gathering the references is the most challenging part, but I’m sure someone must have them already piled up in a folder or Mendeley group somewhere.
There is no such logically consistent argument, even scattered across dozens of hyperlinks. At least none I’ve seen.
Let’s not bury this comment. Here is someone we have failed: there are comprehensive, well-argued explanations for all of this, and this person couldn’t find them. Even the responses to the parent comment don’t conclusively answer this—let’s make sure that everyone can find excellent arguments with little effort.
Is this written for a different comment and accidentally posted here?
I think he was referring to the enormous corpus of writing by Eliezer and others on LessWrong, which together do, as far as I can tell, fulfill all of your requirements, though there is a lot of sifting to do. My guess is you don’t think this applies, but laserfiche thought the problem was likely one of ignorance about the existing writing, not your confident belief in its absence.
Why would a user who’s only made 8 comments assume my ignorance about the most-read LW writer, when I have clearly engaged with several hundred posts, as anyone can see within 10 seconds of clicking my profile?
If they’re genuinely confused it’s bizarre that they didn’t bother checking, so much so that I didn’t even consider it a possibility.
I’m curious if there are specific parts to the usual arguments that you find logically inconsistent.
Yup. I commented on how outreach pieces are generally too short on their own and should always lead to something else here.