I think the closest thing to an explanation of Eliezer’s arguments formulated in a way that could plausibly pass standard ML peer review is my paper The alignment problem from a deep learning perspective (Richard Ngo, Lawrence Chan, Sören Mindermann)
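Linking the post version which some people may find easier to read:
The Alignment Problem from a Deep Learning Perspective (major rewrite)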
Thanks for posting; it’s well written and concise, but I fear it suffers from the same flaw that all such explanations share:
Weapons development: AGIs could design novel weapons which are more powerful than those under human control, gain access to facilities for manufacturing these weapons (e.g. via hacking or persuasion techniques), and deploy them to threaten or attack humans. An early example of AI weapons development capabilities comes from an AI used for drug development, which was repurposed to design chemical weapons [Urbina et al., 2022].
The most critical part, “gain access to facilities for manufacturing these weapons (e.g. via hacking or persuasion techniques), and deploy them to threaten or attack humans”, is simply never explained in detail. I get that there are many info-hazards in this line of inquiry, but in this case it’s such a contrast to the well-elaborated prior 2/3 of the paper that it really stands out how hand-wavy this part of the argument is.
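I’m working on a follow-up exploring threat models specifically, stay tuned.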
The most critical part, “gain access to facilities for manufacturing these weapons (e.g. via hacking or persuasion techniques), and deploy them to threaten or attack humans”, is simply never explained in detail.
Generally, you can’t explain in detail the steps that something smarter than you will take, because it’s smarter and will be able to think up better steps.
If we take an intelligence that’s much smarter than humans, it could make a lot of money on the stock market, buy shares of the companies that produce the weapons, and then have those companies update their factories’ software with AI software that is sold to all the involved human decision-makers as increasing the factories’ efficiency.
Thinking that you can prevent a smart AGI from accessing factories is like thinking, a decade ago, that you could box it. Economic pressures mean that boxing an AI greatly reduces the economic opportunities it offers, and thus companies like OpenAI don’t box their AIs.
Giving a very smart AI power is a way to win economic competitions, because the AI can outcompete competitors. Just as people aren’t boxing their AIs, they also aren’t keeping them at a distance from power.
If we take an intelligence that’s much smarter than humans, it could make a lot of money on the stock market, buy shares of the companies that produce the weapons, and then have those companies update their factories’ software with AI software that is sold to all the involved human decision-makers as increasing the factories’ efficiency.
Even the first part of this scenario doesn’t make sense. It’s not possible to earn a lot of money anonymously on any major stock market, because of KYC rules and the very motivated groups that enforce those rules, which every country with a major stock market has.
It might be possible to use intermediary agents, but really competent people, who will definitely get involved by the dozens and cross-check each other if a significant amount of money is at stake, can tell whether someone is genuine or just a patsy.
Plus, beyond a certain threshold there are only a few thousand people on this planet who are actually part of the final decision-making process that authorizes moving that much money around, and they can all visit each other in person to verify.
The only real way would be to subvert several dozen core decision-makers in this group near-simultaneously and have them vouch for each other and ‘check’ each other, assuming everything else goes smoothly.
But then the actually interesting part would be how this could be accomplished in the first place.