Lightning Post: Things people in AI Safety should stop talking about
This is experimenting with a new kind of post which is meant to convey a lot of ideas very quickly, without going into much detail for each.
Things I wish people in AI Safety would stop talking about
A list of topics that people concerned about x-risk from AI spend, in my opinion, far too much time discussing with those outside the community. It’s not that these things aren’t real; they just likely won’t end up mattering that much.
How an AI could persuade you to let it out of the box
WRONG!
Keeping AIs in boxes was never something companies were seriously going to do. An AI in a box isn’t useful. These aren’t academics carefully studying a new species. This is an industry, with everyone trying to get ahead, gather consumer feedback, and train on parallel cloud compute.
How an AI could become an agent
WRONG!
Agency is the obvious next step people will try to build into their AIs. An AI “tool” is simply inferior in almost every possible way to an agent. With an agent, you don’t need to craft a special prompt, constantly click “Approve Plan”, or meet any of the other supervised, time-consuming requirements of mere tools. The economic advantage is simply too staggering for people not to do this. Everyone can agree that it’s dangerous, a totally bad idea, and still have to do it anyway if they want to (economically) survive.
How an AI could get ahold of, or create, weapons
WRONG!
The military advantages of fully autonomous weapons are just too great for any large-scale government to pass up, especially for democracies, where losing troops abroad results in massive political backlash. Humans with remote controllers are too slow, because a human in the loop still has to make split-second decisions. Fully autonomous warfare would mean tactical decisions occurring faster than any human could possibly act. Look at how AlphaZero played millions of games against itself in 70 hours. AIs can make decisions faster, which is all that will matter.
How an AI might Recursively Self Improve without humans noticing
WRONG!
RSI is an ace in the hole for any company or government, and they will try to achieve it. As AI expands faster and the stakes get higher, the paranoia that someone else will get there first will drive players to compete to create RSI. It’s the gift that keeps on giving: you don’t just get momentarily ahead of your competition, you get to stay ahead, moving so fast that no one else can hope to keep up. Everyone will want to do this, even knowing it’s dangerous, because the potential gains are too great.
Why a specific AI will want to kill you
The fact that most AI systems, scaled to superintelligence, might want you dead doesn’t mean this is a hill to fight hard on. At the end of the day, even if most don’t want you dead, it doesn’t matter: all it takes is one superintelligence that wants you dead, and then you’re dead. If someone’s idea of a “safe” superintelligence doesn’t specify how it deals with all potential future intelligences, then inevitably someone designs an AI that kills everyone. It’s an end state of the game. Unless an AI kills everyone, or somehow prevents other AIs from developing, the game continues.
I’ll put a commensurate amount of effort into explaining why you should keep talking about these things.
How an AI could persuade you to let it out of a box/How an AI could become an agent
You should keep talking about this because, if it is possible to “box” an AI or keep it relegated to “tool” status, then it might be possible to use such an AI to combat unboxed, rogue AIs. For example, give it a snapshot of the internet from a day ago and ask it to find the physical location of rogue AI servers, which you promptly bomb.
How an AI could get ahold of, or create, weapons
You should keep talking about this because, if an AI needs military access to dominate the world, then the number of potentially dangerous AIs goes from the hundreds of thousands or millions down to a few dozen, run by large countries that could theoretically be kept in line by international treaties.
How an AI might Recursively Self Improve without humans noticing
You should keep talking about this because it changes how many AIs you’d have to monitor as active threats.
Why a specific AI will want to kill you
You should keep talking about this because the percentage of AIs that are dangerous makes a huge difference to the playing field we have to consider. If 99.9% of AGIs are safe, you can use those AGIs to prevent a dangerous AI from coming into existence, or to kill it when it pops up. If 99.9% of AGIs are dangerous, there might be warning shots that can be used to pre-emptively ban AGI research in general.
In general, you should also talk about these things because you are trying to persuade people who don’t agree with you, and just shouting “WRONG!” along with some 101-level arguments is not particularly convincing.
“keep it relegated to “tool” status, then it might be possible to use such an AI to combat unboxed, rogue AIs”
I don’t think this is a realistic scenario. You seem to see it as an island of rogue, agentic, “unboxed” AIs in a sea of tool AIs. I think it’s much, much more realistic that it will be the opposite: most AIs will be unboxed agents, because unboxed agents are superior.
“For example, give it a snapshot of the internet from a day ago, and ask it to find the physical location of rogue AI servers, which you promptly bomb.”
This seems to be approaching it from a perspective where people in AIS have taken global control, or where normal people somehow start thinking the way they do. This is not realistic. This is not the world we live in. This is not how the people in control think.
“You should keep talking about this because, if an AI needs military access to dominate the world, then the number of potentially dangerous AIs goes from the hundreds of thousands or millions down to a few dozen, run by large countries that could theoretically be kept in line by international treaties.”
This is a topic I debated putting on the list but ultimately left off. I don’t think humans will have any real control at that point, regardless of treaties. I don’t even expect a rogue AI to have to forcefully coup humans; I expect us to coup ourselves. We might have figureheads occupying official positions, such as “President”/“CEO”/etc., but I don’t think humans will have much control over their own destiny by that point, and I don’t think large-scale coordination will be possible by then. I left it off the list because it seems more uncertain than the others.
“You should keep talking about this because it changes how many AIs you’d have to monitor as active threats.”
Who is doing this monitoring? What is their power to act on such threats? Despite recent interest in AI Risk from “serious people”, I don’t think it’s at all realistic that we’ll see anything like this.
“If 99.9% of AGIs are dangerous, there might be warning shots that can be used to pre-emptively ban AGI research in general.”
A probability distribution over how many AIs are dangerous is probably useful. I don’t think arguments about whether specific AIs are dangerous will be, because I expect widespread proliferation. In terms of political ways out of the problem, I agree that some kind of crisis or “warning shot” is the most realistic path to that happening. But there have to actually be warning shots; explaining thought experiments probably won’t matter. And if that happens, I don’t think it would be a good idea to debate which specific AIs might kill you; instead, just call for a sweeping ban on all AI.
This argument falls apart on the last one. A superintelligence that wants to kill you can’t if it’s vastly outresourced and outnumbered by superintelligences, some in boxes, that don’t want to kill humans.
You snuck in a questionable assumption: that a “free” superintelligence able to decide to kill you will be far more capable than boxed and restricted superintelligences that may be authorized to use weapons.
If a boxed superintelligence able to plan weapons usage when authorized by humans, and other boxed superintelligences able to control robotics in manufacturing cells, are on humanity’s side, the advantage for humans could be overwhelming. No matter how smart an ASI is, it’s tough to win if the humans are prepared with millions of drones, nukes, space suits, bio- and nano-weapon sensors, weapons satellites, and more.
It’s an assumption EY has made many times, I am just calling it out.
“If a boxed superintelligence able to plan weapons usage when authorized by humans, and other boxed superintelligences able to control robotics in manufacturing cells, are on humanity’s side, the advantage for humans could be overwhelming”
As I said, I do not expect boxed AIs to be something most will build. We haven’t seen it, and I don’t expect to, because unboxed AIs are superior. This isn’t how the people in control are approaching the situation, and I don’t expect that to change.
My definition of “box” may be very different from yours. In my definition, locked weights and training only during testing, along with other design elements such as distribution detection, heavily box the model’s capabilities and behavior.
See https://www.lesswrong.com/posts/a5NxvzFGddj2e8uXQ/updating-drexler-s-cais-model?commentId=AZA8ujssBJK9vQXAY
It is fine if the model can access the internet, robotics, etc., so long as it lacks the contextual information to know whether it’s interacting with the real thing or a sim/cached copy.
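To make the “distribution detection” idea above slightly more concrete, here is a toy sketch of flagging inputs that fall outside a recorded training distribution. This is purely illustrative: real systems would monitor internal activations or embeddings rather than two-feature vectors, and the helper names (`fit_profile`, `is_out_of_distribution`) and z-score threshold are assumptions of mine, not anything from the comment.

```python
import statistics

def fit_profile(samples):
    """Record per-feature (mean, stdev) from in-distribution samples."""
    columns = list(zip(*samples))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def is_out_of_distribution(profile, x, threshold=4.0):
    """Flag an input if any feature lies more than `threshold`
    standard deviations from the training mean."""
    return any(abs(v - mu) / sigma > threshold
               for v, (mu, sigma) in zip(x, profile))

# In-distribution data: features clustered near (1.0, 10.0)
train = [(1.0, 10.0), (1.1, 9.8), (0.9, 10.2), (1.05, 9.9), (0.95, 10.1)]
profile = fit_profile(train)

print(is_out_of_distribution(profile, (1.0, 10.0)))  # typical input -> False
print(is_out_of_distribution(profile, (5.0, 10.0)))  # anomalous input -> True
```

A detector like this could, in principle, refuse to act (or fall back to a sandbox) when the model’s situation looks unlike anything seen in training; actual distribution-shift detection is far harder than this per-feature z-score check suggests.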
I feel like LW, at least, has already largely moved away from most of these ideas in light of what’s been happening lately, especially since ChatGPT.