Consider the humble rock (or: why the dumb thing kills you)

When people think about street fights and what they should do if they find themselves in the unfortunate position of being in one, they tend to stumble across a pretty concerning thought relatively early on: “What if my attacker has a knife?” Then they put loads of cognitive effort into strategies for dealing with attackers wielding blades. At first glance this makes sense. Knives aren’t that uncommon and they are very scary, so it feels pretty dignified to have prepared for such scenarios (I apologize if this anecdote is horribly unrelatable to Statesians). The issue is that, all in all, knife-related injuries from brawls or random attacks aren’t that common in most settings. Weapons of opportunity (a rock, a brick, a bottle, some piece of metal, anything you can pick up in the moment) are much more common. They are less scary, but everyone has access to them, and I’ve met few people without experience who plan for defending against those before they start thinking about knives. It’s not the really scary thing that kills you. It’s the minimum viable thing.

When deliberating about poisons, people tend to think of the flashy, potent ones. Cyanide, strychnine, tetrodotoxin. Anything sufficiently scary, with lethal doses in the low milligrams. The ones that are difficult to defend against and known first and foremost for their toxicity. On first pass this seems reasonable, but the fact that they are scary and hard to defend against means that it is very rare to encounter them. It is staggeringly more likely that you will suffer poisoning from acetaminophen or the like. OTC medications, cleaning products, batteries, pesticides, supplements. Poisons which are weak enough to be common. It’s not the really scary thing that kills you. It’s the minimum viable thing.

My impression is that people in AI safety circles follow a similar pattern, directing most of their attention at the very competent, very scary parts of risk-space rather than the large parts. Unless I am missing something, it feels pretty clear that the majority of doom-worlds are ones in which we die stupidly. Not by the deft hands of some superintelligent optimizer tiling the universe with its will, but by the clumsy ones of a process that is powerful enough to kill a significant chunk of humanity but not smart enough to do anything impressive after that point. Not a schemer, but an unstable idiot placed a little too close to a very spooky button by other unstable idiots.

Killing enough of humanity that the rest will die soon after isn’t that hard. We are very, very fragile. Of course the sorts of scenarios which kill everyone immediately are less likely in worlds where there isn’t competent, directed effort, but the post-apocalypse is a dangerous place, and the odds that the people equipped to rebuild civilisation will be among the survivors, find themselves around the means to do so, make a few more lucky rolls on location, and keep that spark going down a number of generations are low. Nowhere near zero, but low. Even in the bits of branch-space where bouncing back is technically possible, lots of timelines get shredded. You don’t need a lot of general intelligence to design a bioweapon or cause the leak of one. With militaries increasingly happy to hand weapons to black boxes, you don’t need to be very clever to start a nuclear incident. The meme that makes humanity destroy itself might be relatively simple too. In most worlds, before you get competent maximizers with the kind of goal-content integrity, embedded agency, and all the rest needed to kill humanity deliberately, keep the lights on afterwards, and have a plan for what to do next, you get a truly baffling number of flailing idiots next to powerful buttons, or things with some but not all of the relevant capabilities in place: competent within the current paradigm, but with a world-model that breaks down in the anomalous environments it creates. Consider the humble rock.

Another way of motivating this intuition is great-filter flavoured. Not only do we not see particularly many alien civs whizzing around, we also don’t see particularly many of the star-eating Super-Ints that might have killed them. AI as a great filter makes more sense if most of the failure modes are stupid: if the demon kills itself along with those who summoned it.

This is merely an argument for a recalibration of beliefs, not necessarily an argument that you should change your policies. In fact, there are some highly compelling arguments for why the assumption that we’re likely to die stupidly shouldn’t actually change how you proceed.

One of them is that the calculus doesn’t work out that way: 1:100 odds of an unaligned maximizer are significantly worse than 1:10 odds of a stupid apocalypse, because the stupid apocalypse only kills humanity. The competent maximizer kills the universe. This is an entirely fair point, but I’d like you to make sure that this is actually the calculus you’re running, rather than a mere rationalization of pre-existing beliefs.
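To spell that calculus out (a sketch using the illustrative odds above, where $V_h$ stands for the value lost if humanity dies and $V_u$ for the value lost if the whole reachable universe gets repurposed), the comparison being run is

$$\frac{1}{100} \cdot V_u \quad \text{vs.} \quad \frac{1}{10} \cdot V_h,$$

and the maximizer term dominates whenever $V_u > 10\,V_h$, which it plausibly does by many orders of magnitude on any accounting that includes the cosmic endowment.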

The second is that the calculus is irrelevant, because most people in AI-safety positions have much more sway on levers that lead to competent maximizers than they do on levers which lead to idiots trusting idiots with doomsday-tech. There is a Garrabrantian notion that most of your caring should be tangled up with outcomes that are significantly causally downstream from you, so while one of those risks is greater, you have a comparative advantage on minimizing the smaller one, which outweighs the difference. This too might very well be true, and I’d merely ask you to check whether it’s the real source of your beliefs, or whether you are unduly worried about the scarier thing because it is scary. Because of narrativistic thinking where the story doesn’t end in bathos. Where the threat is powerful. Where you don’t just get hit over the head with a rock.
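Sketched in the same hedged notation: what matters for your decision is not the base rates but the shifts you can cause. If your effort can reduce the probability of the maximizer scenario by $\Delta_m$ and the probability of the stupid apocalypse by $\Delta_s$, working on the maximizer is the better bet whenever

$$\Delta_m \cdot V_u > \Delta_s \cdot V_h,$$

which can hold even when the stupid apocalypse is ten times as likely, provided your leverage $\Delta_m$ is sufficiently larger than $\Delta_s$.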

It might, in this specific case, be dignified to put all your effort into preparing for knife fights, but I think your calibration is off if you believe those are anything but a small subset of the worlds in which we die. It’s not the really scary thing that kills you. It’s the minimum viable thing.