I am not sure I can write out the full AI x-risk scenario.
1. AI quickly becomes super clever
2. Alignment is hard, like getting your great x10 grandchildren to think you’re a good person
3. The AI probably embarks on a big project which ignores us and accidentally kills us
Where am I wrong? Happy to be sent stuff to read.
I replied:
“1. AI quickly becomes super clever”
My AI risk model (which is not the same as everyone’s) more specifically says:
1a. We’ll eventually figure out how to make AI that’s ‘generally good at science’—like how humans can do sciences that didn’t exist when our brains evolved.
1b. AGI / STEM AI will have a large, fast, and discontinuous impact. Discontinuous because it’s a new sort of intelligence (not just AlphaGo 2 or GPT-5); large and fast because STEM is powerful, plus humans suck at STEM and aren’t cheap software that scales as you add hardware.
(Warning: this argument is compressed to fit Twitter's character limit. There are other factors too, like recursive self-improvement.)
“2. Alignment is hard, like getting your great x10 grandchildren to think you’re a good person”
I’d say it’s hard like building a large, complex, novel software system that has to exhibit strong robustness/security properties on the first try, in the face of adverse optimization.
Goal stability over time is part of the problem, but not the core problem. The core problem (for ML) is ‘ML models are extremely opaque, and there’s no way to robustly get any complex real-world goal into a sufficiently opaque system’. The goal isn’t instilled in the first place.
“3. The AI probably embarks on a big project which ignores us and accidentally kills us”
Rather: which deliberately kills us because (1) we’re made of atoms that can be used for the project, and (2) we’re a threat. (E.g., we could build a rival superintelligence.)