However, most humans probably don’t have a deep understanding of human values; even so, I would see it as a positive outcome if a random human were picked and given god-level abilities.
Every autocracy in the world has run the experiment of giving a typical human massive amounts of power over other humans: it almost invariably turns out extremely badly for everyone else. For an aligned AI, we don’t just need something as well aligned and morally good as a typical human; we need something morally far better, comparable to a saint or an angel. That means building something that has never previously existed.
Humans are evolved intelligences. While they can and will cooperate in non-zero-sum games, present them with a non-iterated zero-sum situation and they will (almost always) look out for themselves and their close relatives, just as evolution would predict. We’re building a non-evolved intelligence, so the orthogonality thesis applies, and what we want is something that will look out for us, not itself, in a zero-sum situation. Training (in some sense, distilling) a human-like intelligence off vast amounts of human-produced data isn’t going to do this by default. A toy sketch of the point follows.
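To make the zero-sum point concrete, here is a minimal sketch (a toy model I made up for illustration; all names and numbers are hypothetical, not anyone's actual proposal): the same decision procedure, paired with two different utility functions, picks opposite actions in a one-shot zero-sum split. The capability is identical in both cases and only the goal differs, which is the orthogonality thesis in miniature, and the gap between a self-interested agent and the one we actually want.

```python
# Hypothetical toy model: one-shot, zero-sum split of a fixed resource.
# Whatever fraction the agent keeps, the humans lose. Actions are the
# fraction the agent takes for itself.
ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]
TOTAL = 100.0

def outcome(fraction_kept):
    """Zero-sum payoffs: agent's share + humans' share = TOTAL."""
    return {"agent": fraction_kept * TOTAL,
            "humans": (1.0 - fraction_kept) * TOTAL}

def decide(utility):
    """The same 'capability' (exhaustive search over actions) for any goal;
    only the utility function passed in changes what gets chosen."""
    return max(ACTIONS, key=lambda a: utility(outcome(a)))

# An evolved, self-interested goal: maximize the agent's own share.
selfish = lambda o: o["agent"]
# The goal we actually want in an aligned AI: maximize the humans' share.
aligned = lambda o: o["humans"]

print(decide(selfish))   # 1.0 -- takes everything, as evolution predicts
print(decide(aligned))   # 0.0 -- leaves everything, looking out for us
```

Nothing in the search procedure favors one answer over the other; training on human-produced data gives you the capability half of this sketch, not the utility-function half.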