I also somewhat disagree with the core argument, in that it proves too much about humans. Humans are approximately aligned, and we only need to match that level of alignment.
Hmm, humans do appear approximately aligned as long as they don’t have definitive advantages. “Power corrupts” and all that. If you take an average “aligned” human and give them unlimited power and no checks and balances, the usual trope happens in real life.
Yeah, the typical human is only partially aligned with the rest of humanity, and only in a highly non-uniform way, so you get the typical distribution of historical results when giving supreme power to a single human—with outcomes highly contingent on the specific human.
So if AGI is only as aligned as typical humans, we’ll also probably need a heterogeneous AGI population and robust decentralized control structures to get a good multipolar outcome. But it also seems likely that any path leading to virtual brain-like AGI will also allow for selecting for altruism/alignment well outside the normal range.