I hear you on this concern, but it basically seems similar (IMO) to a concern like: “The future of humanity after N more generations will be ~without value, due to all the reflection humans will do—and all the ways their values will change—between now and then.” A large set of “ems” gaining control of the future after a lot of “reflection” seems quite comparable to future humans having control over the future (also after a lot of effective “reflection”).
I think there’s some validity to worrying about a future with very different values from today’s. But I think misaligned AI is (reasonably) usually assumed to diverge in more drastic and/or “bad” ways than humans themselves would if they stayed in control; I think of this difference as the major driver of wanting to align AIs at all. And it seems Nate thinks that the hypothetical training process I outline above gets us something much closer to “misaligned AI” levels of value divergence than to “ems” levels of value divergence.