NickGabs comments on Miscellaneous First-Pass Alignment Thoughts

NickGabs 22 Nov 2022 2:00 UTC
1 point
Human bureaucracies are mostly misaligned because the individual bureaucrats are themselves misaligned. I think a “bureaucracy” of perfectly aligned humans (like EA but better) would be well aligned. RLHF is obviously not a solution in the limit, but I don’t think it’s extremely implausible that it is outer-aligned enough to work, though I am much more enthusiastic about IDA.