the gears to ascension comments on Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

the gears to ascension 12 Dec 2022 18:50 UTC
4 points
I’d propose that RLHF matches the level of “outer alignment” humans have, which isn’t close to good enough even for ourselves. We do have a lot more of the older “inner alignment”, though, resulting from genome self-friendliness within our species.

(Inner/outer alignment are blurry, intuitive terms and are likely to collapse into a better representation under attempted systematization. I forget which post argues that.)