Noosphere89 comments on Niceness is unnatural

Noosphere89 13 Oct 2022 14:27 UTC
I definitely agree that deceptive alignment seems likely to break black-box properties such as niceness by default. That follows from the simplicity prior and from the fact that internal or corrigible alignment is harder to reach than deceptive alignment, at least once the model has a world-model.