Charlie Steiner comments on How likely is deceptive alignment?

Charlie Steiner 31 Aug 2022 21:09 UTC
LW: 2 AF: 1

I still think there’s approximately only one of those, though, since you have to get the objective to match exactly what you want.

Once you’re trying to extrapolate me rather than just copy me as-is, there are multiple ways to do the extrapolation. But I’d agree it’s still way less entropy than deceptive alignment.