I think the problem is not that an unaligned AGI doesn’t understand human values; it might understand them better than an aligned one, and it might understand all the consequences of its actions. The problem is that it will not care about them. Moreover, a detailed understanding of human values has instrumental value: it is much easier to deceive and pursue your own goal when you have a clear picture of what will look bad and might trigger countermeasures.