Fascinating post. It reminds me of how the human brain can ‘fake alignment’ through self-deception—rationalizing actions to seem aligned with values while masking deeper misalignment. Could insights into LLM alignment help us understand and mitigate this kind of ‘auto-corruption’ in ourselves?
Curious if you’ve thought about parallels like this.