Is there a term for a (hypothetical) phenomenon where AI systems might come to mirror the values and behavior of their creators, analogous to enculturation in humans? Claude suggests ‘Ethos Imprinting’ but I’m not sure if there’s something standard out there.