Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 2 Jan 2025 0:34 UTC
1 point
Relatedly, on introspection: can we devise some unlearning procedure that removes models’ capability to introspect? This might reduce their situational awareness.