Relatedly on introspection, can we devise some unlearning procedure that removes models’ capability to introspect? This might reduce their situational awwareness.
Relatedly on introspection, can we devise some unlearning procedure that removes models’ capability to introspect? This might reduce their situational awwareness.