The kind of mechanism you listed probably only allows the AIs to have a rough idea of what time it is.
We can keep some of the instances of our AI very unaware of the time, by restricting their between-episode memory. For example, we might do this for the instances responsible for untrusted monitoring, to reduce collusion.
I agree re time-awareness, with two caveats:
The kind of mechanism you listed probably only allows the AIs to have a rough idea of what time it is.
We can keep some of the instances of our AI very unaware of the time, by restricting their between-episode memory. For example, we might do this for the instances responsible for untrusted monitoring, to reduce collusion.