Buck comments on Toy models of AI control for concentrated catastrophe prevention

Buck 6 Feb 2024 16:28 UTC
LW: 6 AF: 4
I agree re time-awareness, with two caveats:
- The kind of mechanism you listed probably only allows the AIs to have a rough idea of what time it is.
- We can keep some instances of our AI largely unaware of the time by restricting their between-episode memory. For example, we might do this for the instances responsible for untrusted monitoring, to reduce collusion.
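
To make the second caveat concrete, here is a toy sketch, not a description of any real setup. The `Instance` class, the `keep_memory` flag, and the observation strings are all hypothetical names invented for illustration. The idea it shows: an instance whose memory is wiped after every episode cannot count how many episodes have passed, so it has no memory-based clock.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one deployed AI instance."""
    name: str
    keep_memory: bool  # assumption: memory retention is a per-instance setting
    memory: list = field(default_factory=list)

    def run_episode(self, observation: str) -> int:
        """Run one episode; return how many past episodes this instance can see."""
        visible_past_episodes = len(self.memory)
        self.memory.append(observation)
        if not self.keep_memory:
            # Wipe everything at the episode boundary.
            self.memory.clear()
        return visible_past_episodes


policy = Instance("policy", keep_memory=True)
monitor = Instance("untrusted_monitor", keep_memory=False)

observations = ["early task", "mid task", "late task"]
policy_view = [policy.run_episode(o) for o in observations]
monitor_view = [monitor.run_episode(o) for o in observations]

print(policy_view)   # the policy can count elapsed episodes
print(monitor_view)  # the monitor always sees a blank history
```

This only models the memory channel. The monitor could still pick up rough time cues from the content of what it is shown, which is the first caveat.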