This early control post introduced some super important ideas: trusted monitoring, plus the general point that if you think about approaches to safely using AIs that are robust to deceptive alignment and that can be used right now, a core dynamic is using your dumb trusted models, together with your limited access to trusted humans, to make it hard for your untrusted smart models to cause problems.
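To make the dynamic concrete, here is a minimal sketch of trusted monitoring: an untrusted model produces outputs, a weak trusted model assigns each a suspicion score, and the most suspicious fraction is routed to scarce trusted human auditors. All names here are hypothetical, and the scoring function is a toy keyword heuristic standing in for a real trusted model.

```python
def trusted_suspicion_score(output: str) -> float:
    # Hypothetical stand-in for a dumb trusted model's monitor:
    # a toy heuristic that flags outputs containing risky tokens.
    suspicious_tokens = ["exec", "eval", "backdoor"]
    return sum(tok in output for tok in suspicious_tokens) / len(suspicious_tokens)

def trusted_monitoring(outputs, audit_budget_fraction=0.02):
    """Route the most suspicious fraction of untrusted-model outputs
    to limited trusted human auditors; accept the rest."""
    # Rank outputs by the trusted model's suspicion score, highest first.
    ranked = sorted(range(len(outputs)),
                    key=lambda i: trusted_suspicion_score(outputs[i]),
                    reverse=True)
    # Human audit labor is scarce, so only the top fraction gets audited.
    n_audit = max(1, int(len(outputs) * audit_budget_fraction))
    audited = set(ranked[:n_audit])
    return [("audit" if i in audited else "accept", out)
            for i, out in enumerate(outputs)]
```

The audit budget parameter captures the "limited access to trusted humans" constraint: the protocol's safety comes from concentrating that scarce labor on whatever the dumb trusted model finds most suspicious.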