Sam Marks comments on A basic systems architecture for AI agents that do autonomous research

Sam Marks 1 Oct 2024 7:43 UTC
LW: 15 AF: 10
0
AF
While I agree the example in Sycophancy to Subterfuge isn’t realistic, I don’t follow how the architecture you describe here precludes it. I think a pretty realistic set-up for training an agent via RL would involve computing scalar rewards on the execution machine or some other machine that could be compromised from the execution machine (with the scalar rewards being sent back to the inference machine for backprop and parameter updates).