While I actually agree that tool-AI goals can be programmed, if you want to keep the whole thing from turning unsafely agenty, you're going to have to strictly separate the inductive reasoning from the actual tool run: run induction for a while, then use tool-mode to compose plans over the induced world models, potentially after censoring those models for safety.
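To make the phase separation concrete, here is a minimal toy sketch in Python. Everything in it is a hypothetical illustration (the names `induce`, `censor`, `plan`, and `WorldModel` are mine, not any established API): induction writes a model from logged observations, a censorship step strips transitions through forbidden states, and the tool-mode planner then searches over the frozen snapshot without being able to learn further or act on the world.

```python
# Toy sketch of the induction / censorship / tool-mode separation.
# Assumes a discrete world where observations are (state, action,
# next_state) triples; all names here are illustrative, not standard.
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldModel:
    """Immutable snapshot of induced beliefs."""
    transitions: dict  # maps (state, action) -> predicted next state


def induce(observations):
    """Induction phase: build a predictive model from logged data.

    Runs once, then stops -- the planner never calls back into it.
    """
    transitions = {}
    for state, action, next_state in observations:
        transitions[(state, action)] = next_state
    return WorldModel(transitions=transitions)


def censor(model, forbidden_states):
    """Safety filter: drop transitions that reach a forbidden state,
    so the planner cannot even represent plans passing through one."""
    safe = {key: dest for key, dest in model.transitions.items()
            if dest not in forbidden_states}
    return WorldModel(transitions=safe)


def plan(model, start, goal, max_depth=10):
    """Tool phase: breadth-first search over the frozen model only.

    No learning, no acting -- it just returns a candidate action
    sequence for a human to inspect.
    """
    frontier = [(start, [])]
    seen = {start}
    for _ in range(max_depth):
        next_frontier = []
        for state, actions in frontier:
            if state == goal:
                return actions
            for (s, a), s_next in model.transitions.items():
                if s == state and s_next not in seen:
                    seen.add(s_next)
                    next_frontier.append((s_next, actions + [a]))
        frontier = next_frontier
    return None  # no plan found within the horizon


# Induction runs first and terminates; planning only ever sees the
# censored, frozen snapshot.
observations = [("A", "left", "B"), ("B", "up", "C"),
                ("A", "right", "D"), ("D", "up", "C")]
model = censor(induce(observations), forbidden_states={"D"})
print(plan(model, start="A", goal="C"))  # ['left', 'up'], avoiding D
```

The design point is the one-way data flow: the planner holds a read-only, already-censored model and has no channel back to the inductor or the environment, which is what keeps the composite system tool-like rather than agenty.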