Why isn’t it competitive? A is being trained the same way as an agentic system, so it will be competitive.
Adding B is a 2x runtime/training-cost overhead, so there is a “constant factor” cost; is that enough to say something is “not competitive”? In practice I’d expect you could strike a good safety/overhead balance for much less.
Hmm well if A is being trained the same way using deep learning toward being an agentic system, then it is subject to mesa-optimization and having goals, isn’t it? And being subject to mesa-optimization, do you have a way to address inner misalignment failures like deceptive alignment? Oversight alone can be thwarted by a deceptively-aligned mesa-optimizer.
You might possibly address this if you give the overseer good enough transparency tools. But such tools don’t exist yet.
Why isn’t it competitive? A is being trained the same way as an agentic system, so it will be competitive.
Adding B is a 2x runtime/training-cost overhead, so there is a “constant factor” cost; is that enough to say something is “not competitive”? In practice I’d expect you could strike a good safety/overhead balance for much less.
Hmm well if A is being trained the same way using deep learning toward being an agentic system, then it is subject to mesa-optimization and having goals, isn’t it? And being subject to mesa-optimization, do you have a way to address inner misalignment failures like deceptive alignment? Oversight alone can be thwarted by a deceptively-aligned mesa-optimizer.
You might possibly address this if you give the overseer good enough transparency tools. But such tools don’t exist yet.