Larger LMs seem to benefit more from tools than smaller ones: ‘Absolute performance and improvement-per-turn (e.g., slope) scale with model size.’ (https://xingyaoww.github.io/mint-bench/) This seems pretty good for safety, to the extent that tool use is more transparent than model internals.