A fairer comparison would probably be to actually try hard at building the kind of scaffold that could use ~$10k in inference costs productively. I suspect the resulting agent would not do much better than one limited to $100 of inference, but it seems hard to be confident. And it seems harder still to be confident about what will happen even just 3 years from now, given that pretraining compute seems like it will probably grow about 10x/year (i.e., roughly 1,000x over 3 years) and that there might be stronger pushes towards automated ML.
A related announcement, explicitly targeting ‘building an epistemically sound research agent @elicitorg that can use unlimited test-time compute while keeping reasoning transparent & verifiable’: https://x.com/stuhlmueller/status/1869080354658890009.