Is someone fine-tuning (or has someone already fine-tuned) a system to be generally numerate and good at Fermi estimates?
We didn’t try fine-tuning on general Fermi-estimate tasks, but I imagine the results would be positive. For our specific problem of forecasting with external reasoning, fine-tuning helps a lot! We have an ablation study in Section 7 of the paper showing that if you use the non-fine-tuned chat model, holding everything else fixed, performance is significantly worse.
We also didn’t explore using base models that are not instruction-tuned or RLHF’ed. That could be interesting to look at.