This is a distribution of math problems GPT-3 wasn’t finetuned on. Yet it’s able to few-shot generalize and perform well. This is an amazing level of robustness relative to 2018 deep learning systems. I don’t see why scaling and access to external tools (e.g. to perform long calculations) wouldn’t produce the kind of robustness you have in mind.
I think you’re moving the goal-posts, since before you mentioned “without external calculators”. I think external tools are likely to be critical here, and I’m much more optimistic about that path to this kind of robust generalization. I don’t think that necessarily addresses concerns about how the system reasons internally, though, which still seems likely to be critical for alignment.