If I thought large language models were already capable of doing simple plug-and-chug problems, I’m not sure why I’d update much on this development. Google highlighted some slightly hard problems in their paper that the model was able to solve (though they were cherry-picked), and for those I did update a bit (I said my timelines advanced by “a few years”).
>If I thought large language models were already capable of doing simple plug-and-chug problems, I’m not sure why I’d update much on this development.
I suppose I just have different intuitions on this. Let’s just make a second bet. I imagine you can find another element for your list you will be comfortable adding—it doesn’t necessarily have to be a dataset, just something in the same spirit as the other items in the list.
I think I’ll pass up the opportunity for a second bet for now. My mistake was being too careless in the first place, and I’m not currently interested in doing a deeper dive into what might be a good replacement for MATH.
You could just drop MATH and make a bet at different odds on the remaining items.