Thanks for the thoughtful reply. I definitely acknowledge and appreciate your experience. I agree the test you proposed would be worth doing and would provide some evidence. I think it would have to be designed carefully so that the model knows it is doing fake arithmetic rather than ordinary arithmetic. Maybe the prompt could be something like: "Consider the following made-up mathematical operation '@': 3@7 = 8, 4@4 = 3, … [more examples]. What does 2@7 equal? Answer: 2@7 equals". I also think we shouldn't expect GPT-3 to be able to do general formal reasoning at a level higher than, say, a fifth-grade human. After all, it's been trained on a similar dataset (mostly non-mathematical English, with a bit of ordinary arithmetic).
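For concreteness, here is a rough sketch of how such a prompt could be generated programmatically. The particular rule behind `fake_op`, the operand range, and the commented-out `query_model` call are placeholders of my own, not anything from the thread; the point is just that the few-shot examples should pin down a rule the model can't have memorized from training data.

```python
import random

# Illustrative stand-in for the made-up operation "@".
# This specific rule is an assumption for the sketch, not the one
# implied by the examples quoted above.
def fake_op(a, b):
    return a + b - 2

def build_prompt(n_examples=10, query=(2, 7), seed=0):
    """Build a few-shot prompt teaching a novel operation from examples."""
    rng = random.Random(seed)
    lines = ['Consider the following made-up mathematical operation "@".']
    for _ in range(n_examples):
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        lines.append(f"{a}@{b} = {fake_op(a, b)}")
    qa, qb = query
    lines.append(f"What does {qa}@{qb} equal? Answer: {qa}@{qb} equals")
    return "\n".join(lines)

prompt = build_prompt()
print(prompt)

# The model's completion would then be checked against fake_op(2, 7).
# `query_model` is a hypothetical wrapper around whatever API is used:
# completion = query_model(prompt)
```

One side benefit of generating the examples from a known rule is that you can score completions automatically and vary the number of examples to see how quickly (if at all) the model picks the rule up.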
Are you saying that GPT-3 has learned linguistic general reasoning but not non-linguistic general reasoning? I’m not sure there’s an important distinction there.
It doesn't surprise me that you would need to scale up substantially to get this sort of capability. After all, we are still well below the size of the human brain (GPT-3's parameter count is orders of magnitude smaller than the brain's synapse count).
Side question about experience: Surveys seem to show that older AI scientists, who have been working in the field for longer, tend to think AGI is farther in the future—median 100+ years for scientists with 20+ years of experience, if I recall correctly. Do you think that this phenomenon represents a bias on the part of older scientists, younger scientists, both, or neither?
Also note that a significant fraction of humans would fail the kind of test you described (inducing the behavior of a novel mathematical operation from a relatively small number of examples), which is why similar tests of inductive reasoning show up so often on IQ tests and the like. Failing that kind of test doesn't demonstrate a lack of general reasoning ability, unless we're willing to grant that a substantial fraction of humans lack general reasoning ability to at least some extent.