First comment: I don’t think their experiment about code execution is much evidence re “true understanding.”
I agree that humans would do poorly in the experiment you outline. I think this shows that, like the language model, humans-with-one-second do not “understand” the code.
(Idk if you were trying to argue something else with the comparison, but I don’t think it’s clear that this is a reasonable comparison; there are tons of objections you could bring up. For example, humans have to work from pixels whereas the language model gets tokens, making its job much easier.)
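(To make the comparison concrete, here's a rough sketch of the kind of "predict the execution" question at issue; the snippet below is made up for illustration, not taken from the paper.)

```python
# A made-up example of the kind of "predict what this prints" task being
# discussed (not an actual prompt from the paper):
def mystery(xs):
    total = 0
    for i, x in enumerate(xs):
        # add values at even indices, subtract values at odd indices
        if i % 2 == 0:
            total += x
        else:
            total -= x
    return total

print(mystery([3, 1, 4, 1, 5]))  # prints 10 (3 - 1 + 4 - 1 + 5)
```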
Second comment: Speculation about scaling trends:
I didn’t check the numbers, but that seems pretty reasonable. I think there’s a question of whether it actually saves time in the current format—it might be faster to simply write the program than to write down a clear natural language description of what you want along with test cases.
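(As a rough, made-up illustration of that tradeoff: by the time you've written a precise description plus test cases, you've done most of the work of writing the function itself. The spec and function below are hypothetical, purely for illustration.)

```python
# Hypothetical example: the prompt you'd need (spec + tests) vs. the program itself.
#
# Spec you'd write for the model:
#   "Return the second-largest distinct value in a list of ints;
#    raise ValueError if there are fewer than two distinct values."
# Test cases you'd write for the model:
#   second_largest([3, 1, 4, 4, 2]) == 3
#   second_largest([7, 7, 9]) == 7
#
# ...versus just writing it yourself:
def second_largest(xs):
    distinct = sorted(set(xs))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values")
    return distinct[-2]

assert second_largest([3, 1, 4, 4, 2]) == 3
assert second_largest([7, 7, 9]) == 7
```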
> I agree that humans would do poorly in the experiment you outline. I think this shows that, like the language model, humans-with-one-second do not “understand” the code.
Haha, good point—yes. I guess what I should say is: since humans would have performed just as poorly on this experiment, it doesn’t count as evidence for claims like “current methods are fundamentally limited,” “artificial neural nets can’t truly understand concepts in the ways humans can,” or “what goes on inside ANNs is fundamentally a different kind of cognition from what goes on inside biological neural nets,” or the like.
Oh yeah, I definitely agree that this is not strong evidence for typical skeptic positions (and I’d guess the authors would agree).