The space of all possible algorithms one could run on three-digit addition strings like “218+375” seems rather vast. Could it be that what GPT-3 is doing is something like:
generating a large bunch of candidate algorithms, and
estimating the likelihoods of those algorithms given the examples, and
doing something like a noisy/weak Bayesian update, and
executing one of the higher-posterior algorithms, or some “fuzzy combination” of them?
Obviously this is just wild, vague speculation, but to me it intuitively seems like it would at least partly answer your question. What do you think? (Could GPT-3 be doing something like the above?)
(To a human, [the actual algorithm for addition] might feel like a glaringly obvious candidate. But under something like a noisy simplicity prior over all possible string-manipulation algorithms, [the actual algorithm for addition] maybe starts looking like just one of the more conspicuous needles in a haystack?)
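To make the speculation above concrete, here is a toy Python sketch of those four steps. Everything in it is an assumption made for illustration: the four hypothetical candidate functions, the made-up complexity scores standing in for a simplicity prior, and the noise rate. It is not a claim about what GPT-3 computes internally.

```python
import math

def _parse(s):
    a, b = s.split("+")
    return a, b

# Hypothetical candidate "algorithms" over strings like "218+375".
def add(s):
    a, b = _parse(s)
    return str(int(a) + int(b))

def add_no_carry(s):
    # Digit-wise addition that drops carries.
    a, b = _parse(s)
    return "".join(str((int(x) + int(y)) % 10) for x, y in zip(a, b))

def concat(s):
    a, b = _parse(s)
    return a + b

def echo_first(s):
    return _parse(s)[0]

# name -> (function, made-up complexity in "bits"); simpler means higher prior.
CANDIDATES = {
    "echo_first": (echo_first, 2),
    "concat": (concat, 3),
    "add_no_carry": (add_no_carry, 5),
    "add": (add, 6),
}

def posterior(examples, noise=0.1):
    """Noisy Bayesian update: prior 2^-bits, each example matches w.p. 1 - noise."""
    log_post = {}
    for name, (fn, bits) in CANDIDATES.items():
        lp = -bits * math.log(2)  # simplicity prior
        for prompt, answer in examples:
            lp += math.log(1 - noise if fn(prompt) == answer else noise)
        log_post[name] = lp
    m = max(log_post.values())
    weights = {k: math.exp(v - m) for k, v in log_post.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

def predict(prompt, examples, noise=0.1):
    """Execute the highest-posterior candidate on a new prompt."""
    post = posterior(examples, noise)
    best = max(post, key=post.get)
    return CANDIDATES[best][0](prompt)

EXAMPLES = [("218+375", "593"), ("111+222", "333"), ("405+197", "602")]
```

With these three examples, most of the posterior mass lands on `add`, so `predict("123+456", EXAMPLES)` returns `"579"`. With no examples at all, the prior dominates and the simplest candidate, `echo_first`, wins instead.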
That seems far too structured to me—I seriously doubt GPT-3 is doing anything like “generating a large bunch of candidate algorithms”, though maybe it has learned heuristics that approximate this sort of computation.