GPT-4 can do math because it has learned particular patterns associated with tokens, including heuristics for certain digits, without fully learning the abstract generalized pattern.
This finding seems consistent with some of the literature, such as this, where they found that performance deteriorates rapidly when the multiplication task has an unseen computational graph. The keyword “shortcut learning” may be worth checking out too.
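As a toy illustration of the difference between a digit heuristic and the generalized pattern (a sketch only, not a claim about what GPT-4 computes internally), compare a shortcut that gets the last digit of a product right with the full schoolbook algorithm. The function names `last_digit_shortcut` and `schoolbook_multiply` are hypothetical, and both assume non-negative integers.

```python
def last_digit_shortcut(a: int, b: int) -> int:
    """Heuristic: the final digit of a*b depends only on the final digits of a and b."""
    return (a % 10) * (b % 10) % 10


def schoolbook_multiply(a: int, b: int) -> int:
    """General algorithm: sum one shifted partial product per digit of b."""
    total = 0
    shift = 0
    while b > 0:
        digit = b % 10                    # current lowest digit of b
        total += a * digit * 10 ** shift  # partial product, shifted into place
        b //= 10
        shift += 1
    return total


if __name__ == "__main__":
    a, b = 1234, 5678
    print(last_digit_shortcut(a, b))   # only the final digit of the product
    print(schoolbook_multiply(a, b))   # the full product
```

The shortcut is always right about the one digit it covers, yet it says nothing about the remaining digits. A learner that stacks up many such local rules can look competent on familiar inputs while still lacking the general procedure.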