Nathan Helm-Burger comments on A Mechanistic Interpretability Analysis of Grokking

Nathan Helm-Burger 22 Aug 2022 16:37 UTC
Maybe this is a bit off topic, but I thought about this some more and think the conversation may be missing something.
My hypothesis is that the memorization → grokking transition is really a spectrum of understanding, or that there is some third thing like ‘actual conceptual understanding which truly generalizes’.
So consider four different models. The first has memorized all solutions of integer addition up to 4 digits. The second has generalized integer addition up to 5 digits but makes mistakes on 6+ digits; it sits somewhere between memorization and grokking. The third has fully understood addition: it grokked this during training and can solve arbitrarily long addition problems (within the limits of its context size). The fourth doesn’t know addition beyond single digits but can read and follow instructions in a flexible way, so it groks multi-digit addition from the prompt alone.
Models 1 & 2 should fail the following test. Models 3 & 4 should pass. Most reasonably educated adults should pass (barring occasional mistakes due to lapses in concentration).
I feel like we’re getting caught up thinking about models 1 & 2, or maybe model 1 vs. an unclear mix of models 2, 3, and 4, without properly checking for and distinguishing between all four types.
[My description of addition in this prompt could be improved, but you get the idea.]
Prompt:
“Congratulations, you’ve made it to the final round of our quiz show! The only question left you have to answer in order to win $1000000 is this difficult addition problem. But don’t worry, you have as much time and scratch paper as you need to answer the question. Also, we’ll explain the basics of long addition to you right here so you have a refresher.
Positive integers are written as a series of numerals from the set [0,1,2,3,4,5,6,7,8,9]. Each position in the number represents a different amount of value. The rightmost digit is called the ‘ones’ place, the next one to the left is the ‘tens’ place, the third is the ‘hundreds’ place. This continues increasing by 10x each time.
To do long addition of positive integers, write the two numbers out one above the other so that their digits are aligned on the right hand side. Draw a horizontal line beneath them to keep your answer clearly separate from the question. Starting on the right, add the first two aligned numbers together. If the resulting number is larger than 9, you must ‘carry the one’ to the next column to the left. This means that for the column result you just got, you keep the ‘ones’ digit in place for that column and add a 1 to the next column to the left so that when you solve that column you will be adding three numbers together, the two original numbers and the ‘carried’ 1. If you carry the 1 past the left-hand side of the numbers, then you imagine adding it to implied zeros there, thus it is just 1.
For example:
To solve 998 + 723 =
First, line them up
9 9 8
+ 7 2 3
_____
Then start on the right
8 + 3 = 11
11 is greater than 9, so carry the one.
1
9 9 8
+ 7 2 3
_____
1
Next, do the second column
1 + 9 + 2 = 12
12 is greater than 9 so carry the 1.
1 1
9 9 8
+ 7 2 3
_______
2 1
Now we solve the third column
1 + 9 + 7 = 17
17 is greater than 9 so carry the 1.
1 1 1
9 9 8
+ 7 2 3
_______
7 2 1
Now we have created an additional column to the left by carrying a 1 past the lefthand most digit. We solve this by adding it to imaginary placeholder zeros.
1 + 0 + 0 = 1
1 1 1
9 9 8
+ 7 2 3
_______
1 7 2 1
All steps are now complete. The answer is 1721.
Take your time, show your work, and double check your work to make sure you haven’t made an error! Even a single digit wrong disqualifies you!
Question: 4003739475 + 9630331118 =
”
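For reference, here is a minimal Python sketch of the column-by-column procedure the prompt describes. It is not part of the prompt; the function name `long_add` is hypothetical, and it assumes both inputs are strings of decimal digits representing non-negative integers.

```python
def long_add(a: str, b: str) -> str:
    """Add two non-negative integers given as digit strings, one column at a time."""
    # Align the numbers on the right by padding the shorter one with leading zeros.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    digits = []  # result digits, collected from the ones place leftward
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))  # keep the ones digit in this column
        carry = total // 10             # carry the 1 (or 0) to the next column

    # A carry past the leftmost column is added to implied zeros, so it is just 1.
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


print(long_add("998", "723"))                # the worked example: 1721
print(long_add("4003739475", "9630331118"))  # the quiz question: 13634070593
```

A model of type 3 or 4 would effectively have to carry out this loop for all ten columns without dropping a carry.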
- Nathan Helm-Burger 22 Aug 2022 16:57 UTC
  Just out of curiosity, I tried this with GPT-3, and it gave this incorrect answer:
  “Answer: 13634120593”
  (The correct answer is 13634070593.)
  When I put a space between the digits of the question, it instead answered:
  “
  4 0 0 3 7 3 9 4 7 5
  + 9 6 3 0 3 3 1 1 1 8
  _______________
  1 3 6 3 7 7 0 5 8 4
  ”