I tried a bit of handholding for simple program simulation, as follows:
[--- prompt begins ---]
Consider this function written in Python.
def f(n):
    if n <= 1: return n
    else: return f(n-1) + f(n-2)
What is the value of f(5)?
Answer:
Since 5 <= 1 is false, f(5) equals f(4) + f(3), so we need to know the values of those.
Since 4 <= 1 is false, f(4) equals f(3) + f(2), so we also need to know f(2).
Since 3 <= 1 is false, f(3) equals f(2) + f(1) = f(2) + 1.
Since 2 <= 1 is false, f(2) equals f(1) + f(0) = 1 + 0 = 1.
So now we can work out that f(3) = f(2) + f(1) = 1 + 1 = 2.
So f(4) = f(3) + f(2) = 2 + 1 = 3.
So f(5) = f(4) + f(3) = 3 + 2 = 5.
So: The value of f(5) is 5.
Consider this function written in Python.
def g(n):
    if n <= 1: return n
    else: return n * g(n-1)
What is the value of g(6)?
Answer:
[--- prompt ends, GPT-3 output begins ---]
Since 6 <= 1 is false, g(6) equals 6 * g(5), so we need to know the value of g(5).
Since 5 <= 1 is false, g(5) equals 5 * g(4), so we also need to know g(4).
Since 4 <= 1 is false, g(4) equals 4 * g(3), so we also need to know g(3).
Since 3 <= 1 is false, g(3) equals 3 * g(2), so we also need to know g(2).
Since 2 <= 1 is false, g(2) equals 2 * g(1), so we also need to know g(1).
Since 1 <= 1 is true, g(1) equals 1.
So: The value of g(6) is 6 * 5 * 4 * 3 * 2 * 1 = 720.
[--- GPT-3 output ends ---]
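(Both answers check out if you actually run the code, for what that's worth; here's a quick sanity check, obviously not part of the prompt.)

# Sanity check, outside the prompt: run the two functions and print the results.
def f(n):
    if n <= 1: return n
    else: return f(n-1) + f(n-2)

def g(n):
    if n <= 1: return n
    else: return n * g(n-1)

print(f(5), g(6))  # prints: 5 720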
On another run, it got as far as working out g(1) and started working upwards again (after the fashion of the Fibonacci example I gave it) but sadly it stopped emitting tokens immediately before actually saying what the value of g(6) is. I bet it would have got it right :-).
… But. I next tried replacing g with a function (iterative rather than recursive) that reverses a string one character at a time and asked it for g(“robot”) … and it just spat out ‘The value of g(“robot”) is “tobor”’ without showing any of its working. That was pretty surprising, and maybe argues against the idea that it has little ability to execute code “in its head”. If I ask it for g(“I am a robot”) it gives “tobora ma I”, missing one space, which seems like a fairly characteristic error given how GPT-3 tokenization works, but also the kind of error a human might make. For “I am a very stupid robot” it gave (still with no “tracing”) the answer “tobot ypsut evir a ma I”, which is starting to look much less like something a human might do.
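For concreteness, the reversal function was along these lines (not necessarily character-for-character what I pasted into the prompt; any straightforward one-character-at-a-time loop will do):

def g(s):
    result = ""
    for c in s:
        result = c + result  # prepend each character, so the string comes out reversed
    return result

print(g("robot"))  # tobor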
So then I tried giving it something trickier and explicitly asking it to show its working. Still taking a string as input, the new function starts with i = 0 and repeatedly does i = (2*i+1) % len(s) while building up its output, stopping once i reaches 0 again (a sketch of the function is below, after GPT-3’s attempt). So if you give it “robot” it will look at indices 0, 1, 3, 2, 0 and should output “roob”. Here is what GPT-3 actually wrote. (I had added the words “Please show your working.” to the prompt.)
The first time through the loop, i becomes 1 and result becomes “r”.
The second time through the loop, i becomes 3 and result becomes “ro”.
The third time through the loop, i becomes 7 and result becomes “rob”.
The fourth time through the loop, i becomes 15 and result becomes “robo”.
The fifth time through the loop, i becomes 31 and result becomes “robot”.
At this point, i == 0, so the while loop breaks.
So: The value of g(“robot”) is “robot”.
This is not so impressive. It’s understood the i → 2i+1 thing but not the modulus operator. It has some sort of preference for just taking the letters of the input string in order and is doing that regardless of the actual values taken by i. I can, actually, imagine a human making these mistakes, but not a very bright one.
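For anyone who wants to check the intended behaviour, the function was essentially this (a sketch matching the description above, not necessarily the exact code from the prompt):

def g(s):
    result = ""
    i = 0
    while True:
        result += s[i]          # take the character at the current index
        i = (2*i + 1) % len(s)  # jump to the next index
        if i == 0:              # stop once we wrap back round to index 0
            break
    return result

print(g("robot"))  # roob -- indices visited: 0, 1, 3, 2, then back to 0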
[EDITED to add: This is all with text-davinci-002 in the OpenAI Playground. There are some extra line breaks in my prompts that I haven’t reproduced here because my comment takes up too much vertical space already. These were all first attempts—no cherry-picking—except that the “another run” for the Fibonacci/factorial example was actually the first run and the result shown was the second.]