The verdict that knowledge is purely a property of configurations cannot be naively generalized from real life to GPT simulations, because “physics” and “configurations” play different roles in the two (as I’ll address in the next post). The parable of the two tests, however, literally pertains to GPT. People tend to draw erroneous global conclusions about GPT from behaviors that are in fact prompt-contingent, and consequently there is a constant stream of discoveries that GPT-3 exceeds previously measured capabilities under alternate conditions of generation[29], a pattern which shows no signs of slowing two years after GPT-3’s release.
The ontological distinction between GPT and the instances of text it propagates makes these discoveries unsurprising: obviously, different configurations will be differently capable and will in general behave differently when animated by the laws of GPT physics. We can only test one configuration at a time, and given the vast number of possible configurations that would attempt any given task, it’s unlikely we’ve found the optimal taker for any test.
Reading this was causally responsible for me undoing any updates I made after being disappointed by my playing with GPT-3. Those observations weren’t any more likely in a weak-GPT world, because a strong GPT would just as readily simulate weak simulacra in my contexts as it would strong simulacra in other contexts.
I think I had all the pieces to have inferred this… but some subverbal part of my cognition was illegitimately epistemically nudged by the manifest limitations of naïvely prompted GPT. That part of me, I now see, should have only been epistemically pushed around by quite serious, professional toying with GPT!
This kind of comment (“this precise part had this precise effect on me”) is a really valuable form of feedback that I’d love to get (and will try to give) more often. Thanks! It’s particularly interesting because someone gave feedback on a draft that the business about simulated test-takers seemed unnecessary and made things more confusing.
Since you mentioned it, I’m going to ramble on about some additional nuance on this point.
Here’s an intuition pump which strongly discourages committing the “fundamental attribution error” of pinning everything on the simulator:
Imagine a machine where you feed in an image and it literally opens a window to a parallel reality with that image as a boundary constraint. You can watch events downstream of the still frame unfold through the viewfinder.
If you observe the people in the parallel universe doing something dumb, the obvious first thought is that you should try a frame that opens into a different situation, one more likely to contain smart people (or even just try the same frame again, if the frame underdetermines the world and a different “preexisting” situation is revealed each time you run the machine).
That’s the obvious conclusion in the thought experiment because the machine isn’t assigned a mind-like role—it’s just a magical window into a possible world. Presumably, the reason people in a parallel world are dumb or not is located in that world, in the machinery of their brains. “Configuration” and “physics” play the same roles as in our reality.
Now, with intuition pumps it’s important to fiddle with the knobs. An important way that GPT is unlike this machine is that it doesn’t literally open a window into a parallel universe running on the same physics as ours, a physics which requires that minds be implemented as machinery in the world state, such as brains. The “state” that GPT propagates is text, a much coarser-grained description than microscopic quantum states or even neurons. This means that when simulacra exhibit cognition, it must be GPT (time evolution itself) that’s responsible for a large part of the mind-implementation, as there is nowhere near sufficient machinery in the prompt/state. So if a character is stupid, it may very well be a reflection of GPT’s weakness at compiling text descriptions into latent algorithms that simulate cognition.
But it may also be because of the prompt. Despite its short length, the prompt parameterizes an enormous number of qualitatively distinct simulations, and given GPT’s training distribution, it’s expected that GPT will sometimes “try” to simulate stupid things.
There’s also another way that GPT can fail to simulate smart behavior which I think is not reducible to “pretending to be stupid”. It makes the most sense if you think of the prompt as something like an automaton specification which will proceed to evolve not according to a mechanistic physics but according to GPT’s semantic word physics. Some automaton specifications will simply not work very well: they might get stuck in a loop because they were already a bit repetitive, or fail to activate the relevant knowledge because the style is out-of-distribution (GPT is quite sensitive to form and style), or cause hallucinations and rationalizations instead of effective reasoning because the flow of evidence is backward. But another automaton initialization may glide superbly when animated by GPT physics.
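To make the “backward flow of evidence” failure mode concrete, here’s a minimal sketch. The task and both prompt wordings are hypothetical illustrations of mine, not anything from the post:

```python
# Two "automaton initializations" for the same question (hypothetical, for illustration).
# In the first, the very next tokens must commit to a verdict, so any reasoning that
# follows is post-hoc rationalization; in the second, evidence can flow forward into
# the verdict.

QUESTION = 'Is this argument valid? "All fish swim. Nemo swims. Therefore Nemo is a fish."'

BACKWARD_PROMPT = f"Q: {QUESTION}\nA: The argument is"  # verdict first, reasoning (if any) after
FORWARD_PROMPT = f"Q: {QUESTION}\nA: Examining the argument's form step by step,"  # reasoning first, verdict last
```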
What I’ve found, not through a priori reasoning but through lots of toying, is that the quality of intelligence simulated by GPT-3 in response to “typical” prompts tremendously underestimates its “best case” capabilities. And the trends strongly imply that I haven’t found the best case for anything. Give me any task, quantifiable or not, and I am almost certain I can find a prompt that makes GPT-3 do it better after 15 minutes of tinkering, a better one than that if I had an hour, and a better one still if I had a day… etc. The problem of finding a good prompt to elicit some capability, especially if it’s open-ended or can be attacked in multiple steps, is similar to the problem of finding the best mental state in which to initialize a human to do something well: even if you only consider mental states which map to some verbal inner monologue, you could search through possible constructs practically indefinitely without expecting to have hit anything near the optimum, because the number of relevant, qualitatively distinct mental states is astronomical. It’s the same with simulacra configurations.
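As a concrete sketch of the kind of search described above: try several qualitatively different framings of the same task and keep whichever elicits the best completion. Everything here is hypothetical scaffolding of mine rather than anything from the post; `complete` and `score` are stand-ins for an LLM completion call and a task-specific grader:

```python
from typing import Callable, List, Tuple

def best_framing(task: str,
                 framings: List[str],
                 complete: Callable[[str], str],
                 score: Callable[[str], float]) -> Tuple[str, float]:
    """Try each prompt framing (each a different simulacrum configuration) and
    return the framing whose completion scores highest on the task."""
    best, best_score = framings[0], float("-inf")
    for framing in framings:
        completion = complete(framing.format(task=task))  # one simulation per framing
        s = score(completion)
        if s > best_score:
            best, best_score = framing, s
    return best, best_score

# Illustrative framings for some reasoning task; a real search could go on indefinitely.
framings = [
    "{task}\nAnswer:",                                                   # naive
    "Q: {task}\nA: Let's work through this step by step.",               # reasoning-first
    "A transcript of a careful expert solving a problem.\n\nProblem: {task}\nSolution:",  # expert frame
]
```

Even a tiny search over three framings will often change the measured “capability”; the point above is that the real space of qualitatively distinct framings is astronomically larger.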
So one of my motivations for advocating an explicit simulator/simulacra distinction, with the analogy to the extreme case of physics (where the configuration is responsible for basically everything), is to make the prompt-contingency of phenomena more intuitive, since I think most people’s intuitions are too inclined in the opposite direction, toward locating responsibility for observed phenomena in GPT itself. But it is important, and I did not sufficiently emphasize in this post, to be aware that the ontological split between “state” and “physics” carves the system differently than in real life, allowing for instance the possibility that simulacra are stupid because GPT is weak.