This post distinguishes between mesa-optimization and learned heuristics. What you’re describing sounds like learned heuristics. (“Learning which words are easy to rhyme” was an example I gave in the post.) Learned heuristics aren’t nearly as worrisome as mesa-optimization because they’re harder to modify and misuse to do planning in unexpected domains. When I say “lookahead” in the post I’m pretty much always referring to the mesa-optimization sort.
Suppose I said (and I actually believe something like this is true):
“GPT often considers multiple possibilities in parallel for where the text is heading—including both where it’s heading in the short-term (is this sentence going to end with a prepositional phrase or is it going to turn into a question?) and where it’s heading in the long-term (will the story have a happy ending or a sad ending?)—and it calculates which of those possibilities are most likely in light of the text so far. It chooses the most likely next word in light of this larger context it figured out about where the text is heading.”
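To make the hypothetical concrete, here is a toy sketch (in Python) of the kind of computation I have in mind. Every hypothesis, word, and number below is invented purely for illustration; this is not a claim about what GPT actually represents internally.

```python
# Toy sketch: weigh a few parallel guesses about where the text is heading,
# then choose the next word in light of that whole weighted picture.
# All hypotheses and probabilities here are made up for illustration.

# How plausible each trajectory looks given the text so far (invented weights).
hypotheses = {
    "sentence ends with a prepositional phrase": 0.5,
    "sentence turns into a question":            0.2,
    "story is heading for a happy ending":       0.2,
    "story is heading for a sad ending":         0.1,
}

# For each hypothesis, an (invented) distribution over the next word.
next_word_given_hypothesis = {
    "sentence ends with a prepositional phrase": {"in": 0.6, "why": 0.1, "happily": 0.3},
    "sentence turns into a question":            {"in": 0.1, "why": 0.8, "happily": 0.1},
    "story is heading for a happy ending":       {"in": 0.3, "why": 0.1, "happily": 0.6},
    "story is heading for a sad ending":         {"in": 0.5, "why": 0.4, "happily": 0.1},
}

# Marginalize over trajectories: P(word) = sum_h P(h | context) * P(word | h).
words = ["in", "why", "happily"]
scores = {
    w: sum(p_h * next_word_given_hypothesis[h][w] for h, p_h in hypotheses.items())
    for w in words
}

# The chosen word reflects the larger context of where the text seems to be heading.
print(max(scores, key=scores.get))
```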
If that’s correct, would you call GPT a mesa-optimizer?
Well, I suppose mesa-optimization isn’t really a binary, is it? Like, maybe there’s a trivial sense in which self-attention “mesa-optimizes” over its input when figuring out what to pay attention to.
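For concreteness, the “trivial sense” I mean: standard scaled dot-product self-attention already computes a soft selection over its input positions, roughly as in the sketch below (this is just the textbook formulation, nothing GPT-specific).

```python
# Scaled dot-product self-attention: a soft "pick what to attend to" over the input.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model). Returns an output of the same shape."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # relevance of every position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: soft selection over positions
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                            # 5 tokens, toy dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # (5, 8)
```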
But ultimately, what matters isn’t the definition of the term “mesa-optimization”; it’s the risk of spontaneous internal planning/optimization that generalizes in unexpected ways or operates in unexpected domains. At least in my mind. So the question is whether this “considering multiple possibilities” machinery for text could also improve the model’s ability to consider multiple possibilities in other domains. Which depends on whether the implementation of “considering multiple possibilities” looks more like beam search (something like the sketch below) or more like very domain-adapted heuristics.
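For reference, the “beam search” end of that spectrum would look something like the sketch below: a generic search loop that doesn’t care what the tokens mean, which is exactly why that kind of mechanism could transfer across domains. (`log_prob_next` is a made-up stand-in for whatever scores continuations; this is not a claim about how GPT is actually implemented.)

```python
# Minimal beam search over continuations; the loop itself is domain-general.
import math

def log_prob_next(tokens):
    """Made-up stand-in: return {token: log-probability} for the next token."""
    vocab = ["the", "end", "."]
    return {tok: math.log(1.0 / len(vocab)) for tok in vocab}

def beam_search(prefix, beam_width=3, steps=4):
    beams = [(list(prefix), 0.0)]                      # (tokens so far, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for tok, lp in log_prob_next(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        # Keep only the most promising partial continuations: explicit lookahead.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search(["once", "upon"]))
```

A pile of very domain-adapted heuristics, by contrast, would have no reusable loop like this to repurpose elsewhere.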
I think the Transformer is successful in part because it tends to solve problems by considering multiple possibilities, processing them in parallel, and picking the one that looks best. (Selection-type optimization.) If you train it on text prediction, that’s part of how it will do text prediction. If you train it on a different domain, that’s part of how it will solve problems in that domain too.
I don’t think GPT builds a “mesa-optimization infrastructure” and then applies that infrastructure to language modeling. I don’t think it needs to. I think the Transformer architecture is already raring to go forth and mesa-optimize, as soon as you give it any optimization pressure to do so.
So anyway, your question is: can it display foresight / planning in a different domain without being trained in that domain? I would say, “yeah, probably, because practically every domain is instrumentally useful for text prediction”. So somewhere in GPT-3’s billions of parameters I think there’s code to consider multiple possibilities, process them in parallel, and pick the best answer, in response to the question of “What will happen next when you put a sock in a blender?” or “What is the best way to fix an oil leak?”—not just those literal words as a question, but the concepts behind them, however they’re invoked.
(Having said that, I don’t think GPT-3 specifically will do side-channel attacks, but for other, unrelated reasons that are off-topic here. Namely, I don’t think it is capable of making the series of new insights required to develop an understanding of itself and its situation and then take appropriate actions. That’s based on my speculations here.)