I get the impression from your comments that you think it’s naive to describe this result as “learning algorithms spontaneously emerge.”
I think that’s a fine characterization. (I thought I had said so in the grandparent comment, but looking back, I only said I agreed with the claim that learning is happening via neural net activations, which I guess doesn’t necessarily imply that I think it’s a fine characterization.)
You describe the lack of LW/AF pushback against that description as “a community-wide failure,”
I think my original comment didn’t do a great job of phrasing my objection. My actual critique is that the community as a whole seems to be updating strongly on data-that-has-high-probability-if-you-know-basic-RL.
updating as a result toward thinking AF members “automatically believe anything written in a post without checking it.”
That was one of three possible explanations; I don’t have a strong view on which explanation is the primary cause (if any of them are). It’s more like “I observe clearly-to-me irrational behavior, this seems bad, even if I don’t know what’s causing it”. If I had to guess, I’d guess that the explanation is a combination of readers not bothering to check details and those who are checking details not knowing enough to point out that this is expected.
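To spell out the Bayesian point behind “updating strongly on data-that-has-high-probability-if-you-know-basic-RL” (generic symbols, with made-up hypothesis labels, nothing specific to this case): in odds form,

\[
\frac{P(H_1 \mid E)}{P(H_2 \mid E)} = \frac{P(E \mid H_1)}{P(E \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}
\]

If E is “a learning algorithm shows up in the activations of a recurrent policy trained across many related tasks”, and both the “mesa optimization is a big deal” hypothesis and the “business-as-usual RL” hypothesis assign E probability close to 1, then the likelihood ratio is close to 1 and the posterior odds should barely move. Updating strongly on E only makes sense if you previously thought E was unlikely.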
I feel confused about why, given your model of the situation, the researchers were surprised that this phenomenon occurred, and seem to think it was a novel finding that it will inevitably occur given the three conditions described.
Indeed, I am also confused by this, as I noted in the original comment:
I don’t understand why this was surprising to the original researchers
I have a couple of hypotheses, none of which seem particularly likely given that the authors are familiar with AI, so I just won’t speculate. I agree this is evidence against my claim that this would be obvious to RL researchers.
And this OpenAI paper [...] describes their result in similar terms:
Again, I don’t object to the description of this as learning a learning algorithm. I object to updating strongly on this. Note that the paper does not claim their results are surprising—it is written in a style of “we figured out how to make this approach work”. (The DeepMind paper does claim that the results are novel / surprising, but it is targeted at a neuroscience audience, to whom the results may indeed be surprising.)
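For concreteness, here is a minimal sketch of the generic setup these papers describe (in PyTorch, with a toy bandit task and made-up hyperparameters; this is not the code from either paper): an outer RL loop trains a recurrent policy across a distribution of tasks, and since the weights are frozen within an episode, any within-episode adaptation has to be carried by the activations.

```python
# Minimal sketch of the generic meta-RL / RL^2-style setup (NOT the code from
# either paper; task, architecture, and hyperparameters are made up for
# illustration). A GRU policy is trained by an outer REINFORCE loop on a
# distribution of two-armed bandit tasks. Weights are fixed within an episode,
# so any within-episode adaptation must be carried by the hidden activations.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, n_actions=2, hidden=32):
        super().__init__()
        # Input at each step: one-hot previous action + previous reward.
        self.cell = nn.GRUCell(n_actions + 1, hidden)
        self.head = nn.Linear(hidden, n_actions)
        self.n_actions, self.hidden = n_actions, hidden

    def forward(self, x, h):
        h = self.cell(x, h)
        return self.head(h), h

def run_episode(policy, arm_probs, steps=20):
    """One episode = one sampled bandit; the policy sees only its own history
    (via the hidden state), so it has to explore and then exploit."""
    h = torch.zeros(1, policy.hidden)
    prev = torch.zeros(1, policy.n_actions + 1)
    log_probs, rewards = [], []
    for _ in range(steps):
        logits, h = policy(prev, h)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        reward = float(torch.rand(()) < arm_probs[action.item()])
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
        prev = torch.zeros(1, policy.n_actions + 1)
        prev[0, action.item()] = 1.0
        prev[0, -1] = reward
    return torch.cat(log_probs), sum(rewards)

policy = RecurrentPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-3)
baseline = 0.0
for episode in range(2000):                     # outer RL loop over sampled tasks
    p = torch.rand(()).item()
    log_probs, ret = run_episode(policy, arm_probs=[p, 1.0 - p])
    baseline = 0.95 * baseline + 0.05 * ret     # running-average baseline
    loss = -log_probs.sum() * (ret - baseline)  # REINFORCE on total return
    opt.zero_grad()
    loss.backward()
    opt.step()
```

If training converges, the hidden state is implementing something like an explore-then-exploit bandit strategy, which is exactly the sense in which a learning algorithm “emerges in the activations”; given this setup, it would be much more surprising if it didn’t.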
I’ve been feeling very confused lately about how people talk about “search,” and have started joking that I’m a search panpsychist.
On the search panpsychist view, my position is that if you use deep RL to train an AGI policy, it is definitionally a mesa optimizer. (Like, anything that is “generally intelligent” has the ability to learn quickly, which on the search panpsychist view means that it is a mesa optimizer.) So in this world, “likelihood of mesa optimization via deep RL” is equivalent to “likelihood of AGI via deep RL”, and “likelihood that more general systems trained by deep RL will be mesa optimizers” is ~1 and you ~can’t update on it.