I agree, and even cited a chain of replicated works that indicated that to me over a year ago.
But as I said, there’s a difference between discussing what’s demonstrated in smaller toy models and what’s demonstrated in a production model, or what’s indicated versus what’s explicit. Even though there’s no reasonable basis to think a complex result exhibited by a simpler model would be absent or less complex in an exponentially more complex one, I can say from experience that explaining extrapolated research to a lay audience lands very differently from pointing to direct results like the ones Anthropic showed here.
You might understand the implications of the Skill-Mix work, or Othello-GPT, or Max Tegmark’s linear representation papers, or Anthropic’s earlier single-layer SAE paper, or any number of other research papers from the past year, but the moment you responsibly frame the implications of those works as a speculative conclusion about modern models, a non-expert audience is lost. Their eyes glaze over at the word ‘probably,’ especially when they want to reject what’s being stated.
The “it’s just fancy autocomplete” influencers have no shame about definitive statements or concern for citable accuracy (and they happen to feed confirmation biases about new tech being overhyped, a “heuristic that almost always works”), but as someone who does care about accurate representations, I haven’t to date been able to point to a single source of truth the way Anthropic delivered here. Instead, I’d point to a half dozen papers all indicating the same direction of results.
And while those experienced in research know that a half dozen papers all pointing the same way is better to have in one’s pocket than a single larger work, I’ve already seen minds change in the comments on this blog post in general technology forums, in ways dramatically different from all of the simpler and cheaper methods to date, where I was increasingly convinced of a position while the average person got held up finding ways to (incorrectly) rationalize why it wasn’t correct or wouldn’t translate to production models.
So I agree with you on both counts: “yeah, an informed person would have already known this” as well as “but this might get more buzz.”