I guess I should explain why I upvoted this post despite agreeing with you that it’s not new evidence in favor of mesa-optimization. I actually had a conversation about this post with Adam Shimi prior to you commenting on it where I explained to him that I thought that not only was none of it new but also that it wasn’t evidence about the internal structure of models and therefore wasn’t really evidence about mesa-optimization. Nevertheless, I chose to upvote the post and not comment my thoughts on it. Some reasons why I did that:
I generally upvote most attempts on LW/AF to engage with the academic literature—I think that LW/AF would generally benefit from engaging with academia more and I like to do what I can to encourage that when I see it.
I didn’t feel like any comment I would have made would have anything more to say than things I’ve said in the past. In fact, in “Risks from Learned Optimization” itself, we talk about both a) why we chose to be agnostic about whether current systems exhibit mesa-optimization due to the difficulty of determining whether a system is actually implementing search or not (link) and b) examples of current work that we thought did seem to come closest to being evidence of mesa-optimization such as RL^2 (and I think RL^2 is a better example than the work linked here) (link).
(Flagging that I curated the post, but was mostly relying on Ben and Habryka’s judgment, in part since I didn’t see much disagreement. Since this discussion I’ve become more agnostic about how important this post is.)
One thing this comment makes me want is more nuanced reacts: affordances for people to communicate how they feel about a post in a way that’s easier to aggregate.
Though I also notice that with this particular post it’s unclear which react would be appropriate, since it sounds like it’s not “disagree” so much as “this post seems confused” or something.
FWIW, I appreciated that your curation notice explicitly included the desire for more commentary on the results, and that curating it seems to have contributed to there being more commentary.
I didn’t feel like any comment I would have made would have anything more to say than things I’ve said in the past.
FWIW, I say: don’t let that stop you! (Don’t be afraid to repeat yourself, especially if there’s evidence that the point has not been widely appreciated.)
Unfortunately, I also only have so much time, and I don’t generally think that repeating myself regularly in AF/LW comments is a super great use of it.
The solution is clear: someone needs to create an Evan bot that will comment on every post on the AF related to mesa-optimization, providing the right pointers to the paper.
Fair enough, those are sensible reasons. I don’t like the fact that the incentive gradient points away from making intellectual progress, but it’s not an obvious choice.
Very fair.