I imagine this was not your intention, but I’m a little worried that this comment will have an undesirable chilling effect. I think it’s good for people to share when members of DeepMind / OpenAI say something that sounds a lot like “we found evidence of mesaoptimization”.
I also think you’re right that we should be doing a lot better on pushing back against such claims. I hope LW/AF gets better at being as skeptical of AI researchers’ assertions that support risk as they are of those that undermine risk. But I also hope that when those researchers claim something surprising and (to us) plausibly risky is going on, we continue to hear about it.
I imagine this was not your intention, but I’m a little worried that this comment will have an undesirable chilling effect.
Note that there are desirable chilling effects too. I think it’s broadly important to push back on inaccurate claims, or ones that have the wrong level of confidence. (Like, my comment elsewhere is intended to have a chilling effect.)
I imagine this was not your intention, but I’m a little worried that this comment will have an undesirable chilling effect
I agree it might have this effect, and that it would be bad if that were to happen (all else equal). But I’d much rather have researchers with good beliefs given the evidence they have than researchers with lots of evidence but bad beliefs given that evidence.
(As with everything, this is a tradeoff. I haven’t specified exactly how you should weigh the tradeoff, because that’s hard to do.)