Fwiw I am somewhat more sympathetic here to “the line between safety and capabilities is blurry, Anthropic has previously published some interpretability research that turned out to help someone else do some capabilities advances.”
I have heard Anthropic is bottlenecked on having people with enough context and discretion to evaluate various things that are “probably fine to publish” but “not obviously fine enough to ship without taking at least a chunk of some busy person’s time”. I think in this case I basically take the claim at face value.
I do want to keep pressuring them to somehow resolve that bottleneck, because it seems very important, but I don’t know that I’d disproportionately complain at them about this particular thing.
(I’d also not be surprised if, while the above claim is true, Anthropic is still suspiciously dragging its feet disproportionately in areas that feel like they make more of a competitive sacrifice, but I wouldn’t actively bet on it.)
If you’re right, I think the upshot is (a) Anthropic should figure out whether to publish stuff rather than let it languish and/or (b) it would be better for lots of Anthropic safety researchers to instead do research that’s safe to share (rather than research that only has value if Anthropic wins the race)
Sounds fatebookable tho, so let’s use ye Olde Fatebook Chrome extension:
⚖ In 4 years, Ray will think it is pretty obviously clear that Anthropic was strategically avoiding posting alignment research for race-winning reasons. (Raymond Arnold: 17%)
(low probability because I expect it to still be murky/unclear)
I tentatively think this is a high-priority ask.
Capabilities research isn’t a monolith, and improving capabilities without increasing spooky black-box reasoning seems pretty fine.