Zach Stein-Perlman says:

- Doing things that feel good—and look good to many ethics-minded observers—but are more motivated by purity than seeking to do as much good as possible and thus likely to be much less valuable than the best way to do good (on the margin)
- Focusing on avoiding doing harm yourself, rather than focusing on net good or noticing how your actions affect others[2] (related concept: inaction risk)
I’m worried that Anthropic will be in carbon-offset mindset with respect to AI welfare.
This seems like an important issue to me. I read Anthropic’s press releases, and this doc to a lesser extent, and I find myself picturing a group of people with semi-automatic rifles standing near a mass of dead bodies. The most honest and trustworthy among them says quickly as you gape at the devastation, “Not me! My hands are clean! I didn’t shoot anyone, I just stood by and watched!”
I believe them, and yet, I do not feel entirely reassured.
I don’t want to hold Anthropic responsible for saving the world, but I sure would like to see more emphasis on the actions they could take to help prevent disaster from someone else’s AI, not just their own. I think the responsible thing for them to do could be something along the lines of using their expertise, compute, and private evals to also evaluate open-weights models, and sharing those reports with the government. I think there are a lot of people in government who won’t strongly support an AI safety agency doing mandatory evals until they’ve seen clear demonstrations of danger.
Maybe Anthropic is doing this, or plans to, and isn’t mentioning it because they don’t want to draw the wrath of the open-weights model publishers. That would be reasonable. But if that were the case, I’d expect the government to claim the credit for having discovered that the open-weights models were dangerous, and I don’t hear that happening either.
Did you mean Zach Stein-Perlman or Zac Hatfield-Dodds?