if the contributor has built something consistently or overall harmful, that is indeed on them
I agree; this is in accord with the dogma. But for AI, overall harm is debatable and currently purely hypothetical, so this doesn't really apply. There is a popular idea that existential risk from AI has little basis in reality, since it's not already here to be observed. On that view, contributing to public AI efforts remains fine (and in terms of first-order effects it currently is).
My worry is that this attitude reframes commitments from RSP-like documents: people don't see the obvious implication that releasing weights breaks those commitments (absent currently impossible feats of unlearning), and don't see themselves as committing to avoid releasing high-ASL weights even as they commit to such RSPs. If this point isn't written down, some people will only become capable of noticing it if actual catastrophes shift the prevailing attitude toward open-weights foundation models being harmful overall (even once we are already higher up in ASLs). And that shift doesn't necessarily happen even after some catastrophes with a limited blast radius, since those get balanced against positive effects.