Let me first say I dislike the conflict-theoretic view presented in the “censorship bad” paragraph. On the short list of social media sites I visit daily, moderation creates a genuinely better experience. Automated censorship will become an increasingly important force for good as generative models become more widespread.
Secondly, there is a danger of AI safety becoming less robust—or even optimising for deceptive alignment—in models using front-end censorship.[3]
This one is interesting, but only in the counterfactual: “if AI ethics technical research focused on actual value alignment of models as opposed to front-end censorship, this would have higher-order positive effects for AI x-safety”. But it doesn’t directly hurt AI x-safety research right now: we already work under the assumption that output filtering is not a solution for x-risk.
It is clear that improved technical research norms in AI non-x-risk safety can have positive effects on AI x-safety. If we could train a language model to robustly align to any set of human-defined values at all, that would be an improvement over the current situation.
But there are other factors to consider. Is “making the model inherently non-racist” a better proxy for alignment than some other technical problem? Could interacting with that community weaken the epistemic norms in AI x-safety?
Calling content censorship “AI safety” (or even “bias reduction”) severely damages the reputation of actual, existential AI safety advocates.
I would need to significantly update my prior if this turns out to be a very important concern. Who are the people, whose opinions will be relevant at some point, who understand what both AI non-x-safety and AI x-safety are about, dislike the former, are sympathetic to the latter, yet conflate them?
I don’t know why it sent only the first sentence; I was drafting a comment on this. I wanted to delete it but I don’t know how. EDIT: wrote the full comment now.