I think a lot of alignment tax-imposing interventions (like requiring local work to be transparent for process-based feedback) could be analogous?
Hmm, maybe? There are a few ways this could go:
1. We give feedback to the model on its reasoning, and that feedback is bad in the same way that "the rest of the world pays attention and forces dumb rules on them" is bad.
2. "Keep your reasoning transparent" is itself a dumb rule that we force upon the AI system, and it leads to terrible bureaucracy problems.
I’m unsure about (2) and mostly disagree with (1) (and I think you were mostly saying (2)).
Disagreement with (1): The disanalogy seems to rely pretty heavily on the rest of the world not paying much attention when it forces bureaucracies to follow dumb rules, whereas we will presumably pay a lot of attention to how we give process-based feedback.