Rohin Shah comments on Rant on Problem Factorization for Alignment

Rohin Shah 17 Aug 2022 21:17 UTC
LW: 2 AF: 2
0
AF
Hmm, maybe? There are a few ways this could go:
1. We give feedback to the model on its reasoning, that feedback is bad in the same way that “the rest of the world pays attention and forces dumb rules on them” is bad
2. “Keep your reasoning transparent” is itself a dumb rule that we force upon the AI system that leads to terrible bureaucracy problems
I’m unsure about (2) and mostly disagree with (1) (and I think you were mostly saying (2)).
Disagreement with (1): Seems like the disanalogy relies pretty hard on the rest of the world not paying much attention when they force bureaucracies to follow dumb rules, whereas we will presumably pay a lot of attention to how we give process-based feedback.