I think a lot of alignment tax-imposing interventions (like requiring local work to be transparent for process-based feedback) could be analogous?
Hmm, maybe? There are a few ways this could go:
1. We give feedback to the model on its reasoning, and that feedback is bad in the same way that "the rest of the world pays attention and forces dumb rules on them" is bad.
2. "Keep your reasoning transparent" is itself a dumb rule that we force upon the AI system, and it leads to terrible bureaucracy problems.
I’m unsure about (2) and mostly disagree with (1) (and I think you were mostly saying (2)).
Disagreement with (1): The disanalogy seems to rely pretty heavily on the rest of the world not paying much attention when it forces bureaucracies to follow dumb rules, whereas we will presumably pay a lot of attention to how we give process-based feedback.