One more disanalogy:
4. the rest of the world pays attention to large or powerful real-world bureaucracies and forces rules on them that small teams / individuals can ignore (e.g. Secret Congress, Copenhagen interpretation of ethics, startups being able to do illegal stuff), but this presumably won’t apply to alignment approaches.
One other thing I should have mentioned is that I do think the “unconscious economics” point is relevant and could end up being a major problem for problem factorization, but I don’t think we have great real-world evidence suggesting that unconscious economics by itself is enough to make teams of agents not worthwhile.
Re disanalogy 1: I’m not entirely sure I understand what your objection is here but I’ll try responding anyway.
I’m imagining that the base agent is an AI system that is pursuing a desired task with roughly human-level competence, not something that acts the way a whole-brain emulation in a realistic environment would act. This base agent can be trained by imitation learning where you have the AI system mimic human demonstrations of the task, or by reinforcement learning on a reward model trained off of human preferences, but (we hope) is just trying to do the task and doesn’t have all the other human wants and desires. (Yes, this leaves a question of how you get that in the first place; personally I think that this distillation is the “hard part”, but that seems separate from the bureaucracy point.)
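To make those two training routes concrete, here is a minimal sketch; the network shapes, data, and hyperparameters are placeholder assumptions, and the RL step against the reward model (e.g. PPO) is omitted entirely:

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 16, 4

# (a) Imitation learning: train the policy to mimic human demonstrations of the task.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
bc_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_obs = torch.randn(256, obs_dim)          # stand-in for human demonstration states
demo_act = torch.randint(0, act_dim, (256,))  # stand-in for human demonstration actions
for _ in range(100):
    loss = nn.functional.cross_entropy(policy(demo_obs), demo_act)
    bc_opt.zero_grad()
    loss.backward()
    bc_opt.step()

# (b) Reward model fit to pairwise human preference comparisons (Bradley-Terry style);
# reinforcement learning against this reward model would then train the base agent.
reward_model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

preferred = torch.randn(256, obs_dim + act_dim)  # stand-in for preferred trajectory features
rejected = torch.randn(256, obs_dim + act_dim)   # stand-in for dispreferred trajectory features
for _ in range(100):
    margin = reward_model(preferred) - reward_model(rejected)
    rm_loss = -nn.functional.logsigmoid(margin).mean()
    rm_opt.zero_grad()
    rm_loss.backward()
    rm_opt.step()
```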
Even if you did get a bureaucracy made out of agents with human desires, it still seems like you get a lot of benefit from the fact that the agents are identical to each other, and so have less politics.
Re disanalogy 3: I agree that you have to think that a small / medium / large bureaucracy of Alices-with-15-minutes will at least slightly outperform (respectively) an individual Alice / a small bureaucracy / a medium bureaucracy before this disanalogy is actually a reason for optimism. I think that ends up coming from disanalogies 1, 2 and 4, plus some difference in opinion about real-world bureaucracies, e.g. I feel pretty good about small real-world teams beating individuals.
I mostly mention this disanalogy as a reason not to update too hard on intuitions like “Can HCH epistemically dominate Ramanujan?” and this SlateStarCodex post.
On reflection I think there’s a strong chance you have tried picturing that, but I’m not confident, so I mention it just in case you haven’t.
Yeah I have. Personally my inner sim feels pretty great about the combination of disanalogy 1 and disanalogy 2 -- it feels like a coalition of Rohins would do so much better than an individual Rohin, as long as the Rohins had time to get familiar with a protocol and evolve it to suit their needs. (Picturing some giant number of Rohins a la disanalogy 3 is a lot harder to do but when I try it mostly feels like it probably goes fine.)
I think a lot of alignment tax-imposing interventions (like requiring local work to be transparent for process-based feedback) could be analogous?
Hmm, maybe? There are a few ways this could go:
1. We give feedback to the model on its reasoning, and that feedback is bad in the same way that “the rest of the world pays attention and forces dumb rules on them” is bad.
2. “Keep your reasoning transparent” is itself a dumb rule that we force upon the AI system, and it leads to terrible bureaucracy problems.
I’m unsure about (2) and mostly disagree with (1) (and I think you were mostly saying (2)).
Disagreement with (1): Seems like the disanalogy relies pretty hard on the rest of the world not paying much attention when they force bureaucracies to follow dumb rules, whereas we will presumably pay a lot of attention to how we give process-based feedback.
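For what it’s worth, a toy sketch of the distinction I have in mind between outcome-based and process-based feedback (the function names and grading signals here are illustrative assumptions, not a specific proposal):

```python
from typing import Callable, List

def outcome_feedback(final_answer: str, grade_answer: Callable[[str], float]) -> float:
    # Only the final answer is graded; the model's reasoning can stay opaque.
    return grade_answer(final_answer)

def process_feedback(reasoning_steps: List[str], grade_step: Callable[[str], float]) -> float:
    # Every intermediate step must be legible enough for an overseer to grade it --
    # this is where the "keep your reasoning transparent" requirement bites.
    if not reasoning_steps:
        return 0.0
    return sum(grade_step(step) for step in reasoning_steps) / len(reasoning_steps)
```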
Re disanalogy 1: I was mostly thinking of the unconscious economics stuff.
I should have asked for a mental picture sooner; this is very useful to know. Thanks.
If I imagine a bunch of Johns, I think that they basically do fine, though mainly because they just don’t end up using very many Johns. I do think a small team of Johns would do way better than I do.