Thinking out loud here...
I do basically buy the “ignore the legions of mediocre ML practitioners, pay attention to the pioneers” model. That does match my modal expectations for how AGI gets built. But:
1. How do those pioneers find my work, if my work isn’t very memetically fit?
2. If they encounter my work directly from me (i.e. by reading it on LW), then at that point they’re selected pretty heavily for also finding lots of stuff about alignment.
3. There’s still the theory-practice gap; our hypothetical pioneer needs to somehow recognize the theory they need without flashy demos to prove its effectiveness.
Thinking about it, these factors are not enough to make me confident that someone won’t use my work to produce an unaligned AGI.

On (1): thinking about my own work, there’s just very little technical work on abstraction at all, so someone who knows to look for technical work on abstraction could very plausibly encounter mine. And that is indeed the sort of thing I’d expect an AGI pioneer to be looking for.

On (2): they’d be a lot more likely to encounter my work if they’re already paying attention to alignment, and encountering my work would probably make them more likely to pay attention to alignment even if they weren’t before; but neither of those really rules out unaligned researchers.

On (3): I do expect a pioneer who spends a few days’ attention on it to be able to recognize the theory they need without flashy demos to prove it.
… ok, so I’m basically convinced that I should be thinking about this particular scenario, and the “nobody cares” defense is weaker against the hypothetical pioneers than against most people. I think the “fundamental difference” is that the hypothetical pioneers know what to look for; they’re not just relying on memetic fitness to bring the key ideas to them.
… well fuck, now I need to go propagate this update.
“at that point they’re selected pretty heavily for also finding lots of stuff about alignment.”

An annoying thing is, just as I sometimes read Yann LeCun or Steven Pinker or Jeff Hawkins, and I extract some bits of insight from them while ignoring all the stupid things they say about the alignment problem, by the same token I imagine other people might read my posts, and extract some bits of insight from me while ignoring all the wise things I say about the alignment problem. :-P
That said, I do definitely put some nonzero weight on those kinds of considerations. :)
More thinking out loud...
On this model, I still basically don’t worry about spy agencies.
What do I really want here? Like, what would be the thing which I most want those pioneers to find? “Nothing” doesn’t actually seem like the right answer; someone who already knows what to look for is going to figure out the key ideas with or without me. What I really want to do is to advertise to those pioneers, in the obscure work which most people will never pay attention to. I want to recruit them.
It feels like the prototypical person I’m defending against here is younger me. What would work on younger me?
Coming back to this 2 years later, and I’m curious about how you’ve changed your mind.