I think my threat model is a bit different. I don’t particularly care about the zillions of mediocre ML practitioners who follow things that are hot and/or immediately useful. I do care about the pioneers, who are way ahead of the curve, working to develop the next big idea in AI long before it arrives. These people are not only very insightful themselves, but can also recognize an important insight when they see it. They’re out hunting for those insights, and they’re not looking in the same places as most people; in particular, they’re not looking at whatever is trending on Twitter or immediately useful.
Let’s try this analogy, maybe: “most impressive AI” ↔ “fastest man-made object”. Let’s say that the current record-holder for fastest man-made object is a train. And right now a competitor is building a better train, one that uses new train-track technology. It’s all very exciting, and lots of people are following it in the newspapers. Meanwhile, a pioneer has the idea of building the first-ever rocket ship, but the pioneer is stuck because they need better heat-resistant tiles in order for the rocket-ship prototype to actually work. This pioneer is probably not going to be following the fastest-train news; instead, they’re going to be poring over the obscure literature on heat-resistant tiles. (Sorry for the lack of historical or engineering accuracy in the above.) This isn’t a perfect analogy for many reasons, ignore it if you like.
So my ideal model is: (1) figure out the whole R&D path(s) to building AGI, (2) don’t tell anyone (or even write it down!), and (3) now you know exactly what not to publish, i.e. everything on that path. It doesn’t matter whether those things would be immediately useful or not, because if you did publish them, the pioneers who are already setting out down that path would seek them out and find them, even if they’re obscure; they already have a pretty good idea of what they’re looking for. Of course, that’s easier said than done, especially step (1) :-P
Thinking out loud here...
I do basically buy the “ignore the legions of mediocre ML practitioners, pay attention to the pioneers” model. That does match my modal expectations for how AGI gets built. But:
(1) How do those pioneers find my work, if my work isn’t very memetically fit?
(2) If they encounter my work directly from me (i.e. by reading it on LW), then at that point they’re selected pretty heavily for also finding lots of stuff about alignment.
(3) There’s still the theory-practice gap; our hypothetical pioneer needs to somehow recognize the theory they need without flashy demos to prove its effectiveness.
Thinking about it, these factors are not enough to make me confident that someone won’t use my work to produce an unaligned AGI. On (1), thinking about my personal work, there’s just very little technical work at all on abstraction, so someone who knows to look for technical work on abstraction could very plausibly encounter mine. And that is indeed the sort of thing I’d expect an AGI pioneer to be looking for. On (2), they’d be a lot more likely to encounter my work if they’re already paying attention to alignment, and encountering my work would probably make them more likely to pay attention to alignment even if they weren’t before, but neither of those really rules out unaligned researchers. On (3), I do expect a pioneer to be able to recognize the theory they need without flashy demos to prove it if they spend a few days’ attention on it.
… ok, so I’m basically convinced that I should be thinking about this particular scenario, and the “nobody cares” defense is weaker against the hypothetical pioneers than against most people. I think the “fundamental difference” is that the hypothetical pioneers know what to look for; they’re not just relying on memetic fitness to bring the key ideas to them.
… well fuck, now I need to go propagate this update.
“at that point they’re selected pretty heavily for also finding lots of stuff about alignment.”

An annoying thing is, just as I sometimes read Yann LeCun or Steven Pinker or Jeff Hawkins and extract some bits of insight from them while ignoring all the stupid things they say about the alignment problem, by the same token I imagine other people might read my posts and extract some bits of insight from me while ignoring all the wise things I say about the alignment problem. :-P
That said, I do definitely put some nonzero weight on those kinds of considerations. :)
More thinking out loud...
On this model, I still basically don’t worry about spy agencies.
What do I really want here? Like, what would be the thing that I most want those pioneers to find? “Nothing” doesn’t actually seem like the right answer; someone who already knows what to look for is going to figure out the key ideas with or without me. What I really want to do is to advertise to those pioneers, in the obscure work that most people will never pay attention to. I want to recruit them.
It feels like the prototypical person I’m defending against here is younger me. What would work on younger me?
Coming back to this 2 years later, I’m curious about how you’ve changed your mind.