Earlier today, I was preparing for an interview. I warmed up by replying stream-of-consciousness to imaginary questions I thought they might ask. Seemed worth putting here.
What do you think about AI timelines?
I’ve obviously got a lot of uncertainty. My distribution is bimodal, binning into “DL is basically sufficient and we need at most 1 big new insight to get to AGI” and “we need more than 1 big insight.”
So the first bin has most of its probability in the range of 10-20 years from now, and the second is more like 45-80 years, with positive skew.
A lot of things are driving my uncertainty. One thing that drives how things turn out (but not really how fast we’ll get there) is: will we be able to tell we’re close 3+ years in advance, and if so, how quickly will the labs react? Gwern Branwen made a point a few months ago, roughly: OAI has really been validated on the scaling hypothesis, and no one else is really betting big on it (stubbornness, incentives, etc.), despite the amazing progress from scaling. If that’s true, then even if it’s getting pretty clear that one approach is working better, we might see a slower pivot and end up in a more unipolar scenario.
I feel dissatisfied with pontificating like this, though, because there are so many considerations pulling so many different ways. I think one of the best things we can do right now is to identify key considerations. There was work on expert models showing that simple featurized linear models often beat domain experts, quite soundly. It turned out that most of the work the experts did was locating the right features, and not necessarily assigning very good weights to those features.
So one key consideration I recently read, IMO, was Evan Hubinger talking about the homogeneity of AI systems: if they’re all pretty similarly structured, they’re plausibly roughly equally aligned, which would really decrease the probability of aligned vs unaligned AGIs duking it out.
What do you think the alignment community is getting wrong?
When I started thinking about alignment, I had this deep respect for everything ever written, like I thought the people were so smart (which they generally are) and the content was polished and thoroughly viewed through many different frames (which it wasn’t/isn’t). I think the field is still young enough that, in our research, we should be executing higher-variance cognitive moves: trying things, breaking things, coming up with new frames, and thinking about ideas from new perspectives.
I think right now, a lot of people are really optimizing for legibility and defensibility. I think I do that more than I want/should. Usually the “non-defensibility” stage lasts the first 1-2 months on a new paper, and then you have to defend thoughts. This can make sense for individuals, and it should be short some of the time, but as a population I wish defensibility weren’t as big of a deal for people / me. MIRI might be better at avoiding this issue, but a not-really-defensible intuition I have is that they’re freer in thought, but within the MIRI paradigm, if that makes sense. Maybe that opinion would change if I talked with them more.
Anyways, I think many of the people who do the best work aren’t optimizing for this.