Matthew Barnett comments on [missing post]

Matthew Barnett 2 Feb 2023 19:14 UTC
LW: 4 AF: 3
I think I understand my confusion, at least a bit better than before. Here’s how I’d summarize what happened.
I had three arguments in this essay, which I thought of as roughly having the following form:
1. Deployment lag: after TAI is fully developed, how long will it take to become widely impactful?
2. Generality: how difficult is it to develop TAI fully, including making it robustly and reliably achieve what we want?
3. Regulation: how much will people’s reactions to and concerns about AI delay the arrival of fully developed TAI?
You said that (2) was already answered by the bio anchors model. I responded that bio anchors neglected how difficult it will be to develop AI safely. You replied that it will be easy to make models that seemingly do what we want, but that the harder part will be making models that actually do what we want.
My reply was trying to say that the difficulty of building TAI safely was already baked into (2). That might be a dubious reading of the actual textual argument for (2), but I think that interpretation is backed up by my initial reply to your comment.
I framed my later reply as being about perceptions because I think the requisite capability level at which people begin to adopt TAI is an important point about how long timelines will be, independent of (1) and (3). In other words, I was arguing that people’s perceptions of the capability of AI will cause them to wait to adopt AI until it’s fully developed in the sense I described above; those perceptions won’t just delay the effects of TAI after it’s fully developed, or before then because of regulation.
Furthermore, I assumed that you were arguing something along the lines of “people will adopt AI once it’s capable of only seeming to do what we want”, which I’m skeptical of. Hence my reply to you.
My understanding was that you are also skeptical about question 2 on short timelines, and that was what you were arguing with your point (2) on overestimating generality.
Since for point 2 you said “I’m assuming that an AI CEO that does the job of CEO well until the point that it executes a treacherous turn”, I am not very skeptical of that right now. I think we could probably have AIs do something that looks very similar to what a CEO would do within, idk, maybe five years.
(Independently of all of this, I’ve updated towards medium rather than long timelines in the last two years, but mostly because of reflection on other questions, and because I was surprised by the rate of recent progress, rather than because I have fundamental doubts about the arguments I made here, especially (3), which I think is still underrated.
ETA: though also, if I wrote this essay today I would likely fully re-write section (2), since after re-reading it I now don’t agree with some of the things I said in it. Sorry if I was being misleading by downplaying how poor some of those points were.)
- Rohin Shah 4 Feb 2023 8:57 UTC
  LW: 4 AF: 3
  My summary of your argument now would be:
  1. Deployment lag: it takes time to deploy stuff
  2. Worries about AI misalignment: the world will believe that AI alignment is hard, and so will avoid deploying AI until a lot of work has been done to be confident in its alignment.
  3. Regulation: it takes time to comply with regulations
  If that’s right, I broadly agree with all of these points :)
  (I previously thought you were saying something very different with (2), since the text in the OP seems pretty different.)
  - Matthew Barnett 4 Feb 2023 11:37 UTC
    LW: 4 AF: 3
    
    > I previously thought you were saying something very different with (2), since the text in the OP seems pretty different.
    
    FWIW I don’t think you’re getting things wrong here. I also have simply changed some of my views in the meantime.
    
    That said, what I was trying to argue with (2) was not that alignment would be hard per se, but that it would be hard to get an AI to do very high-skill tasks in general. That included aligning the model, since otherwise it’s not really “doing the task” (though as I said, I don’t currently stand by what I wrote in the OP as-is).