You might say “okay, sure, at some level of scaling GPTs learn enough general reasoning that they can manage a corporation, but there’s no reason to believe it’s near”.
Right. This is essentially the same way we might reply to Claude Shannon if he said that some level of brute-force search would solve the problem of natural language translation.
one of the major points of the bio anchors framework is to give a reasonable answer to the question of “at what level of scaling might this work”, so I don’t think you can argue that current forecasts are ignoring (2).
Thanks for the useful comment.

Figuring out how to make a model manage a corporation involves a lot more than scaling a model until it has the requisite general intelligence to do it in principle, if its motivation were aligned.
I think it will be hard to figure out how to actually make models do stuff we want. Insofar as this is simply a restatement of the alignment problem, I think this assumption will be fairly uncontroversial around here. Yet, it’s also a reason to assume that we won’t simply obtain transformative models the moment they become theoretically attainable.
It might seem unfair that I’m including safety and control as an input in our model for timelines, if we’re using the model to reason about the optimal time to intervene. But I think on an individual level it makes sense to just try to forecast what will actually happen.
I think it will be hard to figure out how to actually make models do stuff we want. Insofar as this is simply a restatement of the alignment problem, I think this assumption will be fairly uncontroversial around here.
Fwiw, the problem I think is hard is “how to make models do stuff that is actually what we want, rather than only seeming like what we want, or only initially what we want until the model does something completely different like taking over the world”.
I don’t expect that it will be hard to get models that look like they’re doing roughly the thing we want; see e.g. the relative ease of prompt engineering or learning from human preferences. If I thought that were hard, I would agree with you.
I would guess that this is relatively uncontroversial as a view within this field? Not sure though.
(One of my initial critiques of bio anchors was that it didn’t take into account the cost of human feedback, except then I actually ran some back-of-the-envelope calculations and it turned out it was dwarfed by the cost of compute; maybe that’s your crux too?)
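To make the shape of that comparison concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is an assumed placeholder for illustration, not a figure from the bio anchors report or from the calculation mentioned above; the only point is that plausible feedback costs sit several orders of magnitude below plausible compute costs.

```python
# Minimal back-of-the-envelope sketch: compare an assumed human-feedback
# budget against an assumed training-compute budget. All figures are
# illustrative placeholders, not numbers from the bio anchors report.

training_flop = 1e30        # assumed training compute for a transformative model (FLOP)
dollars_per_flop = 1e-18    # assumed effective hardware cost per FLOP

feedback_labels = 1e8       # assumed number of human feedback comparisons
dollars_per_label = 1.0     # assumed all-in cost per comparison (labor, QA, tooling)

compute_cost = training_flop * dollars_per_flop      # ~$1e12 under these assumptions
feedback_cost = feedback_labels * dollars_per_label  # ~$1e8 under these assumptions

print(f"compute cost:  ${compute_cost:.2e}")
print(f"feedback cost: ${feedback_cost:.2e}")
print(f"compute / feedback: {compute_cost / feedback_cost:.0f}x")
```

With these placeholder numbers the feedback term is a rounding error next to the compute term, which is the sense in which it gets “dwarfed”.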
Sorry for replying to this comment 2 years late, but I wanted to discuss this part of your reasoning:
Fwiw, the problem I think is hard is “how to make models do stuff that is actually what we want, rather than only seeming like what we want, or only initially what we want until the model does something completely different like taking over the world”.
I think that’s what I meant when I said “I think it will be hard to figure out how to actually make models do stuff we want”. But more importantly, I think that’s how most people will in fact perceive what it means to get a model to “do what we want”.
Put another way, I don’t think people will actually start using AI CEOs just because we have a language model that acts like a CEO. Large corporations will likely wait until they’re very confident in its reliability, robustness, and alignment. (Although, idk, maybe some eccentric investors will find the idea interesting; I just expect that most people will be highly skeptical without strong evidence that it’s actually better than a human.)
I think this point can be seen pretty easily in discussions of driverless cars. Regulators are quite skeptical of Tesla’s Autopilot despite it seeming to do what we want in perhaps over 99% of situations.
If anything, I expect most people to be intuitively skeptical that AI is really “doing what we want” even in cases where it’s genuinely doing a better job than humans, and doesn’t merely appear that way on the surface. The reason is simple: we have vast amounts of informal data on the reliability of humans, but very little idea how reliable AI will be. That plausibly causes people to start with a skeptical outlook, and only accept AI in safety-critical domains when they’ve seen it accumulate a long track record of exceptional performance.
For these reasons, I don’t fully agree that “one of the major points of the bio anchors framework is to give a reasonable answer to the question of ‘at what level of scaling might this work’”. I mean, I agree that this was what the report was trying to answer, but I disagree that it answered the question of when we will accept and adopt AI for various crucial economic activities, even if such systems were capable of automating everything in principle.
I want to distinguish between two questions:

1. At some specified point in the future, will people believe that AI CEOs can perform the CEO task as well as human CEOs if deployed?
2. At some specified point in the future, will AI CEOs be able to perform the CEO task as well as human CEOs if deployed?
(The key difference being that (1) is a statement about people’s beliefs about reality, while (2) is a statement about reality directly.)
(For all of this I’m assuming that an AI CEO that does the job of CEO well until the point that it executes a treacherous turn counts as “performing the CEO task well”.)
I’m very sympathetic to skepticism about question 1 on short timelines, and indeed as I mentioned I agree with your points (1) and (3) in the OP and they cause me to lengthen my timelines for TAI relative to bio anchors.
My understanding was that you are also skeptical about question 2 on short timelines, and that was what you were arguing with your point (2) on overestimating generality. That’s the part I disagree with. But your response is talking about things that other people will believe, rather than about reality; I already agree with you on that part.
I think I understand my confusion at least a bit better than before. Here’s how I’d summarize what happened.
I had three arguments in this essay, which I thought of as roughly having the following form:
1. Deployment lag: after TAI is fully developed, how long will it take to become widely impactful?
2. Generality: how difficult is it to develop TAI fully, including making it robustly and reliably achieve what we want?
3. Regulation: how much will people’s reactions to and concerns about AI delay the arrival of fully developed TAI?
You said that (2) was already answered by the bio anchors model. I responded that bio anchors neglected how difficult it will be to develop AI safely. You replied that it will be easy to make models that seemingly do what we want, but that the harder part will be making models that actually do what we want.
My reply was trying to say that the difficulty of building TAI safely was already baked into (2). That might be a dubious reading of the actual textual argument for (2), but I think that interpretation is backed up by my initial reply to your comment.
The reason I framed my later reply as being about perceptions is that I think the capability level at which people begin to adopt TAI is an important determinant of how long timelines will be, independent of (1) and (3). In other words, I was arguing that people’s perceptions of AI capability will cause them to wait to adopt AI until it’s fully developed in the sense I described above; the delay won’t just come from deployment lag after TAI is fully developed, or from regulation before then.
Furthermore, I assumed that you were arguing something along the lines of “people will adopt AI once it’s capable of only seeming to do what we want”, which I’m skeptical of. Hence my reply to you.
My understanding was that you are also skeptical about question 2 on short timelines, and that was what you were arguing with your point (2) on overestimating generality.
Since for point 2 you said “I’m assuming that an AI CEO that does the job of CEO well until the point that it executes a treacherous turn”, I am not very skeptical of that right now. I think we could probably have AIs do something that looks very similar to what a CEO would do within, idk, maybe five years.
(Independently of all of this, I’ve updated towards medium rather than long timelines in the last two years, but mostly because of reflection on other questions, and because I was surprised by the rate of recent progress, rather than because I have fundamental doubts about the arguments I made here, especially (3), which I think is still underrated.
ETA: though also, if I wrote this essay today I would likely fully re-write section (2), since after re-reading it I now don’t agree with some of the things I said in it. Sorry if I was being misleading by downplaying how poor some of those points were.)
My summary of your argument now would be:

1. Deployment lag: it takes time to deploy stuff.
2. Worries about AI misalignment: the world will believe that AI alignment is hard, and so will avoid deploying AI until a lot of work has been done to be confident in alignment.
3. Regulation: it takes time to comply with regulations.
If that’s right, I broadly agree with all of these points :)
(I previously thought you were saying something very different with (2), since the text in the OP seems pretty different.)
I previously thought you were saying something very different with (2), since the text in the OP seems pretty different.
FWIW I don’t think you’re getting things wrong here. I also have simply changed some of my views in the meantime.
That said, I think what I was trying to argue with (2) was not that alignment would be hard per se, but that it would be hard to get an AI to do very high-skill tasks in general, which includes aligning the model, since otherwise it’s not really “doing the task” (though as I said, I don’t currently stand by what I wrote in the OP, as-is).