<unfair rant with the goal of shaking people out of a mindset>
To all of you telling me or expecting me to update to shorter timelines given <new AI result>: have you ever encountered Bayesianism?
Surely if you did, you’d immediately reason that you couldn’t know how I would update, without first knowing what I expected to see in advance. Which you very clearly don’t know. How on earth could you know which way I should update upon observing this new evidence? In fact, why do you even care about which direction I update? That too shouldn’t give you much evidence if you don’t know what I expected in the first place.
Maybe I should feel insulted? That you think so poorly of my reasoning ability that I should be updating towards shorter timelines every time some new advance in AI comes out, as though I hadn’t already priced that into my timeline estimates, and so would predictably update towards shorter timelines in violation of conservation of expected evidence? But that only follows if I expect you to be a good reasoner modeling me as a bad reasoner, which probably isn’t what’s going on.
</unfair rant>
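For concreteness, here is the conservation-of-expected-evidence point as a toy calculation (a minimal sketch with made-up numbers, not anyone's actual credences): if a result was already largely priced in under both hypotheses, seeing it barely moves the posterior, and on average the posterior has to equal the prior.

```python
# Toy illustration of conservation of expected evidence, with made-up numbers.
# If I already expected the result with high probability under both hypotheses,
# then on average my posterior equals my prior: I can't predictably update one way.

prior_short_timelines = 0.3          # P(H): hypothetical prior on "short timelines"
p_result_given_short = 0.9           # P(E | H): how strongly H predicts the new result
p_result_given_not_short = 0.8       # P(E | ~H): ~H also predicts it (it was "priced in")

p_result = (prior_short_timelines * p_result_given_short
            + (1 - prior_short_timelines) * p_result_given_not_short)

posterior_if_seen = prior_short_timelines * p_result_given_short / p_result
posterior_if_not_seen = (prior_short_timelines * (1 - p_result_given_short)
                         / (1 - p_result))

expected_posterior = (p_result * posterior_if_seen
                      + (1 - p_result) * posterior_if_not_seen)

print(round(posterior_if_seen, 3))      # 0.325 -> small update up if the result happens
print(round(posterior_if_not_seen, 3))  # 0.176 -> larger update down if it doesn't
print(round(expected_posterior, 3))     # 0.3   -> equals the prior, as it must
```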
My actual guess is that people notice a discrepancy between their very short timelines and my somewhat-short timelines and want to figure out what causes it. An easily available question is “why doesn’t X imply short timelines?”, but, for some reason I still don’t understand, they instead substitute the much worse question “why didn’t you update towards short timelines on X?” without noticing its major flaws.
Fwiw, I was extremely surprised by OpenAI Five working with just vanilla PPO (with reward shaping and domain randomization), rather than requiring any advances in hierarchical RL. I made one massive update then (in the sense that I immediately started searching for a new model that explained that result; it did take over a year to get to a model I actually liked). I also basically adopted the bio anchors timelines when that report was released (primarily because it agreed with my model, elaborated on it, and then actually calculated out its consequences, which I had never done because it’s actually quite a lot of work). Apart from those two instances I don’t think I’ve had major timeline updates.
I think it’s possible some people are asking these questions disrespectfully, but re: bio anchors, I do think that the report makes a series of assumptions whose plausibility can change over time, and thus your timelines can shift as you reweight different bio anchors scenarios while still believing in bio anchors.
To me, the key update on bio anchors is that I no longer believe the report’s preemptive update against the human lifetime anchor. That update was justified largely on the grounds that “someone could’ve done it already” and “ML is very sample-inefficient”, but both seem worth reevaluating: as we get closer, systems like PaLM exhibit capabilities remarkable enough that I’m not sold a different training setup couldn’t be doing really good RL with the same data/compute (implying the bottleneck could just be algorithmic progress), and separately, few-shot learning is now much more common than the many-shot learning of prior ML progress.
I still think that the “number of RL episodes lasting Y seconds with the agent using X flop/s” anchor is a separate, good one. And while I’m now much less convinced we’ll need the 1e16 flop/s models estimated in bio anchors (and, separately, Chinchilla scaling laws plus conservation of expected evidence about further improvements weren’t incorporated into the exponent and should probably shift it down), I think the NN anchors still have predictive value and slightly lengthen timelines.
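To see why those parameters matter, here is a minimal sketch of the shape of that kind of anchor calculation (not the report’s actual formula or numbers; every value below is a placeholder): required training compute is roughly the model’s flop/s, times the effective horizon length per training sample, times the number of samples needed, so lowering the assumed flop/s or the data-scaling exponent directly lowers the requirement.

```python
# Minimal sketch of the neural-net-anchor arithmetic (placeholder values throughout,
# not the bio anchors report's actual estimates).

model_flop_per_subjective_second = 1e16  # assumed inference cost of a transformative model
effective_horizon_seconds = 1e3          # assumed subjective seconds per training sample
data_scaling_exponent = 1.0              # assumed exponent relating samples needed to model size
baseline_samples = 1e6                   # assumed samples needed for a small reference model
relative_model_size = 1e5                # assumed size relative to that reference model

samples_needed = baseline_samples * relative_model_size ** data_scaling_exponent
training_flop = (model_flop_per_subjective_second
                 * effective_horizon_seconds
                 * samples_needed)

print(f"required training compute ≈ {training_flop:.0e} FLOP")  # ~1e30 with these placeholders
# Halving model_flop_per_subjective_second halves the requirement; lowering
# data_scaling_exponent (e.g. to reflect Chinchilla-style efficiency gains) shrinks it
# multiplicatively, which is how such revisions shorten the implied timelines.
```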
Also, though, insofar as people are asking you to update on Gato, I agree that makes little sense.
I agree your timelines can and should shift based on evidence even if you continue to believe in the bio anchors framework.
Personally, I completely ignore the genome anchor, and I don’t buy the lifetime anchor or the evolution anchor very much (I think the structure of the neural net anchors is a lot better and more likely to give the right answer).
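As a toy illustration of what reweighting scenarios while still believing in bio anchors looks like mechanically (all distributions and weights below are made up, not the report’s): the forecast is a weighted mixture over anchors, so shifting weight between anchors moves the aggregate median even though the framework itself is unchanged.

```python
import numpy as np
from math import erf, sqrt

# Toy sketch of timelines as a weighted mixture over anchors (all distributions and
# weights below are made up for illustration; they are not the report's estimates).

years = np.arange(2023, 2101)

def anchor_cdf(median_year, spread):
    """Crude CDF over calendar years for one anchor (placeholder normal shape)."""
    return np.array([0.5 * (1 + erf((y - median_year) / (spread * sqrt(2)))) for y in years])

anchors = {
    "lifetime": anchor_cdf(2030, 5),
    "neural net": anchor_cdf(2050, 12),
    "evolution": anchor_cdf(2080, 15),
}

def aggregate_median(weights):
    mixture_cdf = sum(w * anchors[name] for name, w in weights.items())
    return years[np.searchsorted(mixture_cdf, 0.5)]

print(aggregate_median({"lifetime": 0.1, "neural net": 0.7, "evolution": 0.2}))  # ~2052
print(aggregate_median({"lifetime": 0.4, "neural net": 0.5, "evolution": 0.1}))  # ~2041
# Moving weight toward the lifetime anchor pulls the aggregate median earlier,
# without abandoning the bio anchors framework itself.
```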
Animals with smaller brains (like bees) are capable of few-shot learning, so I’m not really sure why observing few-shot learning is much of an update. See e.g. this post.
Essentially, the problem is that ‘evidence that shifts Bio Anchors weightings’ is quite different, more restricted, and much harder to define than the straightforward ‘evidence of impressive capabilities’. However, the reason that I think it’s worth checking if new results are updates is that some impressive capabilities might be ones that shift bio anchors weightings. But impressiveness by itself tells you very little.
I think a lot of people with very short timelines are imagining the only possible alternative view as being ‘another AI winter: scaling laws bend, and we don’t get excellent human-level performance on short-term language-specified tasks anytime soon’, and don’t see the further question of figuring out exactly what human-level performance on e.g. MMLU would imply.
This is because the alternative to very short timelines from (your weightings on) Bio Anchors isn’t another AI winter; rather, it’s that we do get all those short-term capabilities soon, but have to wait a while longer to crack long-term agentic planning, because that doesn’t come “for free” from competence on short-term tasks if you’re as sample-inefficient as current ML is.
So what we’re really looking for isn’t systems getting progressively better and better at short-horizon language tasks. That’s something both the lifetime-anchor view and the original Bio Anchors view predict; we need something that discriminates between the two.
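Put in Bayesian terms (a toy calculation with made-up likelihoods): evidence that both views predict about equally well has a likelihood ratio near 1 and barely moves the odds, whereas evidence that only one view strongly predicts can move them a lot.

```python
# Toy Bayes-factor calculation with made-up likelihoods.

prior_odds = 0.25 / 0.75  # hypothetical 1:3 odds for lifetime anchor vs original bio anchors

# Both views predict "impressive new short-horizon results", so the likelihood ratio ~ 1:
lr_short_horizon = 0.95 / 0.90
# Only the lifetime-anchor view strongly predicts "cheap generalization to long horizons":
lr_long_horizon = 0.60 / 0.10

print(round(prior_odds * lr_short_horizon, 2))  # 0.35 -> odds barely move
print(round(prior_odds * lr_long_horizon, 2))   # 2.0  -> odds flip toward the lifetime anchor
```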
We have some (indirect) evidence that original bio anchors is right: if it were wrong, evolution would have missed an obvious open goal in failing to make bees and mice generally intelligent long-term planners; and since human beings generally aren’t vastly better than evolution at designing things, the lifetime anchor would imply that AGI is a glaring exception to that general trend.
As evidence, this has the advantage of being about something that really happened: human beings are the only human-level general intelligence that exists so far, so we have very good reasons to think matching the human brain is sufficient. However, it has the disadvantage of all the usual disanalogies between evolution and its requirements, and human designers and our requirements. Maybe this just is one of those situations where we can outdo evolution: that’s not especially unlikely.
What’s the evidence on the other side (i.e. against original bio anchors and for the lifetime anchor)?
There are two kinds that I tend to hear. One is that short-horizon competence is enough for dangerous/transformative capabilities. E.g. the claim that if you can build something that’s “human-level/superhuman at charisma/persuasion/propaganda/manipulation, at least on short timescales”, that represents a gigantic existential risk factor condemning us to disaster further down the line (the AI point-of-no-return (PONR) idea), or that by that point actors with bad incentives will be far too influential/wealthy/far ahead in advancing the SOTA in AI.
However, I’d consider this changing the subject: essentially it’s not an argument for AGI takeover soon, rather it’s an argument for ‘certain narrow AIs are far more dangerous than you realize’. That means you have to go all the way back to the start and argue for why such things would be catastrophic in the first place. We can’t rely on the simple “it’ll be superintelligent and seize a decisive strategic advantage (DSA)”.
Suppose we get such narrow AIs: ones that can do most short-term tasks for which there’s data, but don’t generalize consistently to long horizons. This scenario 10 years from now looks something like: AI automates away lots of jobs, can do certain kinds of short-term persuasion and manipulation, and can speed up capabilities and alignment research, but can’t fully replace human researchers. Some of these AIs are agentic and possibly also misaligned (in ways that are detectable and fall far short of the ability to take over, since by assumption they aren’t competitive with humans at long-term planning). This certainly seems wild and full of potential danger, and a world in which slowing down progress could be much harder. But it also looks like a scenario with far more attention on AI alignment than today, where the current funders of alignment research are much wealthier than they are now, and with plenty of obvious examples of the problem to catch people’s attention. Overall, it doesn’t seem like a scenario where (current AI alignment researchers + whoever else is working on it in 10 years) have considerably less leverage over the future than now: it could easily be more.
The other reason for favouring the lifetime anchor is that you get long-horizon competence for free once you’re excellent at (a given list of) short-horizon tasks. This amounts to arguing that, for the tasks that matter, current architectures are brainlike in their efficiency, such that the lifetime anchor makes more sense. A lot of the arguments in favour have a structure roughly like: look at a wide-ranging comprehension benchmark like MMLU: when an AI is human-level on all of this, it’ll be able to keep a train of thought running continuously, keep a working memory, and plan over very long timescales the same way humans do.
As evidence, this has the significant advantage of being relevant and not having to deal with the vagaries of what tradeoffs evolution may have made differently to human engineers. It has the disadvantage of being fiction. Or at least evidence that’s not yet been observed. You see AIs getting more and more impressive at a wider range of short-horizon tasks, which is roughly compatible with either view, but you don’t observe the described outcome of them generalizing out to much longer-term tasks than that.
So, to return to the original question, what would count as (additional) evidence in favour of the lifetime anchor? The answer clearly can’t be “nothing”, since if we build AGI in 5 years, that counts.
I think the answer is, anything that looks like unexpectedly cheap, easy, ‘for free’ generalization from relatively shorter to relatively longer horizon tasks (e.g. from single reasoning steps to many reasoning steps) without much fine-tuning.
This is different from many of the other signs of impressiveness we’ve seen recently: just learning lots of shorter-horizon tasks without much transfer between them, being able to point models successfully at particular short-horizon tasks with good prompting, getting much better at a wider range of tasks that can only be done over short horizons. All of these are expected on either view.
This unexpected evidence is very tricky to operationalize. Default bio anchors assumes we’ll see a certain degree of generalization from shorter- to longer-horizon tasks, and that AI will get better and better sample efficiency on few-shot tasks, since it assumes that in 20 or so years we’ll get enough of such generalization to get AGI. I guess we just need to look for ‘more of it than we expected to see’?
That seems very hard to judge, since you can’t read off predictions about subhuman capabilities from bio anchors like that.
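One crude way to make ‘more shorter-to-longer-horizon generalization than expected’ slightly more concrete (a toy sketch of my own with placeholder numbers, not anything derived from the report): compare observed accuracy on k-step chained tasks against the accuracy you’d predict if each step failed independently, with no extra generalization.

```python
# Toy operationalization (placeholder numbers throughout): compare observed accuracy on
# k-step chained tasks to an "independent failures" baseline of p1**k, where p1 is the
# measured single-step accuracy. Observed accuracy well above that baseline would be
# (weak) evidence of the cheap shorter-to-longer-horizon generalization discussed above.

single_step_accuracy = 0.9

observed_accuracy = {1: 0.90, 4: 0.75, 16: 0.55}  # hypothetical measurements

for k, observed in observed_accuracy.items():
    baseline = single_step_accuracy ** k  # 0.90, 0.66, 0.19 for k = 1, 4, 16
    excess = observed - baseline
    print(f"{k} steps: observed {observed:.2f}, baseline {baseline:.2f}, excess {excess:+.2f}")
```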
Yeah, this all seems right to me.
It does not seem to me like “can keep a train of thought running” implies “can take over the world” (or even “is comparable to a human”). I guess the idea is that with a train of thought you can do amplification? I’d be pretty surprised if train-of-thought-amplification on models of today (or 5 years from now) led to novel high quality scientific papers, even in fields that don’t require real-world experimentation.
I think this is the best writeup about this I’ve seen, and I agree with the main points, so kudos!
I do think that evidence of increasing returns to scale from multi-step chain-of-thought prompting is another weak datapoint in favor of the human lifetime anchor.
I also think there are pretty reasonable arguments that NNs may be more efficient than the human brain at converting flops to capabilities, e.g. if SGD is a better version of the best algorithm that can be implemented on biological hardware. Similarly, humans are exposed to a much smaller diversity of data than LMs (the internet is big and weird), and thus LMs may get more “novelty” per flop and generalize better from less data. My main point here is just that “biology is optimal” isn’t as strong a rejoinder when we’re comparing a process so different from what biology did.