My AI predictions

(I did not carefully think about my predictions. I just wanted to state them somewhere because I think it’s generally good to state stuff publicly.)
(My future self will not necessarily make predictions similar to the ones I’m making now.)
TLDR: I don’t know.
Timelines
Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:
How long until the sun gets eaten (or starts getting eaten)? 10th/50th/90th percentile: 3y, 12y, 37y.
How long until an AI reaches Elo 4000 on Codeforces? 10/50/90: 9mo, 2.5y, 11.5y.
How long until an AI is better at math research than the best human mathematician, according to the world’s best mathematicians? 10/50/90: 2y, 7.5y, 28y.
Takeoff Speed
I’m confident (94%) that it is easier to code, on a normal 2020 laptop, an AI that can do Einstein-level research at 1000x speed than it is to solve the alignment problem very robustly[1].[2]
AIs might decide not to implement the very efficient AGIs in order to scale more safely and first solve their alignment problem, but once a mind has solved the alignment problem very robustly, I expect everything to go extremely quickly.
However, the relevant question is how fast AI will get smarter shortly before the point where ze[3] becomes able to solve the alignment problem (or alternatively until ze decides making itself smarter quickly is too risky and it should cooperate with humanity and/or other similarly smart AIs currently being created to solve alignment).
So the question is: Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that when scaled up can rush past the human intelligence level very quickly (=fast)?
I’m very tentatively leaning more towards the “fast” side, but I don’t know.
I’d expect (80%) to see at least one more paradigm shift that is at least as big as the one from LSTMs to transformers. It’s plausible to me that the results from the shift will come faster because we have a greater compute overhang. (Though it’s also possible it will just take even more compute.)
It’s possible (33%) that the world ends within 1 year of a new major discovery[4]. The AI might just very quickly improve inside a lab over the course of weeks without the operators there really realizing it[5], until it then secretly exfiltrates itself, etc.
(Btw, smart people who can see that some paper proposes something with dangerous implications should obviously not publicly point to the dangerous-looking stuff, else other people will try it.)
[1] It’s hard to define what I mean by “very robustly”, but something like “having coded an AI program such that a calibrated mind would expect <1% expected value loss if it is run, compared to the ideal CEV-aligned superintelligence”.
[2] I acknowledge this is a nontrivial claim. I probably won’t be willing to invest the time to explain why if someone asks me now; the inferential distance is quite large. But you may ask.
[3] “Ze” is the AI pronoun.
[4] To be clear, not 33% after the first major discovery after transformers specifically, just after any major discovery.
[5] E.g. because the AI is in a training phase and only interacts with operators sometimes, and doesn’t tell them everything. And in training the AI practices solving lots and lots of research problems and learns much more sample-efficiently than transformers do.
How long until the earth gets eaten? 10th/50th/90th percentile: 3y, 12y, 37y.
Catastrophes induced by narrow capabilities (notably biotech) can push it further, so this might imply that they probably don’t occur[1]. Also, aligned AI might decide not to; it’s not as nutritious as the Sun anyway.
Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that when scaled up can rush past the human intelligence level very quickly (=fast)?
AI speed advantage makes fast vs. slow ambiguous, because it doesn’t require AI getting smarter in order to make startlingly fast progress, and might be about passing a capability threshold (of something like autonomous research) with no distinct breakthroughs leading up to it (by getting to a slightly higher level of scaling or compute efficiency with the old techniques).
[1] There was a “no huge global catastrophe” condition on the prediction that I missed, thanks Towards_Keeperhood for correction.
Please make no assumptions about those just because other people with some models might make similar predictions or so.
(That’s not a reasonable ask, it intervenes on reasoning in a way that’s not an argument for why it would be mistaken. It’s always possible a hypothesis doesn’t match reality, that’s not a reason to deny entertaining the hypothesis, or not to think through its implications. Even some counterfactuals can be worth considering, when not matching reality is assured from the outset.)
Yeah you can hypothesize. If you state it publicly though, please make sure to flag it as hypothesis.
If you state it publicly though, please make sure to flag it as hypothesis.
Also not a reasonable ask, friction targeted at a particular thing makes it slightly less convenient, and therefore it stops happening in practice completely. ~Everything is a hypothesis, ~all models are wrong, in each case language makes what distinctions it tends to in general.
ok thx, edited. thanks for feedback!
How long until the earth gets eaten? 10th/50th/90th percentile: 3y, 12y, 37y.
Catastrophes induced by narrow capabilities (notably biotech) can push it further, so this might imply that they probably don’t occur.
No, it doesn’t imply this; I set this disclaimer: “Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:”. Though yeah, I don’t particularly expect those to occur.
The “AI might decide not to” point stands I think. This represents a change of mind for me; I wouldn’t have previously endorsed this point, but recently I’ve come to think arbitrary superficial asks like this can become reflectively stable with nontrivial probability, resisting strong cost-benefit arguments even after intelligence explosion.
Right, I missed this.
Ok, edited to sun. (I used earth first because I don’t know how long it will take to eat the sun, whereas the earth seems feasible to eat quickly.)
(Plausible to me that an aligned AI will still eat the earth, but scan all the relevant information out of it and maybe reconstruct it later.)
Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that when scaled up can rush past the human intelligence level very quickly (=fast)?
AI speed advantage makes fast vs. slow ambiguous, because it doesn’t require AI getting smarter in order to make startlingly fast progress, and might be about passing a capability threshold (of something like autonomous research) with no distinct breakthroughs leading up to it (by getting to a slightly higher level of scaling or compute efficiency with some old technique).
Ok yeah I think my statement is conflating fast-vs-slow with breakthrough-vs-continuous, though I think there’s a correlation.
(I still think fast-vs-slow makes sense as a separate concept and is important.)
It seems a little surprising to me how rarely confident pessimists (p(doom)>0.9) argue with moderate optimists (p(doom)≤0.5). I’m not specifically talking about this post. But it would be interesting if people revealed their disagreement more often.
Seems totally unrelated to my post but whatever:
My p(this branch of humanity won’t fulfill the promise of the night sky) is actually more like 0.82 or sth, idk. (I’m even lower on p(everyone will die), because there might be superintelligences in other branches that acausally trade to save the existing lives, though I didn’t think about it carefully.)
I’m chatting 1 hour every 2 weeks with Erik Jenner; we usually talk about AI safety stuff. I also chat about 1h every 2 weeks with a person who has sorta similar views to me. Beyond that, I currently don’t talk much to people about AI risk.