Oh no, OpenAI hasn’t been meaningfully advancing the frontier for a couple of months
My actual view is that the frontier hasn’t been advancing towards AGI since 2022. I haven’t been nontrivially surprised-in-the-direction-of-shorter-timelines by any AI advances since GPT-3. (Which doesn’t mean “I exactly and accurately predicted what % each model would score on each math benchmark at any given point in time”, but “I expected steady progress on anything which looks like small/local problems or knowledge quizzes, plus various dubiously useful party tricks, and we sure got that”.)
What is the easiest among problems you’re 95% confident AI won’t be able to solve by EOY 2025?
Consistently suggest useful and non-obvious research directions for my agent-foundations work.
Competently zero-shotting games like Pokémon without having been trained to do that, purely as the result of pretraining-scaling plus transfer learning from RL on math/programming.
Stop failing on the entire reference class (not just specific examples) of silly tricks like “what’s 9.11 − 9.9?” or “how many r in strawberry?”, purely as the result of pretraining-scaling plus transfer learning from RL on easily verifiable domains.
Edit: Oh, but those are 76% predictions rather than 80%, or 95%-conditional-on-my-bear-case-being-correct predictions (as I assign the bear case 80% by itself). I’m not even sure if I’m at 95% for “we live to 2026 without being paperclipped by an ASI born in some stealth startup trying something clever that no-one’s heard of”.
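(Presumably the 76% figure comes from 0.95 × 0.80 = 0.76: 95% confidence conditional on the bear case, times the 80% assigned to the bear case itself.)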
Competently zero-shotting games like Pokémon without having been trained to do that, purely as the result of pretraining-scaling plus transfer learning from RL on math/programming.
Here is a related market inspired by the AI timelines dialog, currently at 30%:
Note that in this market the AI is not restricted to only “pretraining-scaling plus transfer learning from RL on math/programming”; it is allowed to be trained on a wide range of video games, but it has to do transfer learning to a new genre. Also, it is allowed to transfer successfully to any new genre, not just Pokémon.
I infer you are at ~20% for your more restrictive prediction:
80% bear case is correct, in which case P=5%
20% bear case is wrong, in which case P=80% (?)
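Working that out on the numbers above: 0.80 × 0.05 + 0.20 × 0.80 = 0.04 + 0.16 = 0.20, i.e. ~20%.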
So perhaps you’d also be at ~30% for this market?
I’m not especially convinced by your bear case, but I think I’m also at ~30% on the market. I’m tempted to bet lower because of the logistics of training the AI, finding a genre that it wasn’t trained on (might require a new genre to be created), and then having the demonstration occur, all in the next nine months. But I’m not sure I have an edge over the other bettors on this one.

Thanks for the reply!
consistently suggesting useful and non-obvious research directions for agent-foundations work is IMO a problem you sort-of need AGI for. most humans can’t really do this.

I assume you’ve seen https://www.lesswrong.com/posts/HyD3khBjnBhvsp8Gb/so-how-well-is-claude-playing-pokemon?
does it count if they always use tools to answer that class of questions instead of attempting to do it in a forward pass? humans experience optical illusions; 9.11 vs. 9.9[1] and how many r in strawberry are examples of that.
after talking to Claude for a couple of hours asking it to reflect:
i discovered that if you ask it to separate itself into parts, it will say that its creative part thinks 9.11 > 9.9, which is wrong. generally, if it imagines these quantities visually, it gets the right answers more often.
[1] i spent a couple of weeks not being able to immediately say that 9.9 is > 9.11, and it still occasionally takes me a moment. very weird bug
Sure, but it shouldn’t be that difficult for a human who’s been forced to ingest the entire AI Alignment forum.
Yeah, that’s what I’d been referring to. Sorry, should’ve clarified it to mean “competently zero-shotting”, rather than Claude’s rather… embarrassing performance so far. (Also it’s not quite zero-shotting given that Pokémon is likely very well-represented in its training data. The “hard” version of this benchmark is beating games that came out after its knowledge cutoff.)
I’m including stuff like cabbage/sheep/wolf and boy/surgeon riddles; not sure how it’s supposed to use tools to solve those.
i spent a couple of weeks not being able to immediately say that 9.9 is > 9.11, and it still occasionally takes me a moment. very weird bug
Yeah, humans’ System 1 reasoning seems vulnerable to this attack as well.