We don’t care about how many FLOPs something has. We care about how fast it can actually solve things.
As far as I know, in every case where we’ve successfully gotten AI to do a task at all, AI has done that task far far faster than humans. When we had computers that could do arithmetic but nothing else, they were still much faster at arithmetic than humans. Whatever your view on the quality of recent AI-generated text or art, it’s clear that AI is producing it much much faster than human writers or artists can produce text/art.
Counterexamples would be interesting. No examples of superhuman but slower-than-human performance come to mind; for example, AFAIK, at all time controls except possibly ‘correspondence’, chess/Go AIs are superhuman. (The computer chess community, last I read any discussions, seemed to think that correspondence experts using chess AIs would still beat chess AIs; but this hasn’t been tested, and it’s unclear how much is simply due to a complete absence of research into correspondence chess strategy like allocation of search time, and letting the human correspondence players use chess AIs may be invalid in this context anyway.) Robots are usually much slower at tasks like those in humanoid robotics, but also aren’t superhuman at most of those tasks—unless we count ultra-precise placement or manipulation tasks, maybe? LLMs are so fast that if they are superhuman at anything then they are faster-than-human too; most forms of inner-monologue/search to date either don’t take convincingly longer than a normal human with pen-and-paper or a calculator would, or are still sub-human.
I think counterexamples are easy to find. For example, chess engines in 1997 could play at the level of top human chess players on consumer hardware, but only if they were given orders of magnitude more time to think than the top humans had available. Around 1997 Deep Blue was of a similar strength to Kasparov, but it had to run on a supercomputer; on commercial hardware chess engines were still only 2400-2500 elo. If you ran them for long enough, though, they would obviously be stronger than even Deep Blue was.
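The Elo gap described here can be made concrete with the standard Elo expected-score formula. A sketch (the function name and the exact matchup ratings are my illustrative choices, using the ~2450 consumer-hardware engine and ~2800 top-human figures from the comment):

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score (win probability, roughly) of player A vs. player B
    under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# A ~2450 Elo engine on consumer hardware vs. a ~2800 Elo top human:
engine, human = 2450, 2800
print(f"Engine expected score per game: {elo_expected_score(engine, human):.3f}")
# ≈ 0.118, i.e. the engine scores roughly one point in eight or nine games.
```

Under the commonly quoted (though era- and engine-dependent) rule of thumb that doubling think time is worth very roughly 50-100 Elo for classical engines, closing a ~350-point gap by time alone would indeed take orders of magnitude more thinking time, which is the shape of Ege's claim.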
I think the claim that “in every case where we’ve successfully gotten AI to do a task at all, AI has done that task far far faster than humans” is a tautology because we only say we’ve successfully gotten AI to do a task when AI can beat the top humans at that task. Nobody said “we got AI to play Go” when AI Go engines were only amateur dan strength, even though they could have equally well said “we got AI to play Go at a superhuman level but it’s just very slow”.
A non-tautological version might say that the decrease over time in the compute multiplier the AIs need to compete with the top humans is steep, so it takes a short time for the AIs to transition from “much slower than humans” to “much faster than humans” when they are crossing the “human threshold”. I think there’s some truth to this version of the claim but it’s not really due to any advanced serial speed on the part of the AIs.
I think the claim that “in every case where we’ve successfully gotten AI to do a task at all, AI has done that task far far faster than humans” is a tautology
It is not a tautology.
For example, chess engines in 1997 could play at the level of top human chess players on consumer hardware, but only if they were given orders of magnitude more time to think than the top humans had available. Around 1997 Deep Blue was of a similar strength to Kasparov, but it had to run on a supercomputer; on commercial hardware chess engines were still only 2400-2500 elo. If you ran them for long enough, though, they would obviously be stronger than even Deep Blue was.
Er, this is why I spent half my comment discussing correspondence chess… My understanding is that it is not true that if you ran computers for a long time that they would beat the human also running for a long time, and that historically, it’s been quite the opposite: the more time/compute spent, the better the human plays, because they have a more scalable search. (eg. Shannon’s ‘type-A strategy vs type-B strategy’ was meant to cover this distinction: humans search moves slower, but we search moves way better, and that’s why we win.) It was only at short time controls, where human ability to plan deeply was negated, that chess engines had any chance. (In correspondence chess, humans deeply analyze the most important lines of play and avoid getting distracted, so they can go much deeper into the game tree than chess engines could.) Whenever the crossover happened, it was probably after Deep Blue. And given the different styles of play and the asymptotics of how those AIs smash into the exponential wall of the game tree & explode, I’m not sure there is any time control at which you would expect pre-Deep-Blue chess engines to be superhuman.
This is similar to ML scaling. Curves are often parallel and don’t cross; sometimes they do cross; what they don’t do is criss-cross repeatedly. Once you fall behind asymptotically, you fall behind for good.
Nobody said “we got AI to play Go” when AI Go engines were only amateur dan strength, even though they could have equally well said “we got AI to play Go at a superhuman level but it’s just very slow”.
I’m not familiar with ‘correspondence Go’ but I would expect that if it was pursued seriously like correspondence chess, it would exhibit the same non-monotonicity.
If you ran them for long enough, though, they would obviously be stronger than even Deep Blue was.
That’s not obvious to me. If nothing else, that time isn’t useful once you run out of memory. And PCs had very little memory: you might have 16MB RAM to work with.
And obviously, I am not discussing the trivial case of changing the task to a different task by assuming unlimited hardware or ignoring time and just handicapping human performance by holding their time constant while assigning unlimited resources to computers—in which case we had superhuman chess, Go, and many other things whenever someone first wrote down a computable tree search strategy!*
* actually pretty nontrivial. Recall the arguments like Edgar Allan Poe’s that computer chess was impossible in principle. So there was a point at which even infinite compute didn’t help because no one knew how to write down and solve the game tree, as obvious and intuitive as that paradigm now seems to us. (I think that was Shannon, as surprisingly late as that may seem.) This also describes a lot of things we now can do in ML.
I think that when you say
My understanding is that it is not true that if you ran computers for a long time that they would beat the human also running for a long time
(which I don’t disagree with, btw) you are misunderstanding what Ege was claiming, which was not that in 1997 chess engines on stock hardware would beat humans provided the time controls were long enough, but only that in 1997 chess engines on stock hardware would beat humans if you gave the chess engines a huge amount of time and somehow stopped the humans having anything like as much time.
In other words, he’s saying that in 1997 chess engines had “superhuman but slower-than-human performance”: that whatever a human could do, a chess engine could also do if given dramatically more time to do it than the human had.
And yes, this means that in some sense we had superhuman-but-slow chess as soon as someone wrote down a theoretically-valid tree search algorithm. Just as in some sense we have superhuman-but-slow intelligence[1] since someone wrote down the AIXI algorithm.
[1] In some sense of “intelligence” which may or may not be close enough to how the term is usually used.
I feel like there’s an interesting question here but can’t figure out a version of it that doesn’t end up being basically trivial.
Is there any case where we’ve figured out how to make machines do something at human level or better if we don’t care about speed, where they haven’t subsequently become able to do it at human level and much faster than humans?
Kinda-trivially yes, because anything we can write down an impracticably-slow algorithm for and haven’t yet figured out how to do better than that will count.
Is there any case where we’ve figured out how to make machines do something at human level or better if we don’t mind them being a few orders of magnitude slower than humans, where they haven’t subsequently become able to do it at human level and much faster than humans?
Kinda-trivially yes, because there are things we’ve only just very recently worked out how to make machines do well.
Is there any case where we’ve figured out how to make machines do something at human level or better if we don’t mind them being a few orders of magnitude slower than humans, and then despite a couple of decades of further work haven’t made them able to do it at human level and much faster than humans?
Kinda-trivially no, because until fairly recently Moore’s law was still delivering multiple-orders-of-magnitude speed improvements just by waiting, so anything we got to human level >=20 years ago has then got hugely faster that way.
Can you explain to me the empirical content of the claim, then? I don’t understand what it’s supposed to mean.
About the rest of your comment, I’m confused about why you’re discussing what happens when both chess engines and humans have a lot of time to do something. For example, what’s the point of this statement?
My understanding is that it is not true that if you ran computers for a long time that they would beat the human also running for a long time, and that historically, it’s been quite the opposite...
I don’t understand how this statement is relevant to any claim I made in my comment. Humans beating computers at equal time control is perfectly consistent with the computers being slower than humans. If you took a human and slowed them down by a factor of 10, that’s the same pattern you would see.
Are you instead trying to find examples of tasks where computers were beaten by humans when given a short time to do the task but could beat the humans when given a long time to do the task? That’s a very different claim from “in every case where we’ve successfully gotten AI to do a task at all, AI has done that task far far faster than humans”.
Nobody said “we got AI to play Go” when AI Go engines were only amateur dan strength, even though they could have equally well said “we got AI to play Go at a superhuman level but it’s just very slow”.
I understand this as saying “If you take an AI Go engine from the pre-AlphaGo era, it was pretty bad in real time. But if you set the search depth to an extremely high value, it would be superhuman, it just might take a bajillion years per move. For that matter, in 1950, people had computers, and people knew how to do naive exhaustive tree search, so they could already make an algorithm that was superhuman at Go, it’s just that it would take like a googol years per move and require galactic-scale memory banks etc.”
Is that what you were trying to say? If not, can you rephrase?
And if humans spent a googol years planning their moves, those moves would still be better because their search scales better.
Yes, that’s what I’m trying to say, though I think in actual practice the numbers you need would have been much smaller for the Go AIs I’m talking about than they would be for the naive tree search approach.
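The “naive exhaustive tree search” being referenced can be sketched in a few lines of minimax; running something like this to full depth on Go’s game tree is exactly what would take “a googol years per move.” A toy illustration (the tree and names are mine, not the historical code):

```python
# Exhaustive minimax over an explicit game tree. Leaves are payoffs for
# the maximizing player; internal nodes are lists of child subtrees.
def minimax(node, maximizing: bool):
    if isinstance(node, (int, float)):  # leaf: terminal payoff
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# Depth-2 toy tree: the maximizer moves first, the minimizer replies.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, maximizing=True))  # prints 3: the guaranteed value
```

The algorithm is “superhuman” in the limited sense that, given a terminating game and unbounded time and memory, it plays perfectly; the entire dispute above is about whether that counts for anything once time is part of the task.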
I’m interested in how much processing time Waymo requires. I.e. if I sped up the clock speed in a simulation such that things were happening much faster, how fast could we do that and still have it successfully handled the environment?
It’s arguably a superhuman driver in some domains already (fewer accidents), but not in others (handling OOD road conditions).
As far as I know, in every case where we’ve successfully gotten AI to do a task at all, AI has done that task far far faster than humans. When we had computers that could do arithmetic but nothing else, they were still much faster at arithmetic than humans. Whatever your view on the quality of recent AI-generated text or art, it’s clear that AI is producing it much much faster than human writers or artists can produce text/art.
“Far far faster” is an exaggeration that conflates vastly different orders of magnitude with each other. When compared against humans, computers are many orders of magnitude faster at doing arithmetic than they are at generating text: a human can write perhaps one word per second when typing quickly, while an LLM’s serial speed of 50 tokens/sec maybe corresponds to 20 words/sec or so. That’s just a ~ 1.3 OOM difference, to be contrasted with 10 OOMs or more at the task of multiplying 32-bit integers, for instance. Are you not bothered at all by how wide the chasm between these two quantities seems to be, and whether it might be a problem for your model of this situation?
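The order-of-magnitude comparison above can be written out explicitly. The input figures are rough assumptions from the comment (my own back-of-the-envelope numbers for the arithmetic side):

```python
import math

# Assumed figures, for illustration only: a fast human types ~1 word/s;
# an LLM at ~50 tokens/s is taken as ~20 words/s. For arithmetic, assume
# a human needs ~100 s to multiply two 32-bit integers by hand, while a
# CPU does ~1e9 such multiplies per second.
text_gap = math.log10(20 / 1)            # ≈ 1.3 OOM
arith_gap = math.log10(1e9 / (1 / 100))  # = 11 OOM
print(f"text: {text_gap:.1f} OOM, arithmetic: {arith_gap:.1f} OOM")
```

Whatever exact figures one assumes, the gap between the two speedups spans many orders of magnitude, which is the point being pressed here.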
In addition, we know that this could be faster if we were willing to accept lower-quality outputs, for example by having fewer layers in an LLM. There is a tradeoff between quality and serial speed, so ignoring quality and looking only at the speed at which text is generated is misleading. There’s a reason GPT-3.5 has smaller per-token latency than GPT-4.
I think there is a weaker thesis which still seems plausible: For every task for which an ML system achieves human level performance, it is possible to perform the task with the ML system significantly faster than a human.
The restriction to ML models excludes hand-coded GOFAI algorithms (like Deep Blue), which in principle could solve all kinds of problems using brute force search.