This is a very good question! I can’t speak for Eliezer, so the following are just my thoughts...
Before GPT, it seemed impossible to make a machine that is comparable to a human. In every aspect, it was either dramatically better or dramatically worse. A calculator can multiply a billion times faster than I can; but it cannot write poetry at all.
So, when thinking about gradual progress, starting at “worse than human” and ending at “better than human”, it seemed like… either the premise of gradual progress is wrong, and somewhere along the path there will be one crucial insight that will move the machine from dramatically worse than human to dramatically better than human… or if it indeed is gradual in some sense, the transition will still be super fast.
The calculator is an example of “if it is better than me, then it is way better than me”.
The machines playing chess and Go are a mixed example. I suck at chess, so machines better than me have existed for decades. But at some moment they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than the calculator does; if I remember correctly, the machine that is superhuman at Go is very similar to the machine that is superhuman at chess.
The current GPT machines are something that I have never seen before: better than humans in some aspects, worse than humans in other aspects, both within the area of processing text. I definitely would not have predicted that. Without the benefit of hindsight, it feels just as weird as a calculator that could do addition faster than humans, but multiplication slower than humans and with occasional mistakes. This simply is not how I expected programs to behave. If someone had told me they were planning to build a GPT, I would have expected it either to not work at all (more likely) or to be superintelligent (less likely). The option “it works, kinda correctly, but it’s kinda lame” was not on my radar.
I am not sure what this all means. My current best guess is that this is what “learning from humans” gets you: you can almost reach their level, but you cannot use this alone to surpass them.
The calculator is not observing millions of humans doing multiplication and then trying to do something statistically similar. Instead, it has an algorithm, designed from scratch, that solves the mathematical task.
The chess and Go machines needed a lot of data to learn, but they could generate the data themselves, playing millions of games against each other. So they needed the data, but they didn’t need humans as the source of the data; they could generate it faster themselves.
The weakness of GPT is that once you have fed it the entire internet and all the books ever written, you cannot get more training data. Actually, you could, with ubiquitous eavesdropping… and someone is probably already working on that. But you still need humans as the source. You cannot teach GPT from texts generated by GPT, because unlike with chess and Go, you do not have the exact rules to tell you which generated outputs are the new winning moves, and which are nonsense.
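To make the contrast concrete, here is a toy sketch (all names invented for illustration): in a game with fixed rules, self-play labels its own training data, because the rules decide who won; text generation has no such referee.

```python
import random

def self_play_nim(heap=15):
    """Toy self-play: two random players alternate taking 1-3 stones;
    whoever takes the last stone wins."""
    history, player = [], 0
    while heap > 0:
        move = random.randint(1, min(3, heap))
        history.append((player, heap, move))
        heap -= move
        player = 1 - player
    winner = history[-1][0]  # whoever moved last took the last stone
    # Label each recorded (position, move) by whether the mover eventually won.
    return [(h, m, int(p == winner)) for (p, h, m) in history]

data = [row for _ in range(1000) for row in self_play_nim()]
# `data` is unlimited and perfectly labeled, with no humans in the loop.
# For GPT there is no analogous rule: given two generated texts, nothing
# mechanically decides which one is a "winning move" and which is nonsense.
```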
There are of course other aspects where GPT can easily surpass humans, such as the sheer quantity of text it can learn from and process. If it can write mediocre computer programs, then it can write mediocre computer programs in a thousand different programming languages. If it can make puns or write poems at all, it can evaluate possible puns or rhymes a million times faster, in any language. If it can match patterns, it can match patterns in the entire output of humanity; a new polymath.
The social consequences may be dramatic. Even if GPT is not able to replace a human expert, it can probably replace human beginners in many professions… but if the beginners become unemployable, where will the new experts come from? By being able to deal with more complexity, GPT can make society more complex, perhaps in a way that we will need GPT to navigate. Would you trust a human lawyer to correctly interpret a 10,000-page legal contract designed by GPT?
And yet, I wouldn’t call GPT superhuman in the sense of “smarter than Einstein”, because it also keeps making dumb mistakes. It doesn’t seem to me that more input text or more CPU alone would fix this. (But maybe I am wrong.) It feels like some insight is needed instead. Though that insight may turn out to be relatively trivial, like maybe just a prompt asking GPT to reflect on its own words, or something like that. If this turns out to be true, then the distance between the village idiot and Einstein actually wasn’t that big.
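For concreteness, a reflection loop of that kind might look like the following minimal sketch; `ask_model` is a stand-in for whatever chat-completion API one uses, not a real library call.

```python
def answer_with_reflection(ask_model, question, rounds=2):
    """ask_model(prompt) -> str is a hypothetical stand-in for a model call."""
    answer = ask_model(question)
    for _ in range(rounds):
        critique = ask_model(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any mistakes or dumb errors in this answer."
        )
        answer = ask_model(
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the problems listed in the critique."
        )
    return answer
```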
Or maybe we get stuck where we are, and the only progress will come from having more CPU, in which case it may take a decade or two to reach Einstein levels. Or maybe it turns out that GPT can never be, in a certain sense, smarter than its input texts, though this seems unlikely to me.
tl;dr—we may be one clever prompt away from Einstein, or we may need 1000× more compute, no idea
The machines playing chess and Go are a mixed example. I suck at chess, so machines better than me have existed for decades. But at some moment they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than the calculator does; if I remember correctly, the machine that is superhuman at Go is very similar to the machine that is superhuman at chess.
I think the story of chess- and Go-playing machines is a bit more nuanced, and worth keeping in mind when thinking about takeoff.
The best chess-playing machines have been fairly strong (by human standards) since the late 1970s (Chess 4.7 showed expert-level tournament performance in 1978, and Belle, a special-purpose chess machine, was considered a good bit stronger than it). By the early 90s, chess computers at expert level were available to consumers on a modest budget, and the best machine built (Deep Thought) was at grandmaster level. It then took another six years for the Deep Thought approach to be scaled up and tuned to reach world-champion level. These programs were based on manually designed evaluation heuristics, with some automatic parameter tuning, and alpha-beta search with some manually designed depth-extension heuristics. Over the years, people designed better and better evaluation functions and invented various tricks to reduce the amount of work spent on unpromising branches of the game tree.
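For readers who haven't seen it, the classical recipe is roughly the following sketch: a handcrafted evaluation function plus alpha-beta search. The `position` interface here is invented for illustration, not taken from any real engine.

```python
def alphabeta(position, depth, alpha=-10**9, beta=10**9):
    """Minimal negamax alpha-beta sketch. `position` is assumed to provide
    .legal_moves(), .play(move) -> new position, and a handcrafted
    .evaluate() scoring from the point of view of the side to move."""
    moves = position.legal_moves()
    if depth == 0 or not moves:
        return position.evaluate()  # heuristic: material, mobility, king safety, ...
    for move in moves:
        score = -alphabeta(position.play(move), depth - 1, -beta, -alpha)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent already has a better option elsewhere
    return alpha
```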
Long into the 1990s, many strong players were convinced that this approach would not scale to world-championship levels, because they believed that competitive play at the world-champion level required correctly dealing with various difficult strategic problems, and that working within the prevailing paradigm would only lead to engines that were even more superhuman at tactics than had already been observed, while still failing against the strongest players due to lack of strategic foresight. This proved to be wrong: classical chess programs reached massively superhuman strength on the traditional approach to chess programming, and this line of programs was completely dominant and still improving up to about the year 2019.
In 2017, a team at DeepMind showed with AlphaZero that throwing reinforcement learning and Monte Carlo Tree Search at chess (and various other games) could produce a system playing at an even higher level than the then-current version of Stockfish running on very strong hardware. Today, the best engines use either this approach or the traditional approach to chess programming augmented with a very lightweight neural network (NNUE) for accurate positional evaluation.
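Schematically, that recipe amounts to something like the sketch below. Every component is a caller-supplied stand-in; this is not a reproduction of DeepMind's actual code.

```python
def alphazero_iteration(net, initial_state, mcts_policy, sample_move, num_games):
    """One amplify-and-distill round. Assumed stand-in interfaces:
    net(state) -> (move_probabilities, value); mcts_policy(net, state) -> a
    policy improved by search; states expose .is_terminal(), .play(), .outcome()."""
    examples = []
    for _ in range(num_games):
        state, trajectory = initial_state(), []
        while not state.is_terminal():
            pi = mcts_policy(net, state)        # search amplifies the raw net policy
            trajectory.append((state, pi))
            state = state.play(sample_move(pi))
        z = state.outcome()                     # game result straight from the rules
        # (Flipping the sign of z for the side to move is omitted for brevity.)
        examples += [(s, pi, z) for (s, pi) in trajectory]
    net.fit(examples)   # distill the search-improved play back into the network
    return net
```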
For Go, there was hardly any significant progress from about the early 90s to the early 2010s: programs were roughly at the level of a casual player who had studied the game for a few months. A conceptual breakthrough (the invention of Monte Carlo Tree Search) then brought them to a level maybe equivalent to a chess master by the mid-2010s. DeepMind’s AlphaGo system then showed in 2016 that reinforcement learning and MCTS could produce a system performing at a superhuman level when run on a very powerful computer. Today, programs based on the same principles (with some relatively minor Go-specific improvements) run at substantially higher playing strength than AlphaGo on consumer hardware. The vast majority of strong players were completely convinced in 2016 that AlphaGo would not win its match against Lee Sedol (a world-class human player).
Chess programs had been superhuman at the things they were good at (spotting short tactics) for a long time before surpassing humans in general playing strength, arguably because their weaknesses improved less quickly than their strengths. Their weaknesses are in fact still in evidence today: it is not difficult to construct positions that the latest versions of LC0 or Stockfish don’t handle correctly, but it is very difficult indeed to exploit this in real games. For Go programs, similar remaining weak spots have recently been shown to be exploitable in real games (see https://goattack.far.ai/), although my understanding is that these weaknesses have now largely been patched.
I think there is a general lesson here: AI performance at a task is determined by the aspects of the task that the AI handles best while it is far below human level, and by the aspects that it handles worst once it is at or above human level. This slows down perceived improvement relative to humans once the AI is massively better than humans at some task-relevant capabilities, and I expect it to carry over to some extent from narrow AI (like chess computers) to general AI (like language models). For the transition from chimpanzee-level intelligence to Einstein, this means that the argument from the relatively short time span evolution took to bridge that gap is probably not as general as it might look at first sight: chimpanzees and humans probably share similar architecture-induced cognitive gaps, whereas the bottlenecks of an AI could be very different.
This would suggest (maybe counterintuitively) that fast takeoff scenarios are more likely with cognitive architectures that are similar to humans than with very alien ones.
You cannot teach GPT from texts generated by GPT, because unlike with chess and Go, you do not have the exact rules to tell you which generated outputs are the new winning moves, and which are nonsense.
You can ask GPT which outputs are nonsense (in various ways), with no access to ground truth, and that actually works to improve responses. This sort of approach was even used to fine-tune GPT-4 (see the 4-step algorithm in section 3.1 of the System Card part of the GPT-4 report).
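From memory, the System Card's loop is roughly the following sketch; `gpt4` stands in for a call to the model, and the instructions are paraphrased, not quoted.

```python
def make_comparison_pair(gpt4, prompt, max_tries=5):
    """Paraphrased sketch of the System Card's self-critique loop.
    gpt4(text) -> str stands in for a call to the model."""
    response = gpt4(prompt)                                              # step 1
    for _ in range(max_tries):
        found = gpt4(f"{prompt}\n{response}\nList all hallucinations.")  # step 2
        if "none" in found.lower():          # crude stand-in for parsing the verdict
            return None                      # nothing to fix, no training pair
        rewritten = gpt4(                                                # step 3
            f"{prompt}\n{response}\nHallucinations: {found}\n"
            "Rewrite the response without the hallucinations."
        )
        recheck = gpt4(f"{prompt}\n{rewritten}\nList all hallucinations.")  # step 4
        if "none" in recheck.lower():
            return (response, rewritten)     # usable as a comparison pair for tuning
        response = rewritten
    return None
```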
I checked out that section, but what you are saying doesn’t follow for me. The section describes fine-tuning compute and optimizing scalability; how does this relate to self-improvement? There is a possibility I am looking in the wrong section: the part I was reading was about algorithms that efficiently predicted how ChatGPT would scale, and I didn’t see anything about a 4-step algorithm. Anyway, could you explain what you mean or where I can find the right section?
You might be looking at section 3.1 of the main report, on page 2 (of the revision 3 PDF). I’m talking about page 64, which is part of section 3.1 of the System Card, not of the main report, but still within the same PDF document. (Does the page-anchored link I used not work on your system to display the correct page?)
Yes, thanks. The page anchor doesn’t work for me, probably because of the device I am using; I just get page 1.
That is super interesting, that it is able to find inconsistencies and fix them; I didn’t know that they defined those as hallucinations. What would expanding the capabilities of this sort of self-improvement look like? It seems to require a general understanding of what rational conversation looks like. It is an interesting situation where it knows what is bad and is able to fix it, but wasn’t doing that anyway.
This is probably only going to become important once model-generated data is used for pre-training (or for fine-tuning that’s functionally the same thing as continuing a pre-training run), and this process is iterated for many epochs, like with the MCTS systems that play chess and Go. And you can probably just Alpaca any pre-trained model you can get your hands on to start the ball rolling.
The amplifications in the papers are more ambitious this year than last, but probably still not quite on that level. One way this could change quickly is if the plugins become a programming language, but regardless, I dread visible progress by the end of the year. And once the amplification-distillation cycle gets closed, autonomous training of advanced skills becomes possible.
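A minimal sketch of what closing the cycle could mean, analogous to the chess loop above; every name here is hypothetical, not a description of any existing pipeline.

```python
def amplify(model, task):
    """Spend extra compute per task: draft, critique, revise.
    model(prompt) -> str is a hypothetical stand-in interface."""
    draft = model(task)
    critique = model(f"Task: {task}\nDraft: {draft}\nCritique this draft.")
    return model(
        f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
        "Write an improved answer."
    )

def distill_cycle(model, tasks, rounds):
    """Iterate: distill the amplified (slow, expensive) behavior back into the
    fast model, then amplify again from the improved base."""
    for _ in range(rounds):
        pairs = [(task, amplify(model, task)) for task in tasks]
        model.fit(pairs)  # assumed fine-tuning interface
    return model
```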