In My Childhood Role Model, Eliezer Yudkowsky says that the difference in intelligence between a village idiot and Einstein is tiny relative to the difference between a chimp and a village idiot. This seems to imply (I could be misreading) that {the time between the first AI with chimp intelligence and the first AI with village idiot intelligence} will be much larger than {the time between the first AI with village idiot intelligence and the first AI with Einstein intelligence}. If we consider GPT-2 to be roughly chimp-level, and GPT-4 to be above village idiot level, then it seems like this would predict that we’ll get an Einstein-level AI within the next year or so. This seems really unlikely and I don’t even think Eliezer currently believes this. If my interpretation is correct, this seems like an important prediction that he got wrong and that I haven’t seen acknowledged.
So my question is: Is this a fair representation of Eliezer’s beliefs at the time? If so, has this prediction been acknowledged wrong, or was it actually not wrong and there’s something I’m missing? If the prediction was wrong, what might the implications be for fast vs slow takeoff? (Initial thoughts: If this prediction had been right, then we’d expect fast takeoff to be much more likely, because it seems like if you improve Einstein by the difference between him and a village idiot 20 times over 4 years (the time between GPT-2 and GPT-4, i.e. ~chimp level vs >village idiot level), you will definitely get a self-improving AI somewhere along the way)
(Meta: I know this seems like more of an argument than a question; the reason I put it here is that I expect someone to have an easy/obvious answer, since I haven’t really spent any time thinking about or working with AI beyond playing with GPT and watching/reading some AGI debates. I also don’t want to pollute the discourse in threads where the participants are expected to at least kind of know what they’re talking about.)
This is a very good question! I can’t speak for Eliezer, so the following are just my thoughts...
Before GPT, it seemed impossible to make a machine that is comparable to a human. In each aspect, it was either dramatically better, or dramatically worse. A calculator can multiply a billion times faster than I can; but it cannot write poetry at all.
So, when thinking about gradual progress, starting at “worse than human” and ending at “better than human”, it seemed like… either the premise of gradual progress is wrong, and somewhere along the path there will be one crucial insight that will move the machine from dramatically worse than human to dramatically better than human… or if it indeed is gradual in some sense, the transition will still be super fast.
The calculator is an example of “if it is better than me, then it is way better than me”.
The machines playing chess and Go are a mixed example. I suck at chess, so machines better than me have existed for decades. But at some point they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a more general way than the calculator does; if I remember correctly, the machine that is superhuman at Go is very similar to the machine that is superhuman at chess.
The current GPT machines are something that I have never seen before: better than humans in some aspects, worse than humans in other aspects, both in the area of processing text. I definitely would not have predicted that. Without the benefit of hindsight, it feels just as weird as a calculator that could do addition faster than humans, but multiplication slower than humans and with occasional mistakes. This simply is not how I expected programs to behave. If someone had told me that they were planning to build a GPT, I would have expected it to either not work at all (more likely) or be superintelligent (less likely). The option “it works, kinda correctly, but it’s kinda lame” was not on my radar.
I am not sure what this all means. My current best guess is that this is what “learning from humans” gets you: you can almost reach them, but you cannot use this alone to surpass them.
The calculator is not observing millions of humans doing multiplication, and then trying to do something statistically similar. Instead, it has an algorithm designed from scratch that solves the mathematical tasks.
The chess and Go machines needed a lot of data to learn, but they could generate the data themselves, playing millions of games against each other. So they needed the data, but they didn’t need humans as the source of the data; they could generate it faster themselves.
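To make the contrast concrete, a minimal sketch of self-play data generation might look like this (the game-rule functions here are placeholders, and a real system would sample moves from its current policy rather than at random):

```python
import random

def self_play_dataset(initial_position, legal_moves, make_move, result, num_games=1000):
    """Generate labelled training data purely from self-play.
    `legal_moves`, `make_move` and `result` are placeholders for the real game rules;
    having exact rules is what lets every finished game be labelled without a human."""
    data = []
    for _ in range(num_games):
        position, history = initial_position, []
        while legal_moves(position):                        # play until no legal moves remain
            move = random.choice(legal_moves(position))     # a real system samples from its current policy
            history.append((position, move))
            position = make_move(position, move)
        outcome = result(position)                          # the rules, not a human, provide the label
        data.extend((pos, move, outcome) for pos, move in history)
    return data
```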
The weakness of GPT is that once you have already fed it the entire internet and all the books ever written, you cannot get more training data. Actually, you could, with ubiquitous eavesdropping… and someone is probably already working on this. But you still need humans as a source. You cannot teach GPT with texts generated by GPT, because unlike chess and Go, you do not have the exact rules to tell you which generated outputs are the new winning moves and which are nonsense.
There are of course other aspects where GPT can easily surpass humans: the sheer quantity of the text it can learn from and process. If it can write mediocre computer programs, then it can write mediocre computer programs in a thousand different programming languages. If it can make puns or write poems at all, it will evaluate possible puns or rhymes a million times faster, in any language. If it can match patterns, it can match patterns in the entire output of humanity; a new polymath.
The social consequences may be dramatic. Even if GPT is not able to replace a human expert, it can probably replace human beginners in many professions… but if the beginners become unemployable, where will the new experts come from? By being able to deal with more complexity, GPT can make the society more complex, perhaps in a way that we will need GPT to navigate it. Would you trust a human lawyer to interpret a 10000-page legal contract designed by GPT correctly?
And yet, I wouldn’t call GPT superhuman in the sense of “smarter than Einstein”, because it also keeps making dumb mistakes. It doesn’t seem to me that more input text or more CPU alone would fix this. (But maybe I am wrong.) It feels like some insight is needed instead. Though that insight may turn out to be relatively trivial, like maybe just a prompt asking the GPT to reflect on its own words, or something like that. If this turns out to be true, then the distance between the village idiot and Einstein actually wasn’t that big.
Or maybe we get stuck where we are, and the only progress will come from having more CPU, in which case it may take a decade or two to reach Einstein levels. Or maybe it turns out that GPT can never be smarter in certain sense than its input texts, though this seems unlikely to me.
tl;dr—we may be one clever prompt away from Einstein, or we may need 1000× more compute, no idea
The machines playing chess and Go are a mixed example. I suck at chess, so machines better than me have existed for decades. But at some point they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a more general way than the calculator does; if I remember correctly, the machine that is superhuman at Go is very similar to the machine that is superhuman at chess.
I think the story of chess- and Go-playing machines is a bit more nuanced, and that thinking about this is useful when thinking about takeoff.
The best chess-playing machines have been fairly strong (by human standards) since the late 1970s (Chess 4.7 showed expert-level tournament performance in 1978, and Belle, a special-purpose chess machine, was considered a good bit stronger than it). By the early 90s, chess computers at expert level were available to consumers on a modest budget, and the best machine built (Deep Thought) was grandmaster-level. It then took another six years for the Deep Thought approach to be scaled up and tuned to reach world-champion level. These programs were based on manually designed evaluation heuristics, with some automatic parameter tuning, and alpha-beta search with some manually designed depth-extension heuristics. Over the years, people designed better and better evaluation functions and invented various tricks to reduce the amount of work spent on unpromising branches of the game tree.
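For concreteness, the core of the alpha-beta search these programs were built around looks roughly like this (a bare sketch only; `evaluate`, `legal_moves` and `make_move` are placeholders for the hand-crafted evaluation function and move generator, and real engines add move ordering, transposition tables and the depth-extension heuristics mentioned above):

```python
def alpha_beta(position, depth, alpha, beta, evaluate, legal_moves, make_move):
    """Negamax formulation of alpha-beta search. `evaluate` scores a position from
    the point of view of the side to move; `legal_moves` and `make_move` stand in
    for the engine's move generator."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)             # hand-crafted heuristic at the leaves
    for move in moves:
        # The child's value is negated because the opponent is to move there.
        score = -alpha_beta(make_move(position, move), depth - 1,
                            -beta, -alpha, evaluate, legal_moves, make_move)
        if score >= beta:
            return beta                       # the opponent would never allow this line: prune
        alpha = max(alpha, score)
    return alpha
```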
Long into the 1990s, many strong players were convinced that this approach would not scale to world championship levels, because they believed that competitive play at the world-champion level required correctly dealing with various difficult strategic problems, and that working within the prevailing paradigm would only lead to engines that were even more superhuman at tactics than had already been observed, while still failing against the strongest players due to lack of strategic foresight. This proved to be wrong: classical chess programs reached massively superhuman strength on the traditional approach to chess programming, and this line of programs was completely dominant and still improving up to about the year 2019.
In 2018, a team at DeepMind showed that throwing reinforcement learning and Monte Carlo Tree Search at chess (and various other games) could produce a system (AlphaZero) playing at an even higher level than the then-current version of Stockfish running on very strong hardware. Today, the best engines use either this approach or the traditional approach to chess programming augmented with a very lightweight neural network for accurate positional evaluation.
For Go, there was hardly any significant progress from about the early 90s to the mid-2000s: programs were roughly at the level of a casual player who had studied the game for a few months. A conceptual breakthrough (the invention of Monte Carlo Tree Search, around 2006) then brought them to a level equivalent in chess maybe to a master by the mid-2010s. DeepMind’s AlphaGo system then showed in 2016 that reinforcement learning and MCTS could produce a system performing at a superhuman level when run on a very powerful computer. Today, programs based on the same principles (with some relatively minor Go-specific improvements) achieve substantially higher playing strength than AlphaGo on consumer hardware. The vast majority of strong players were completely convinced in 2016 that AlphaGo would not win its match against Lee Sedol (a world-class human player).
Chess programs had been superhuman at the things they were good at (spotting short tactics) for a long time before surpassing humans in general playing strength, arguably because their weaknesses improved less quickly than their strengths. Their weaknesses are in fact still in evidence today: it is not difficult to construct positions that the latest versions of LC0 or Stockfish don’t handle correctly, but it is very difficult indeed to exploit this in real games. For Go programs, similar remaining weak spots have recently been shown to be exploitable in real games (see https://goattack.far.ai/), although my understanding is that these weaknesses have now largely been patched.
I think the general lesson does, in my expectation, carry over to some extent from narrow AI (like chess computers) to general AI (like language models): while an AI is far below human level at a task, its performance is determined mostly by the aspects of the task it handles best; once it is at or above human level, its performance is determined mostly by the aspects it handles worst. This slows down perceived improvement relative to humans once the AI is massively better than humans at some task-relevant capabilities. In terms of the transition from chimpanzee-level intelligence to Einstein, this means that the argument from the relatively short time span evolution took to bridge that gap is probably not as general as it might look at first sight: chimpanzees and humans probably share similar architecture-induced cognitive gaps, whereas the bottlenecks of an AI could be very different.
This would suggest (maybe counterintuitively) that fast takeoff scenarios are more likely with cognitive architectures that are similar to humans than with very alien ones.
You cannot teach GPT with texts generated by GPT, because unlike chess and Go, you do not have the exact rules to tell you which generated outputs are the new winning moves and which are nonsense.
You can ask GPT which are nonsense (in various ways), with no access to ground truth, and that actually works to improve responses. This sort of approach was even used to fine-tune GPT-4 (see the 4-step algorithm in section 3.1 of the System Card part of the GPT-4 report).
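As a purely illustrative sketch of the general idea (not the exact procedure from the System Card; `ask_model` is a placeholder for whatever API call you use):

```python
def ask_model(prompt: str) -> str:
    """Placeholder for a call to the language model."""
    raise NotImplementedError

def self_revise(question: str, rounds: int = 2) -> str:
    """Have the model criticise and rewrite its own answer, with no ground truth involved."""
    answer = ask_model(question)
    for _ in range(rounds):
        critique = ask_model(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any claims in the answer that are likely wrong or made up.")
        answer = ask_model(
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the problems listed in the critique.")
    return answer
```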
I checked out that section, but what you are saying doesn’t follow for me. The section describes fine-tuning compute and optimizing scalability; how does this relate to self-improvement? There is a possibility I am looking in the wrong section: what I was reading was about algorithms that efficiently predict how ChatGPT would scale. Also, I didn’t see anything about a 4-step algorithm. Anyway, could you explain what you mean, or where I can find the right section?
You might be looking at section 3.1 of the main report on page 2 (of the revision 3 pdf). I’m talking about page 64, which is part of section 3.1 of the System Card, not of the main report, but still within the same pdf document. (Does the page-anchored link I used not work on your system to display the correct page?)
Yes, thanks. The page anchor doesn’t work for me, probably because of the device I am using; I just get page 1.
That is super interesting, that it is able to find inconsistencies and fix them; I didn’t know that they defined those as hallucinations. What would expanding the capabilities of this sort of self-improvement look like? It seems necessary to have a general understanding of what rational conversation looks like. It is an interesting situation where it knows what is bad and is able to fix it, but wasn’t doing that anyway.
This is probably only going to become important once model-generated data is used for pre-training (or fine-tuning that’s functionally the same thing as continuing a pre-training run), and this process is iterated for many epochs, like with the MCTS things that play chess and Go. And you can probably just alpaca any pre-trained model you can get your hands on to start the ball rolling.
The amplifications in the papers are more ambitious this year than the last, but probably still not quite on that level. One way this could change quickly is if the plugins become a programming language, but regardless I dread visible progress by the end of the year. And once the amplification-distillation cycle gets closed, autonomous training of advanced skills becomes possible.
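For concreteness, a toy sketch of the kind of amplification-distillation cycle I mean, in the alpaca style of using a model's own (amplified) outputs as training data. Every name and prompt here is made up for illustration:

```python
def amplify(generate, prompt):
    """Placeholder 'amplification' step: e.g. chain-of-thought, tool/plugin use,
    or several model calls combined into one better answer."""
    return generate(f"Think step by step, then answer:\n{prompt}")

def amplification_distillation(generate, finetune, prompts, rounds=3):
    """Toy amplification-distillation cycle: each round, the model's amplified answers
    become the training data for the next model. `finetune` is a placeholder that
    returns the new model's generate function."""
    for _ in range(rounds):
        data = [(p, amplify(generate, p)) for p in prompts]
        generate = finetune(data)
    return generate
```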
Here he touched on this (“Large language models” timestamp in the video description), and maybe somewhere else in the video; I can’t seem to find it. It is much better to get it directly from him, but it is 4 hours long, so...
My attempt at a summary, with a bit of inference, so take it with a dose of salt:
There is some “core” of intelligence which he expected to be relatively hard to find by experimentation (but more components than he expected have already been found by experimentation/gradient descent, so this is partially wrong, and he is afraid it may be completely wrong).
He was thinking that without the full “core”, intelligence is non-functional; GPT-4 falsified this. It is more functional than he expected, enough to produce a mess that can be perceived as human-level, but isn’t really. Perhaps our thinking of GPT-4 as being on a human level is a bias? So GPT-4 has impressive pieces, but they don’t work in unison with each other?
This is what my (mis)interpretation of his words looks like; I am least certain about the last parts. (I wonder: could it be that GPT-4 already has all the “core” components but is just stupid, barely intelligent enough to look impressive because of training?)
From 38:58 of the podcast:
So I do think that over time I have come to expect a bit more that things will hang around in a near human place and weird shit will happen as a result. And my failure review where I look back and ask — was that a predictable sort of mistake? I feel like it was to some extent maybe a case of — you’re always going to get capabilities in some order and it was much easier to visualize the endpoint where you have all the capabilities than where you have some of the capabilities. And therefore my visualizations were not dwelling enough on a space we’d predictably in retrospect have entered into later where things have some capabilities but not others and it’s weird. I do think that, in 2012, I would not have called that large language models were the way and the large language models are in some way more uncannily semi-human than what I would justly have predicted in 2012 knowing only what I knew then. But broadly speaking, yeah, I do feel like GPT-4 is already kind of hanging out for longer in a weird, near-human space than I was really visualizing. In part, that’s because it’s so incredibly hard to visualize or predict correctly in advance when it will happen, which is, in retrospect, a bias.
Thanks, this is exactly the kind of thing I was looking for.
I think it is not necessarily correct to say that GPT-4 is above village idiot level. Comparison to humans is a convenient and intuitive framing but it can be misleading.
For example, this post argues that GPT-4 is around Raven level. Beware that this framing is also problematic but for different reasons.
I think that you are correctly stating Eliezer’s beliefs at the time but it turned out that we created a completely different kind of intelligence, so it’s mostly irrelevant now.
In my opinion, we should aspire to avoid any comparison unless it has practical relevance (e.g. economic consequences).
GPT-4 is far below village idiot level at most things a village idiot uses their brain for, despite surpassing humans at next-token prediction.
This is kinda similar to how AlphaZero is far below village idiot level at most things, despite surpassing humans at chess and go.
But it does make you think that soon we might be saying “But it’s far below village idiot level at most things, it’s merely better than humans at terraforming the solar system.”
Something like this plausibly came up in the Eliezer/Paul dialogues from 2021, but I couldn’t find it with a cursory search. Eliezer has also in various places acknowledged being wrong about what kind of results the current ML paradigm would get, which probably is a superset of this specific thing.
Thanks for the reply.
GPT-4 is far below village idiot level at most things a village idiot uses their brain for, despite surpassing humans at next-token prediction.
Could you give some examples? I take it that what Eliezer meant by village-idiot intelligence is less “specifically does everything a village idiot can do” and more “is as generally intelligent as a village idiot”. I feel like the list of things GPT-4 can do that a village idiot can’t would look much more indicative of general intelligence than the list of things a village idiot can do that GPT-4 can’t. (As opposed to AlphaZero, where the extent of the list is “can play some board games really well”)
I just can’t imagine anyone interacting with a village idiot and GPT-4 and concluding that the village idiot is smarter. If the average village idiot had the same capabilities as today’s GPT-4, and GPT-4 had the same capabilities as today’s village idiots, I feel like it would be immediately obvious that we hadn’t gotten village-idiot level AI yet. My thinking on this is still pretty messy though so I’m very open to having my mind changed on this.
Something like this plausibly came up in the Eliezer/Paul dialogues from 2021, but I couldn’t find it with a cursory search. Eliezer has also in various places acknowledged being wrong about what kind of results the current ML paradigm would get, which probably is a superset of this specific thing.
Just skimmed the dialogues, couldn’t find it either. I have seen Eliezer acknowledge what you said but I don’t really see how it’s related; for example, if GPT-4 had been Einstein-level then that would look good for his intelligence-gap theory but bad for his suspicion of the current ML paradigm.
The big one is obviously “make long time scale plans to navigate a complicated 3D environment, while controlling a floppy robot.”
I agree with Qumeric’s comment—the point is that the modern ML paradigm is incompatible with having a single scale for general intelligence. Even given the same amount of processing power as a human brain, modern ML would use it on a smaller model with a simpler architecture, that gets exposed to orders of magnitude more training data, and that training data would be pre-gathered text or video (or maybe a simple simulation) that could be fed in at massive rates, rather than slow real-time anything.
The intelligences this produces are hard to put on a nice linear scale leading from ants to humans.
The big one is obviously “make long time scale plans to navigate a complicated 3D environment, while controlling a floppy robot.”
This is like judging a dolphin on its tree-climbing ability and concluding it’s not as smart as a squirrel. That’s not what it was built for. In a large number of historically human domains, GPT-4 will dominate the village idiot and most other humans too.
Can you think of examples where it actually makes sense to compare GPT and the village idiot and the latter easily dominates? Language input/output is still a pretty large domain.