OK, yes, to be a perfect text predictor, or even an approximately perfect text predictor, you’d have to be very smart and smart in a very weird way. But there’s literally no reason to think that the architectures being used can ever get that good at prediction, especially not if they have to meet any realistic size constraint and/or are restricted to any realistically available amount of training input.
What we’ve seen them do so far is to generate vaguely plausible text, while making many mistakes of kinds that the sources of their training input would never actually make. It doesn’t follow that they can or will actually become unboundedly good predictors of humans or any other source of training data. In fact I don’t think that’s plausible at all.
It definitely fails in some cases. For example, there’s surely text on the Internet that breaks down RSA key generation, with examples. Therefore, to be a truly perfect predictor even of the sort of thing that’s already in the training data, you’d have to be able to complete the sentence “the prime factors of the hexadecimal integer 0xda52ab1517291d1032f91532c54a221a0b282f008b593072e8554c8a4d1842c7883e7eb5dc73aa68ef6b0d161d4464937f9779f805eb68dc7327ee1db7a1e7cf631911a770d29c59355ca268990daa5be746e93e1b883e8bc030df2ba94d45a88252fceaf6de89644392f91a9d437de0410e5b8e1123b9a3e05169497df2c909b73e104daf835b027d4be54f756025974e24363a372c57b46905d61605ce58918dc6fb63a92c9b4745d30ee3fc0b937f47eb3061cd317e658e6521886e51079f327bd705a074b76c94f466ad6ca77b16efb08cd92981ae27bf254b75b67fad8f336d8fdab79bc74e27773f87e80ba778d146cc6cbddc5ba7fdc21f6528303c93 are...”.
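To make the factoring example concrete: completing that sentence means actually factoring the modulus, and the asymmetry between verifying factors and producing them is the whole point of RSA. A toy sketch (with a made-up small semiprime; the trial-division approach below is hopeless at real RSA sizes):

```python
# Toy illustration: "predicting" the factors of a semiprime means factoring it.
# Trial division is instant for this made-up small modulus, but the work grows
# exponentially with bit length; the 2048-bit modulus quoted above is far out
# of reach for any known classical method.
def trial_factor(n: int) -> list[int]:
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

toy_modulus = 10007 * 10009  # two small primes; real RSA primes are ~1024 bits
print(trial_factor(toy_modulus))  # prints [10007, 10009]
```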
You’re making a claim about both:
what sorts of cognitive capabilities can exist in reality, and
whether current (or future) training regimes are likely to find them.
It sounds like you agree that the relevant cognitive capabilities are likely to exist, though maybe not for prime number factorization, and that it’s unclear whether they’d fit inside current architectures.
I do not read Eliezer as making a claim that future GPT-n generations will become perfect (or approximately perfect) text predictors. He is specifically rebutting claims others have made, that GPTs/etc can not become ASI, because e.g. they are “merely imitating” human text. This is not obviously true: to the extent that there exist cognitive capabilities which can solve these prediction problems, which are physically possible to instantiate in GPT-n model weights, and which are within the region of possible outcomes of our training regimes (+ the data used for them), it is possible that we will find them.
He is specifically rebutting claims others have made, that GPTs/etc can not become ASI, because e.g. they are “merely imitating” human text.
That may be, but I’m not seeing that context here. It ends up reading to me as “look how powerful a perfect predictor would be, (and? so?) if we keep training them we’re going to end up with a perfect predictor (and, I extrapolate, then we’re hosed)”.
I’m not trying to make any confident claim that GPT-whatever can’t become dangerous[1]. But I don’t think that talking about how powerful GPTs would be if they reached implausible performance levels really says anything at all about whether they’d be dangerous at plausible ones.
For that matter, even if you reached an implausible level, it’s still not obvious that you have a problem, given that the implausible capability would still be used for pure text prediction. Generating text that, say, manipulated humans and took over the world would be a prediction error, since no such text would ever arise from any of the sources being predicted. OK, unless it predicts that it’ll find its own output in the training data....
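The “prediction error” point can be read directly off the training objective. A toy sketch (the probabilities are hypothetical, not from any real model): next-token cross-entropy is minimized by matching what the source actually wrote, so probability mass spent on text that no source would produce is penalized by construction.

```python
import math

# Toy illustration: pretraining scores a model by next-token cross-entropy,
# which rewards matching the data. Hypothetical distributions for the token
# following some prefix; "TAKE_OVER_THE_WORLD" stands in for text no human
# source would ever have written.
faithful = {"sat": 0.88, "ran": 0.10, "TAKE_OVER_THE_WORLD": 0.02}
manipulative = {"sat": 0.10, "ran": 0.10, "TAKE_OVER_THE_WORLD": 0.80}

actual_next = "sat"  # what the human source actually wrote

def loss(dist: dict, target: str) -> float:
    return -math.log(dist[target])  # cross-entropy for a one-hot target

# Matching the data gives strictly lower loss; under this objective,
# "manipulation" tokens are, by construction, prediction errors.
assert loss(faithful, actual_next) < loss(manipulative, actual_next)
```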
Although, even with plugins, there are a lot of kinds of non-prediction-like capabilities I’d need to see before I thought a system was obviously dangerous [2].
But I don’t think that talking about how powerful GPTs would be if they reached implausible performance levels really says anything at all about whether they’d be dangerous at plausible ones.
This seems like it’s assuming the conclusion (that reaching dangerous capabilities using these architectures is implausible). Eliezer did consider it unlikely, though GPT-4 was a negative update in that regard.
For that matter, even if you reached an implausible level, it’s still not obvious that you have a problem, given that the implausible capability would still be used for pure text prediction. Generating text that, say, manipulated humans and took over the world would be a prediction error, since no such text would ever arise from any of the sources being predicted. OK, unless it predicts that it’ll find its own output in the training data....
This seems like it’s assuming that the system ends up outer-aligned.
This seems like it’s assuming the conclusion (that reaching dangerous capabilities using these architectures is implausible).
I think that bringing up the extreme difficulty of approximately perfect prediction, with a series of very difficult examples, and treating that as interesting enough to post about, amounts to taking it for granted that it is plausible that these architectures can get very, very good at prediction.
I don’t find that plausible, and I’m sure that there are many, many other people who won’t find it plausible either, once you call their attention to the assumption. The burden of proof falls on the proponent; if Eliezer wants us to worry about it, it’s his job to make it plausible to us.
This seems like it’s assuming that the system ends up outer-aligned.
It might be. I have avoided remembering “alignment” jargon, because every time I’ve looked at it I’ve gotten the strong feeling that the whole ontology is completely wrong, and I don’t want to break my mind by internalizing it.
It assumes that it ends up doing what you were trying to train it to do. That’s not guaranteed, for sure… but on the other hand, it’s not guaranteed that it won’t. I mean, the whole line of argument assumes that it gets incredibly good at what you were trying to train it to do. And all I said was “it’s not obvious that you have a problem”. I was very careful not to say that “you don’t have a problem”.
I agree that the post makes somewhat less sense without the surrounding context (in that it was originally generated as a series of tweets, which I think were mostly responding to people making a variety of mistaken claims about the fundamental limitations of GPT/etc).
Referring back to your top-level comment:
I honestly don’t see the relevance of this.
The relevance should be clear: in the limit of capabilities, such systems could be dangerous. Whether the relevant threshold is reachable via current methods is unknown—I don’t think Eliezer thinks it’s overwhelmingly likely; I myself am uncertain. You do not need a system capable of reversing hashes for that system to be dangerous in the relevant sense. (If you disagree with the entire thesis of AI x-risk then perhaps you disagree with that, but if so, then perhaps mention that up-front, so as to save time arguing about things that aren’t actually cruxy for you?)
But there’s literally no reason to think that the architectures being used can ever get that good at prediction, especially not if they have to meet any realistic size constraint and/or are restricted to any realistically available amount of training input.
Except for the steadily-increasing capabilities they continue to display as they scale? Also my general objection to the phrase “no reason”/“no evidence”; there obviously is evidence, and if you think that evidence should be screened off, please argue that explicitly.
The relevance should be clear: in the limit of capabilities, such systems could be dangerous.
What I’m saying is that reaching that limit, or reaching any level qualitatively similar to that limit, via that path, is so implausible, at least to me, that I can’t see a lot of point in even devoting more than half a sentence to the possibility, let alone using it as a central hypothesis in your planning. Thus “irrelevant”.
It’s at least somewhat plausible that you could reach a level that was dangerous, but that’s very different from getting anywhere near that limit. For that matter, it’s at least plausible that you could get dangerous just by “imitation” rather than by “prediction”. So, again, why put so much attention into it?
Except for the steadily-increasing capabilities they continue to display as they scale? Also my general objection to the phrase “no reason”/“no evidence”; there obviously is evidence, and if you think that evidence should be screened off, please argue that explicitly.
OK, there’s not no evidence. There’s just evidence weak enough that I don’t think it’s worth remarking on.
I accept that they’ve scaled a lot better than anybody would have expected even 5 years ago. And I expect them to keep improving for a while.
But...
They’re not so opaque as all that, and they’re still just using basically pure statistics to do their prediction, and they’re still basically doing just prediction, and they’re still operating with finite resources.
When you observe something that looks like an exponential in real life, the right way to bet it is almost always that it’s really a sigmoid.
Whenever you get a significant innovation, you would expect to see a sudden ramp-up in capability, so actually seeing such a ramp-up, even if it’s bigger than you would have expected, shouldn’t cause you to update that much about the final outcome.
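The exponential-versus-sigmoid point is easy to check numerically: well below its midpoint, a logistic curve is almost indistinguishable from an exponential, so early exponential-looking growth tells you little about whether a ceiling is coming. A sketch with made-up parameters:

```python
import math

# A logistic (sigmoid) curve with ceiling K, growth rate r, and midpoint x0.
# Early on (x well below x0) it tracks a pure exponential almost exactly.
K, r, x0 = 100.0, 1.0, 10.0

def logistic(x: float) -> float:
    return K / (1.0 + math.exp(-r * (x - x0)))

def early_exponential(x: float) -> float:
    # The small-x approximation of the same curve: K * e^{r(x - x0)}
    return K * math.exp(r * (x - x0))

# Early on, the two agree to within one percent...
for x in [0, 1, 2, 3]:
    rel_err = abs(logistic(x) - early_exponential(x)) / logistic(x)
    assert rel_err < 0.01

# ...but the logistic saturates at K while the exponential keeps climbing.
print(logistic(30), early_exponential(30))
```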
If I wanted to find the thing that worries me most, it’d probably be that there’s no rule that somebody building a real system has to keep the architecture pure. Even if you do start to get diminishing returns from “GPTs” and prediction, you don’t have to stop there. If you keep adding more obvious-to-only-somewhat-unintuitive elements to the architecture, you can get in at the bottoms of more sigmoids. And the effects can easily be synergistic. And what we definitely have is a lot of momentum: many smart people’s attention and a lot of money[1] at stake, plus whatever power you get from the tools already built. That kind of thing is how you get those innovations.

[1] Added on edit: and, maybe worse, prestige…
You forget about code. GPT often generates correct code (even quines!) with a single rollout; this is a superhuman ability. This is what Eliezer referred to as “text that took humans many iterations over hours or days to craft”.
OK, so it’s superhuman on some tasks[1]. That’s well known. But so what? Computers have always been radically superhuman on some tasks.
As far as I can tell the point is supposed to be that predicting what will actually appear next is harder than generating just anything vaguely reasonable, and that a perfect predictor of anything that might appear next would be both amazingly powerful and very unlike a human (and, I assume, therefore dangerous). But that’s another “so what”. You’re not going to get an even approximately perfect predictor, no matter how much you try to train in that direction. You’re going to run into the limitations of the approach. So talking about how hard it is to get to be approximately perfect, or about how powerful something approximately perfect would be, isn’t really interesting.
By the way, it also generates a lot of wrong code. And I don’t find quines exclamation-point-worthy. Quines are exactly the sort of thing I’d expect it to get right, because some people are really fascinated by them and have written both tons of code for them and tons of text explaining how that code works.
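For reference, the standard minimal Python quine, a two-liner whose trick (a format string applied to its own repr) is explained at length in exactly the sort of tutorial text such models train on:

```python
# The two lines below form a quine: the program's output is exactly its own
# source. The format string is applied to its own repr() to reproduce itself.
src = 'src = %r\nprint(src %% src)'
print(src % src)
```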
Presumably, the tasks that machines have been superhuman at so far (arithmetic, chess) confer radically less power than the tasks that LLMs could become superhuman at soon (writing code, crafting business strategies, superhuman “Diplomacy” skill of outwitting people or other AIs in negotiations, etc.)
Why do you think an LLM could become superhuman at crafting business strategies or negotiating? Or even writing code? I don’t believe this is possible.
“Writing code” feels underspecified here. I think it is clear that LLMs will be (perhaps already are) superhuman at writing some types of code for some purposes in certain contexts. What line are you trying to assert will not be crossed when you say you don’t think it’s possible for them to be superhuman at writing code?