My sense is almost everyone here expects that we will almost certainly arrive at dangerous capabilities with something else in addition to autoregressive LLMs (at the very least RLHF, which is already widely used). I don’t know what’s true in the limit (like if you throw another 30 OOMs of compute at autoregressive models), and I doubt others have super strong opinions here. To me it seems plausible you get something that does recursive self-improvement out of a large enough autoregressive LLM, but it seems very unlikely to be the fastest way to get there.
Edit: habryka edited the parent comment to clarify and I now agree. I’m keeping this comment as is for posterity, but note the discussion below.
My sense is almost everyone here expects that we will almost certainly arrive at dangerous capabilities with something else in addition to autoregressive LLMs
This exact statement seems wrong; I’m pretty uncertain here, and many other (notable) people seem uncertain too. Maybe I think “pure autoregressive LLMs (where a ton of RL isn’t doing that much work) are the first AI with dangerous capabilities” is around 30% likely.
(Assuming dangerous capabilities includes things like massively speeding up AI R&D.)
(Note that in shorter timelines, my probability on pure autoregressive LLMs goes way up. Part of my view on having only 30% on pure LLMs is just downstream of a general view like “it’s reasonably likely that this exact approach to making transformatively powerful AI isn’t the one that ends up working, so AI is reasonably likely to look different.”)
Some people (e.g. Bogdan Ionut Cirstea) think that it’s very likely that pure autoregressive LLMs go all the way through human-level R&D, etc. (I don’t think this is very likely, but it’s possible.)
(TBC, I think qualitatively wildly superhuman AIs which do galaxy-brained things that humans can’t understand probably require something more than autoregressive LLMs, at least to be done at all efficiently. And this might be what is intended by “superintelligence” in the original question.)
I was including the current level of RLHF as already not qualifying as “pure autoregressive LLMs”. IMO the RLHF is doing a bunch of important work at least at current capability levels (and my guess is it will also do some important work at the first dangerous capability levels).
Also, I feel like you forgot the context of the original message, which said “all the way to superintelligence”. I was calibrating my “dangerous” threshold to “superintelligence-level dangerous”, not “speeds up AI R&D” dangerous.
I was including the current level of RLHF as already not qualifying as “pure autoregressive LLMs”. IMO the RLHF is doing a bunch of important work at least at current capability levels (and my guess is it will also do some important work at the first dangerous capability levels).
Oh, ok, I retract my claim.
Also, I feel like you forgot the context of the original message, which said “all the way to superintelligence”.
I didn’t; I provided various caveats in parentheticals about the exact level of danger.
Oops, mea culpa: I skipped your last parenthetical when reading your comment, so I missed that.