Yeah, I'm not really happy with the state of discourse on this matter either.
I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun.
As a proponent of an AI-risk model that does this, I acknowledge that this is an issue, and I indeed feel pretty defensive on this point. Mainly because, as @habryka pointed out and as I'd outlined before, I think there are legitimate reasons to expect no blatant evidence until it's too late, and indeed, that's the whole reason AI risk is such a problem. As was repeatedly stated.
So all these moves to demand immediate well-operationalized bets read a bit like tactical social attacks that are being unintentionally launched by people who ought to know better, which are effectively exploiting the territory-level insidious nature of the problem to undermine attempts to combat it, by painting the people pointing out the problem as blind believers. Like challenges that you're set up to lose if you take them on, but which make you look bad if you turn them down.
And the above, of course, may read exactly like a defense attempt a particularly self-aware blind believer might construct. Which doesn't inspire much self-doubt in me[1], but it does make me feel like I'm - no, not like I'm sailing against the winds of counterevidence - like I'm playing the social game on the side that's poised to lose it in the long run, so I should switch up to the winning side to maximize my status, even if its position is wrong.
I'm somewhat hopeful about navigating to some concrete empirical or mathematical evidence within the next couple of years. But in the meanwhile, yeah, discussing the matter just makes me feel weary and tired.
(Edit, because I'm concerned I'd been too subtle there: I am not accusing anyone, and especially not @TurnTrout, of deliberately employing social tactics to undermine their opponents rather than cooperatively seeking the truth. I'm only saying that the (usually extremely reasonable) requests for well-operationalized bets effectively have this result in this particular case.
Neither am I suggesting that the position I'm defending should be immune to criticism. Empirical evidence easily tied to well-operationalized bets is usually an excellent way to resolve disagreements and establish truth. But it's not the only one, and it just so happens that this specific position can't field many good predictions in this field.)
"But of course it won't," you might think - which, fair enough. But what's your policy for handling problems that really are this insidious?
Your post defending the least forgiving take on alignment basically relies on a sharp/binary property of AGI, and IMO a pretty large crux is that this property either probably doesn't exist or, if it does exist, is not universal, and I think it tends to be overused.
To be clear, I'm increasingly agreeing with a weak version of the hypothesis, and I also think you are somewhat correct, but IMO I don't think your stronger hypothesis is correct. I think that the lesson of AI progress is that it's less sharp the more tasks you want, and the more general intelligence you want, which is in opposition to your hypothesis on AI progress being sharp.
But in the meanwhile, yeah, discussing the matter just makes me feel weary and tired.
I actually kinda agree with you here, but unfortunately, this is very, very important, since your allies are trying to gain real-life political power over AI, and given how impactful that is, it's basically required for us to discuss it.
I think that the lesson of AI progress is that it's less sharp the more tasks you want, and the more general intelligence you want
There's a bit of "one man's modus ponens is another's modus tollens" going on. I assume that when you look at a new AI model, and see how it's not doing instrumental convergence/value reflection/whatever, you interpret it as evidence against "canonical" alignment views. I interpret it as evidence that it's not AGI yet; or sometimes, even evidence that this whole line of research isn't AGI-complete.
E. g., I've updated all the way on this in the case of LLMs. I think you can scale them a thousandfold, and it won't give you AGI. I'm mostly in favour of doing that, too, or at least fully realizing the potential of the products already developed. Probably same for Gemini and Q*. Cool tech. (Well, there are totalitarianism concerns, I suppose.)
I also basically agree with all the takes in the recent "AI is easy to control" post. But what I take from it isn't "AI is safe", it's "the current training methods aren't gonna give you AGI". Because if you put a human - the only known type of entity with the kinds of cognitive capabilities we're worrying about - into a situation isomorphic to a DL AI's, the human would exhibit all the issues we're worrying about.
Like, just because something has a label of "AI" and is technically an AI doesn't mean studying it can give you lessons about "AGI", the scary lightcone-eating thing all the fuss is about, yeah? Any more than studying GOFAI FPS bots is going to teach you lessons about how LLMs work?
And that the Deep Learning paradigm can probably scale to AGI doesn't mean that studying the intermediary artefacts it's currently producing can teach us much about the AGI it'll eventually spit out. Any more than studying a MNIST-classifier CNN can teach you much about LLMs; any more than studying squirrel neurology can teach you much about winning moral-philosophy debates.
That's basically where I'm at. LLMs and such are just in the entirely wrong reference class for studying "generally intelligent"/scary systems.
Any more than studying GOFAI FPS bots is going to teach you lessons about how LLMs work?
No, but my point here is that once we increase the complexity of the domain, and require more tasks to be done, things start to smooth over, and we don't see nearly as sharp a jump.
I suspect a big part of that is the effect of Amdahl's law kicking in, combined with Baumol's cost disease and power-law scaling, which means you are always bottlenecked on the least automatable tasks, so improvements in one area like Go don't matter as much as you'd think.
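To make the bottleneck intuition concrete, here is a minimal Amdahl's-law toy calculation (the task shares and speedups are invented purely for illustration):

```python
# Toy Amdahl's-law illustration: a huge speedup on one task barely moves
# overall throughput while the other tasks remain un-automated.

def overall_speedup(task_shares, speedups):
    """task_shares: fraction of total work per task (should sum to 1).
    speedups: factor by which each task is accelerated."""
    remaining_time = sum(share / s for share, s in zip(task_shares, speedups))
    return 1.0 / remaining_time

# Hypothetical split: a narrow "Go-like" task, coding, and messy real-world work.
shares = [0.2, 0.3, 0.5]
print(overall_speedup(shares, [1000, 1, 1]))   # ~1.25x: one task solved, little overall gain
print(overall_speedup(shares, [1000, 10, 2]))  # ~3.6x: gains have to be broad to matter
```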
I'd say the main lesson of AI progress, one that might even have been formulable back in the 1970s-1980s, is that compute and data were the biggest factors, by a wide margin, and these grow smoothly. Only now are algorithms starting to play a role, and even then it's only because transformers turn out to be fairly terrible at generalizing or doing stuff, which is related to your claim about LLMs not being real AGI; but I think this effect is weaker than you think, and I'm sympathetic to the continuous view as well. There probably will be some discontinuities, but IMO LWers have fairly drastically overstated how discontinuous progress was, especially once we realize that a lot of the outliers were likely simpler than the real world (though Go comes close, at least for its domain; the problem is that the domain is far too small to matter).
I assume that when you look at a new AI model, and see how it's not doing instrumental convergence/value reflection/whatever, you interpret it as evidence against "canonical" alignment views. I interpret it as evidence that it's not AGI yet; or sometimes, even evidence that this whole line of research isn't AGI-complete.
I think this roughly tracks how we updated, though there was a brief phase where I became more pessimistic as I learned that LLMs probably weren't going to scale to AGI, which broke a few of my alignment plans, but I found other reasons to be more optimistic that didn't depend on LLMs nearly as much.
My worry is that while it's fine enough to update towards "it's not going to have any impact on anything, and that's the reason it's safe," this basically defines away the possibility of safety, and thus makes the model useless:
I interpret it as evidence that it's not AGI yet; or sometimes, even evidence that this whole line of research isn't AGI-complete.
I think a potential crux here is whether to expect some continuity at all, or whether there is reason to expect a discontinuous step change for AI, which is captured in this post: https://www.lesswrong.com/posts/cHJxSJ4jBmBRGtbaE/continuity-assumptions
Because if you put a human - the only known type of entity with the kinds of cognitive capabilities we're worrying about - into a situation isomorphic to a DL AI's, the human would exhibit all the issues we're worrying about.
I basically disagree entirely with that, and I'm extremely surprised you claimed that. If we grant that we get the same circumstances to control humans as we have for DL AIs, then alignment becomes basically trivial in my view, since human control research would have a way better ability to study humans, and in particular there would be no IRB/FDA or regulation to constrain you, which would be a huge change to how science works today. It may take a lot of brute-force work, but I think it basically becomes trivial to align human beings if humans could be put into a situation isomorphic to a DL AI's.
I'd say the main lesson of AI progress, one that might even have been formulable back in the 1970s-1980s, is that compute and data were the biggest factors
As far as producing algorithms that are able to, once trained on a vast dataset of [A, B] samples, interpolate a valid completion B for an arbitrary prompt sampled from the distribution of A? Yes, for sure.
As far as producing something that can genuinely generalize off-distribution, strike way outside the boundaries of interpolation? Jury's still out.
Like, I think my update on all the LLM stuff is "boy, who knew interpolation can get you this far?". The concept-space sure turned out to have a lot of intricate structure that could be exploited via pure brute force.
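As a minimal sketch of the in-distribution vs. off-distribution distinction (a toy 1-D polynomial fit standing in for "interpolation over a training distribution"; nothing here is specific to LLMs):

```python
import numpy as np

# Fit on samples drawn from [-pi, pi], then evaluate both inside and far outside that range.
rng = np.random.default_rng(0)
x_train = rng.uniform(-np.pi, np.pi, 200)
y_train = np.sin(x_train)

coeffs = np.polyfit(x_train, y_train, deg=9)  # a stand-in "learned generative algorithm"

x_in = np.linspace(-np.pi, np.pi, 5)           # in-distribution prompts
x_out = np.linspace(3 * np.pi, 4 * np.pi, 5)   # far off-distribution prompts

print(np.max(np.abs(np.polyval(coeffs, x_in) - np.sin(x_in))))    # small error: interpolation works
print(np.max(np.abs(np.polyval(coeffs, x_out) - np.sin(x_out))))  # error blows up: extrapolation fails
```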
I basically disagree entirely with that, and I'm extremely surprised you claimed that
Oh, I didn't mean "if we could hook up a flesh-and-blood human (or a human upload) to the same sort of cognition-shaping setup as we subject our AIs to". I meant "if the forward pass of an LLM secretly simulated a human tasked with figuring out what token to output next", but without the ML researchers being aware that that's what's going on, and with them still interacting with the thing as with a token-predictor. It's a more literal interpretation of the thing sometimes called an "inner homunculus".
I'm well aware that the LLM training procedure is never going to result in that. I'm just saying that if it did, and if the inner homunculus became smart enough, that'd cause all the deceptive-alignment/inner-misalignment/wrapper-mind issues. And that if you're not modeling the AI as being/having a homunculus, you're not thinking about an AGI, so it's no wonder the canonical AI-risk arguments fail for that system and it's no wonder it's basically safe.
As far as producing algorithms that are able to, once trained on a vast dataset of [A, B] samples, interpolate a valid completion B for an arbitrary prompt sampled from the distribution of A? Yes, for sure.
I'd say this still applies even to non-LLM architectures like RL, which is the important part, but Jacob Cannell and 1a3orn will have to clarify.
As far as producing something that can genuinely generalize off-distribution, strike way outside the boundaries of interpolation? Jury's still out.
I agree, but with a caveat, in that I think we do have enough evidence to rule out extreme importance on algorithms, à la Eliezer, and compute is not negligible. Epoch estimates a 50-50 split between compute and algorithmic progress in importance. Algorithmic progress will likely matter IMO, just not nearly as much as some LWers think it will.
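For what a roughly even split means in practice, here is a back-of-the-envelope decomposition in log terms (the growth factors are hypothetical placeholders, not Epoch's actual estimates):

```python
import math

# Effective compute = physical compute x algorithmic efficiency.
# If both grow ~30x over some period, each contributes about half of the
# orders of magnitude of effective-compute growth.
compute_growth = 30.0          # hypothetical growth in physical training compute
algo_efficiency_gain = 30.0    # hypothetical growth in algorithmic efficiency

effective_growth = compute_growth * algo_efficiency_gain
compute_share = math.log10(compute_growth) / math.log10(effective_growth)

print(f"effective compute grew {effective_growth:.0f}x")   # 900x
print(f"compute's share of the OOMs: {compute_share:.0%}")  # 50%
```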
Like, I think my update on all the LLM stuff is "boy, who knew interpolation can get you this far?". The concept-space sure turned out to have a lot of intricate structure that could be exploited via pure brute force.
I definitely updated somewhat in this direction, which is important, but I now think the AI optimist arguments are general enough not to rely on LLMs, and sometimes they don't even rely on a model of what future AI will look like beyond the facts that capabilities will grow and that people expect to profit from it.
I'm just saying that if it did, and if the inner homunculus became smart enough, that'd cause all the deceptive-alignment/inner-misalignment/wrapper-mind issues.
Not automatically, and there are potential paths to AGI, like Steven Byrnes's path to Brain-like AGI, that either outright avoid deceptive alignment altogether or make it far easier to solve. (The short answer is that Steven Byrnes suspects there's a simple generator of value, so simple that it's dozens of lines long, and if that's the case, then the corrigible alignment/value learning agent's simplicity gap is either zero, negative, or a very small positive gap - so small that very little data is required to pick out the honest value-learning agent over the deceptively aligned agent - and we have a lot of data on human values, so this is likely to be pretty easy.)
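Here is a back-of-the-envelope version of that simplicity-gap argument under a crude description-length prior (the bit counts are placeholders, not measurements): if the deceptive agent's description is k bits shorter, it starts with roughly 2^k prior odds in its favour, so you need about k bits of discriminating evidence about values to overcome it; if k is near zero or negative, almost any data on values suffices.

```python
def posterior_log2_odds(simplicity_gap_bits, evidence_bits):
    """Log2 odds of (honest value-learner : deceptive agent) after the evidence.
    simplicity_gap_bits: how many bits *shorter* the deceptive agent's description is
                         (<= 0 means the honest value-learner is at least as simple).
    evidence_bits: total log-likelihood ratio, in bits, favouring the honest agent."""
    prior_log2_odds = -simplicity_gap_bits
    return prior_log2_odds + evidence_bits

print(posterior_log2_odds(simplicity_gap_bits=0, evidence_bits=10))     # +10: honest agent wins easily
print(posterior_log2_odds(simplicity_gap_bits=3, evidence_bits=10))     # +7: tiny gap, little data needed
print(posterior_log2_odds(simplicity_gap_bits=1000, evidence_bits=10))  # -990: a large gap would swamp the data
```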
And that if you're not modeling the AI as being/having a homunculus, you're not thinking about an AGI,
I think a crux is that AIs will basically always be much more white-box than any human mind, and that the "AI control research is easier" point will still mostly hold for a lot of future paradigms of AI, including the ones that scale to superintelligence, especially since I think AI control is fundamentally very profitable and AIs have no legal rights/IRB boards to slow down control research.
I agree, but with a caveat, in that I think we do have enough evidence to rule out extreme importance on algorithms
Mm, I think the "algorithms vs. compute" distinction here doesn't quite cleave reality at its joints. Much as with the interpolation I talked about before, it's a pretty abstract kind of interpolation: LLMs don't literally memorize the data points; their interpolation relies on compact generative algorithms they learn (but which, I argue, are basically still bounded by the variance in the data points they've been shown). The problem of machine learning, then, is finding some architecture + training-loop setup that would, over the course of training, move the ML model towards implementing some high-performance cognitive algorithms.
It's dramatically easier than hard-coding the algorithms by hand, yes, and the learning algorithms we do code are very simple. But you still need to figure out in which direction to "push" your model first. (Pretty sure that if you threw 2023 levels of compute at a Very Deep fully-connected NN, it wouldn't match a modern LLM's performance - wouldn't even come close.)
So algorithms do matter. It's just that our way of picking the right algorithms consists of figuring out the right search procedure for these algorithms, then throwing as much compute as we can at it.
So that's where, I would argue, the sharp left turn would lie. Not in-training, when a model's loss suddenly drops as it "groks" general intelligence. (Although that too might happen.) It would happen when the distributed optimization process of ML researchers tinkering with training loops stumbles upon a training setup that actually pushes the ML model in the direction of the basin of general intelligence. And then that model, once scaled up enough, would suddenly generalize far off-distribution. (Indeed, that's basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off. The "main" sharp left turn happens during the architecture search, not during the training.)
And I'm reasonably sure we're in an agency overhang, meaning that the newborn GI would pass human intelligence in an eye-blink. (And if it doesn't, it'll likely stall at incredibly unimpressive sub-human levels, so the ML researchers will keep tinkering with the training setups until finding one that does send it over the edge. And there's no reason whatsoever to expect it to stall again at the human level, instead of way overshooting it.)
we have a lot of data on human values
Which human's values? IMO, "the AI will fall into the basin of human values" is kind of a weird reassurance, given the sheer diversity of human values - diversity that very much includes xenophobia, genocide, and petty vengeance scaled up to the geopolitical level. And stuff like RLHF designed to fit the aesthetics of modern corporations doesn't result in deeply thoughtful cosmopolitan philosophers - it results in sycophants concerned with PR as much as with human lives, and sometimes (presumably when not properly adapted to a new model's scale) in high-strung yanderes.
Let's grant the premise that the AGI's values will be restricted to the human range (which I don't really buy). If the quality of the sample within the human range that we pick is as good as what GPT-4/Sydney's masks appeared to be? Yeah, I don't expect humans to stick around for long after.
Indeed, that's basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off.
Actually I think the evidence is fairly conclusive that the human brain is a standard primate brain with the only changes being a few compute-scale dials turned up (the number of distinct gene changes is tiny - something like 12, from what I recall). There is really nothing special about the human brain other than 1) 3x larger-than-expected size, and 2) extended neoteny (a longer training cycle). Neuroscientists have looked extensively for other "secret sauce" and we now have some confidence in a null result: no secret sauce, just much more training compute.
Yes, but: whales and elephants have brains several times the size of humans, and they're yet to build an industrial civilization. I agree that hitting upon the right architecture isn't sufficient, you also need to scale it up - but scale alone doesn't suffice either. You need a combination of scale, and an architecture + training process that would actually transmute the greater scale into more powerful cognitive algorithms.
Evolution stumbled upon the human/primate template brain. One of the forks of that template somehow "took off" in the sense of starting to furiously select for larger brain size. Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization.
The ML-paradigm analogue would, likewise, involve researchers stumbling upon an architecture that works well at small scales and has good returns on compute. They'll then scale it up as far as it'd go, as they're wont to. That training run would spit out an AGI, not a mere bundle of sophisticated heuristics.
And we have no guarantees that the practical capabilities of that AGI would be human-level, as opposed to vastly superhuman.
(Or vastly subhuman. But if the maximum-scale training run produces a vastly subhuman AGI, the researchers would presumably go back to the drawing board and tinker with the architectures until they selected for algorithms with better returns on intelligence per FLOP. There are likewise no guarantees that this higher-level selection process would somehow result in an AGI of around human level, rather than vastly overshooting it the first time they properly scale it up.)
Yes, but: whales and elephants have brains several times the size of humans, and they're yet to build an industrial civilization.
Size/capacity isn't all, but in terms of the capacity which actually matters (synaptic count, and upper cortical neuron count), from what I recall elephants are at great-ape cortical capacity, not human capacity. A few specific species of whales may be at or above human cortical neuron capacity, but synaptic density was still somewhat unresolved last I looked.
Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization.
Human language/culture is more the cause of our brain expansion than just the consequence. The human brain is impressive because of its relative size and oversized cost to the human body. Elephants/whales are huge, and their brains are comparatively much smaller and cheaper. Our brains grew 3x too large/expensive because it was valuable to do so. Evolution didn't suddenly discover some new brain architecture or trick (it already had that long ago). Instead there were a number of simultaneous whole-body co-adaptations required for larger brains and linguistic technoculture to take off: opposable thumbs, expressive vocal cords, externalized fermentation (the gut is as energetically expensive as brain tissue, so something had to go), and yes, larger brains, etc.
Language enabled a metasystems transition similar to the origin of multicellular life. Tribes formed as new organisms by linking brains through language/culture. This is not entirely unprecedented - insects are also social organisms, of course, but their tiny brains aren't large enough for interesting world models. The resulting new human social organisms had intergenerational memory that grew nearly unbounded with time and creative search capacity that scaled with tribe size.
You can separate intelligence into world-model knowledge (crystallized intelligence) and search/planning/creativity (fluid intelligence). Humans are absolutely not special in our fluid intelligence - it is just what you'd expect for a large primate brain. Humans raised completely without language are not especially more intelligent than animals. All of our intellectual superpowers are cultural. Just as each cell can store the DNA knowledge of the entire organism, each human mind "cell" can store a compressed version of much of human knowledge and gains the benefits thereof.
The cultural metasystems transition which is solely responsible for our intellectual capability is a one-time qualitative shift that will never recur. AI will not undergo the same transition; that isn't how this works. The main advantage of digital minds is just speed, and to a lesser extent, copying.
I'd say this still applies even to non-LLM architectures like RL, which is the important part, but Jacob Cannell and 1a3orn will have to clarify.
We've basically known how to create AGI for at least a decade. AIXI outlines the 3 main components: a predictive world model, a planning engine, and a critic. The brain also clearly has these 3 main components, even somewhat cleanly separated into modules - that's been clear for a while.
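A schematic of that three-part decomposition (this is only a sketch of the world-model/planner/critic split described above, not an implementation of AIXI itself; every class, method, and placeholder here is made up for illustration):

```python
import random

class WorldModel:
    """Predicts the consequences of a candidate action given the history so far."""
    def predict(self, history, action):
        return history + [action]  # placeholder dynamics

class Critic:
    """Scores how good a predicted trajectory looks."""
    def evaluate(self, trajectory):
        return random.random()  # placeholder value estimate

class Planner:
    """Searches over actions using the world model and the critic."""
    def __init__(self, model, critic, actions):
        self.model, self.critic, self.actions = model, critic, actions

    def choose(self, history):
        return max(self.actions,
                   key=lambda a: self.critic.evaluate(self.model.predict(history, a)))

agent = Planner(WorldModel(), Critic(), actions=["left", "right"])
print(agent.choose(history=[]))
```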
Transformer LLMs are pretty much exactly the type of generic minimal ULM arch I was pointing at in that post (I obviously couldn't predict the name, but still). On a compute-scaling basis, GPT-4's training at 1e25 FLOPs uses perhaps a bit more than human brain training, and it's clearly not quite AGI - but mainly because it's mostly just a world model with a bit of critic: planning is still missing. But its capabilities are reasonably impressive given that the architecture is more constrained than a hypothetical, more directly brain-equivalent fast-weight RNN of similar size.
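For the compute comparison, a rough back-of-the-envelope under commonly used but very uncertain assumptions (say ~1e15 effective synaptic ops/second and ~18 years of lifetime "training"; both figures are order-of-magnitude guesses):

```python
# Very rough lifetime-learning compute estimate for a human brain.
brain_ops_per_second = 1e15                   # assumed effective ops/sec (highly uncertain)
seconds_to_adulthood = 18 * 365 * 24 * 3600   # ~5.7e8 seconds

lifetime_ops = brain_ops_per_second * seconds_to_adulthood
print(f"{lifetime_ops:.1e}")  # ~5.7e+23, i.e. within an OOM or two of a ~1e25-FLOP training run
```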
Anyway, I don't quite agree with the characterization that these models are just "interpolating valid completions of any arbitrary prompt sampled from the distribution". Human intelligence also varies widely on a spectrum, with tradeoffs between memorization and creativity. Current LLMs mostly aren't as creative as the more creative humans and are more impressive in breadth of knowledge, but part of that could simply be that they currently completely lack the component essential for creativity? That they accomplish so much without planning/search is impressive.
the short answer is that Steven Byrnes suspects there's a simple generator of value, so simple that it's dozens of lines long, and if that's the case,
Interestingly, that is closer to my position; I thought that Byrnes thought the generator of value was somewhat more complex, although our views are admittedly fairly similar in general.