Yeah, I'm not really happy with the state of discourse on this matter either.
I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun.
As a proponent of an AI-risk model that does this, I acknowledge that this is an issue, and I indeed feel pretty defensive on this point. Mainly because, as @habryka pointed out and as I'd outlined before, I think there are legitimate reasons to expect no blatant evidence until it's too late, and indeed, that's the whole reason AI risk is such a problem. As was repeatedly stated.
So all these moves to demand immediate well-operationalized bets read a bit like tactical social attacks that are being unintentionally launched by people who ought to know better, which are effectively exploiting the territory-level insidious nature of the problem to undermine attempts to combat it, by painting the people pointing out the problem as blind believers. Like challenges that you're set up to lose if you take them on, but which make you look bad if you turn them down.
And the above, of course, may read exactly like a defense attempt a particularly self-aware blind believer might construct. Which doesn't inspire much self-doubt in me[1], but it does make me feel like I'm - no, not like I'm sailing against the winds of counterevidence - like I'm playing the social game on the side that's poised to lose it in the long run, so I should switch up to the winning side to maximize my status, even if its position is wrong.
I'm somewhat hopeful about navigating to some concrete empirical or mathematical evidence within the next couple of years. But in the meanwhile, yeah, discussing the matter just makes me feel weary and tired.
(Edit, because I'm concerned I'd been too subtle there: I am not accusing anyone, and especially not @TurnTrout, of deliberately employing social tactics to undermine their opponents rather than cooperatively seeking the truth. I'm only saying that the (usually extremely reasonable) requests for well-operationalized bets effectively have this result in this particular case.
Neither am I suggesting that the position I'm defending should be immune to criticism. Empirical evidence easily tied to well-operationalized bets is usually an excellent way to resolve disagreements and establish truth. But it's not the only one, and it just so happens that this specific position can't field many good predictions in this field.)
"But of course it won't," you might think - which, fair enough. But what's your policy for handling problems that really are this insidious?
Your post defending the least forgiving take on alignment basically relies on a sharp/binary property of AGI, and IMO a pretty large crux is that this property either probably doesn't exist or, if it does exist, is not universal, and I think it tends to be overused.
To be clear, I'm increasingly agreeing with a weak version of the hypothesis, and I also think you are somewhat correct, but IMO I don't think your stronger hypothesis is correct. I think that the lesson of AI progress is that it's less sharp the more tasks you want, and the more general intelligence you want, which is in opposition to your hypothesis on AI progress being sharp.
But in the meanwhile, yeah, discussing the matter just makes me feel weary and tired.
I actually kinda agree with you here, but unfortunately, this is very, very important, since your allies are trying to gain real-life political power over AI, and given how impactful that is, it's basically required for us to discuss it.
I think that the lesson of AI progress is that it's less sharp the more tasks you want, and the more general intelligence you want
There's a bit of "one man's modus ponens is another's modus tollens" going on. I assume that when you look at a new AI model, and see how it's not doing instrumental convergence/value reflection/whatever, you interpret it as evidence against "canonical" alignment views. I interpret it as evidence that it's not AGI yet; or sometimes, even evidence that this whole line of research isn't AGI-complete.
E. g., I've updated all the way on this in the case of LLMs. I think you can scale them a thousandfold, and it won't give you AGI. I'm mostly in favour of doing that, too, or at least fully realizing the potential of the products already developed. Probably same for Gemini and Q*. Cool tech. (Well, there are totalitarianism concerns, I suppose.)
I also basically agree with all the takes in the recent "AI is easy to control" post. But what I take from it isn't "AI is safe", it's "the current training methods aren't gonna give you AGI". Because if you put a human - the only known type of entity with the kinds of cognitive capabilities we're worrying about - into a situation isomorphic to a DL AI's, the human would exhibit all the issues we're worrying about.
Like, just because something has a label of "AI" and is technically an AI doesn't mean studying it can give you lessons about "AGI", the scary lightcone-eating thing all the fuss is about, yeah? Any more than studying GOFAI FPS bots is going to teach you lessons about how LLMs work?
And that the Deep Learning paradigm can probably scale to AGI doesn't mean that studying the intermediary artefacts it's currently producing can teach us much about the AGI it'll eventually spit out. Any more than studying a MNIST-classifier CNN can teach you much about LLMs; any more than studying squirrel neurology can teach you much about winning moral-philosophy debates.
That's basically where I'm at. LLMs and such are just in the entirely wrong reference class for studying "generally intelligent"/scary systems.
Any more than studying GOFAI FPS bots is going to teach you lessons about how LLMs work?
No, but my point here is that once we increase the complexity of the domain, and require more tasks to be done, things start to smooth over, and we don't see nearly as sharp a jump.
I suspect a big part of that is the effect of Amdahl's law kicking in, combined with Baumol's cost disease and power-law scaling, which means you are always bottlenecked on the least automatable tasks, so improvements in one area like Go don't matter as much as you'd think.
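To make the bottleneck intuition concrete, here is a minimal Amdahl's-law toy calculation (the task shares and speedups are invented purely for illustration):

```python
# Toy Amdahl's-law illustration: a huge speedup on one task barely moves
# overall throughput while the other tasks remain un-automated.

def overall_speedup(task_shares, speedups):
    """task_shares: fraction of total work per task (should sum to 1).
    speedups: factor by which each task is accelerated."""
    remaining_time = sum(share / s for share, s in zip(task_shares, speedups))
    return 1.0 / remaining_time

# Hypothetical split: a narrow "Go-like" task, coding, and messy real-world work.
shares = [0.2, 0.3, 0.5]
print(overall_speedup(shares, [1000, 1, 1]))   # ~1.25x: one task solved, little overall gain
print(overall_speedup(shares, [1000, 10, 2]))  # ~3.6x: gains have to be broad to matter
```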
I'd say the main lesson of AI progress, one that might even have been formulable back in the 1970s-1980s, is that compute and data were the biggest factors, by a wide margin, and these grow smoothly. Only now are algorithms starting to play a role, and even then it's only because transformers turn out to be fairly terrible at generalizing or doing stuff, which is related to your claim about LLMs not being real AGI; but I think this effect is weaker than you think, and I'm sympathetic to the continuous view as well. There probably will be some discontinuities, but IMO LWers have fairly drastically overstated how discontinuous progress was, especially once we realize that a lot of the outliers were likely simpler than the real world (though Go comes close, at least for its domain; the problem is that the domain is far too small to matter).
I assume that when you look at a new AI model, and see how it's not doing instrumental convergence/value reflection/whatever, you interpret it as evidence against "canonical" alignment views. I interpret it as evidence that it's not AGI yet; or sometimes, even evidence that this whole line of research isn't AGI-complete.
I think this roughly tracks how we updated, though there was a brief phase where I became more pessimistic as I learned that LLMs probably weren't going to scale to AGI, which broke a few of my alignment plans, but I found other reasons to be more optimistic that didn't depend on LLMs nearly as much.
My worry is that while it's fine enough to update towards "it's not going to have any impact on anything, and that's the reason it's safe," this basically defines away the possibility of safety, and thus makes the model useless:
I interpret it as evidence that it's not AGI yet; or sometimes, even evidence that this whole line of research isn't AGI-complete.
I think a potential crux here is whether to expect some continuity at all, or whether there is reason to expect a discontinuous step change for AI, which is captured in this post: https://www.lesswrong.com/posts/cHJxSJ4jBmBRGtbaE/continuity-assumptions
Because if you put a human - the only known type of entity with the kinds of cognitive capabilities we're worrying about - into a situation isomorphic to a DL AI's, the human would exhibit all the issues we're worrying about.
I basically disagree entirely with that, and I'm extremely surprised you claimed that. If we grant that we get the same circumstances to control humans as we have for DL AIs, then alignment becomes basically trivial in my view, since human control research would have a way better ability to study humans, and in particular there would be no IRB/FDA or regulation to constrain you, which would be a huge change to how science works today. It may take a lot of brute-force work, but I think it basically becomes trivial to align human beings if humans could be put into a situation isomorphic to a DL AI's.
I'd say the main lesson of AI progress, one that might even have been formulable back in the 1970s-1980s, is that compute and data were the biggest factors
As far as producing algorithms that are able to, once trained on a vast dataset of [A, B] samples, interpolate a valid completion B for an arbitrary prompt sampled from the distribution of A? Yes, for sure.
As far as producing something that can genuinely generalize off-distribution, strike way outside the boundaries of interpolation? Jury's still out.
Like, I think my update on all the LLM stuff is "boy, who knew interpolation can get you this far?". The concept-space sure turned out to have a lot of intricate structure that could be exploited via pure brute force.
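As a minimal sketch of the in-distribution vs. off-distribution distinction (a toy 1-D polynomial fit standing in for "interpolation over a training distribution"; nothing here is specific to LLMs):

```python
import numpy as np

# Fit on samples drawn from [-pi, pi], then evaluate both inside and far outside that range.
rng = np.random.default_rng(0)
x_train = rng.uniform(-np.pi, np.pi, 200)
y_train = np.sin(x_train)

coeffs = np.polyfit(x_train, y_train, deg=9)  # a stand-in "learned generative algorithm"

x_in = np.linspace(-np.pi, np.pi, 5)           # in-distribution prompts
x_out = np.linspace(3 * np.pi, 4 * np.pi, 5)   # far off-distribution prompts

print(np.max(np.abs(np.polyval(coeffs, x_in) - np.sin(x_in))))    # small error: interpolation works
print(np.max(np.abs(np.polyval(coeffs, x_out) - np.sin(x_out))))  # error blows up: extrapolation fails
```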
I basically disagree entirely with that, and I'm extremely surprised you claimed that
Oh, I didn't mean "if we could hook up a flesh-and-blood human (or a human upload) to the same sort of cognition-shaping setup as we subject our AIs to". I meant "if the forward pass of an LLM secretly simulated a human tasked with figuring out what token to output next", but without the ML researchers being aware that that's what's going on, and with them still interacting with the thing as with a token-predictor. It's a more literal interpretation of the thing sometimes called an "inner homunculus".
I'm well aware that the LLM training procedure is never going to result in that. I'm just saying that if it did, and if the inner homunculus became smart enough, that'd cause all the deceptive-alignment/inner-misalignment/wrapper-mind issues. And that if you're not modeling the AI as being/having a homunculus, you're not thinking about an AGI, so it's no wonder the canonical AI-risk arguments fail for that system and it's no wonder it's basically safe.
As far as producing algorithms that are able to, once trained on a vast dataset of [A, B] samples, interpolate a valid completion B for an arbitrary prompt sampled from the distribution of A? Yes, for sure.
I'd say this still applies even to non-LLM architectures like RL, which is the important part, but Jacob Cannell and 1a3orn will have to clarify.
As far as producing something that can genuinely generalize off-distribution, strike way outside the boundaries of interpolation? Jury's still out.
I agree, but with a caveat, in that I think we do have enough evidence to rule out extreme importance on algorithms, à la Eliezer, and compute is not negligible. Epoch estimates a 50-50 split between compute and algorithmic progress in importance. Algorithmic progress will likely matter IMO, just not nearly as much as some LWers think it will.
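For what a roughly even split means in practice, here is a back-of-the-envelope decomposition in log terms (the growth factors are hypothetical placeholders, not Epoch's actual estimates):

```python
import math

# Effective compute = physical compute x algorithmic efficiency.
# If both grow ~30x over some period, each contributes about half of the
# orders of magnitude of effective-compute growth.
compute_growth = 30.0          # hypothetical growth in physical training compute
algo_efficiency_gain = 30.0    # hypothetical growth in algorithmic efficiency

effective_growth = compute_growth * algo_efficiency_gain
compute_share = math.log10(compute_growth) / math.log10(effective_growth)

print(f"effective compute grew {effective_growth:.0f}x")   # 900x
print(f"compute's share of the OOMs: {compute_share:.0%}")  # 50%
```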
Like, I think my update on all the LLM stuff is "boy, who knew interpolation can get you this far?". The concept-space sure turned out to have a lot of intricate structure that could be exploited via pure brute force.
I definitely updated somewhat in this direction, which is important, but I now think the AI optimist arguments are general enough not to rely on LLMs, and sometimes they don't even rely on a model of what future AI will look like beyond the facts that capabilities will grow and that people expect to profit from it.
I'm just saying that if it did, and if the inner homunculus became smart enough, that'd cause all the deceptive-alignment/inner-misalignment/wrapper-mind issues.
Not automatically, and there are potential paths to AGI, like Steven Byrnes's path to Brain-like AGI, that either outright avoid deceptive alignment altogether or make it far easier to solve. (The short answer is that Steven Byrnes suspects there's a simple generator of value, so simple that it's dozens of lines long, and if that's the case, then the corrigible alignment/value learning agent's simplicity gap is either zero, negative, or a very small positive gap - so small that very little data is required to pick out the honest value-learning agent over the deceptively aligned agent - and we have a lot of data on human values, so this is likely to be pretty easy.)
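Here is a back-of-the-envelope version of that simplicity-gap argument under a crude description-length prior (the bit counts are placeholders, not measurements): if the deceptive agent's description is k bits shorter, it starts with roughly 2^k prior odds in its favour, so you need about k bits of discriminating evidence about values to overcome it; if k is near zero or negative, almost any data on values suffices.

```python
def posterior_log2_odds(simplicity_gap_bits, evidence_bits):
    """Log2 odds of (honest value-learner : deceptive agent) after the evidence.
    simplicity_gap_bits: how many bits *shorter* the deceptive agent's description is
                         (<= 0 means the honest value-learner is at least as simple).
    evidence_bits: total log-likelihood ratio, in bits, favouring the honest agent."""
    prior_log2_odds = -simplicity_gap_bits
    return prior_log2_odds + evidence_bits

print(posterior_log2_odds(simplicity_gap_bits=0, evidence_bits=10))     # +10: honest agent wins easily
print(posterior_log2_odds(simplicity_gap_bits=3, evidence_bits=10))     # +7: tiny gap, little data needed
print(posterior_log2_odds(simplicity_gap_bits=1000, evidence_bits=10))  # -990: a large gap would swamp the data
```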
And that if you're not modeling the AI as being/having a homunculus, you're not thinking about an AGI,
I think a crux is that AIs will basically always be much more white-box than any human mind, and that the "AI control research is easier" point will still mostly hold for a lot of future paradigms of AI, including the ones that scale to superintelligence, especially since I think AI control is fundamentally very profitable and AIs have no legal rights/IRB boards to slow down control research.
I agree, but with a caveat, in that I think we do have enough evidence to rule out extreme importance on algorithms
Mm, I think the "algorithms vs. compute" distinction here doesn't quite cleave reality at its joints. Much as with the interpolation I talked about before, it's a pretty abstract kind of interpolation: LLMs don't literally memorize the data points; their interpolation relies on compact generative algorithms they learn (but which, I argue, are basically still bounded by the variance in the data points they've been shown). The problem of machine learning, then, is finding some architecture + training-loop setup that would, over the course of training, move the ML model towards implementing some high-performance cognitive algorithms.
It's dramatically easier than hard-coding the algorithms by hand, yes, and the learning algorithms we do code are very simple. But you still need to figure out in which direction to "push" your model first. (Pretty sure that if you threw 2023 levels of compute at a Very Deep fully-connected NN, it wouldn't match a modern LLM's performance - wouldn't even come close.)
So algorithms do matter. It's just that our way of picking the right algorithms consists of figuring out the right search procedure for these algorithms, then throwing as much compute as we can at it.
So that's where, I would argue, the sharp left turn would lie. Not in-training, when a model's loss suddenly drops as it "groks" general intelligence. (Although that too might happen.) It would happen when the distributed optimization process of ML researchers tinkering with training loops stumbles upon a training setup that actually pushes the ML model in the direction of the basin of general intelligence. And then that model, once scaled up enough, would suddenly generalize far off-distribution. (Indeed, that's basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off. The "main" sharp left turn happens during the architecture search, not during the training.)
And I'm reasonably sure we're in an agency overhang, meaning that the newborn GI would pass human intelligence in an eye-blink. (And if it doesn't, it'll likely stall at incredibly unimpressive sub-human levels, so the ML researchers will keep tinkering with the training setups until finding one that does send it over the edge. And there's no reason whatsoever to expect it to stall again at the human level, instead of way overshooting it.)
we have a lot of data on human values
Which human's values? IMO, "the AI will fall into the basin of human values" is kind of a weird reassurance, given the sheer diversity of human values - diversity that very much includes xenophobia, genocide, and petty vengeance scaled up to the geopolitical level. And stuff like RLHF designed to fit the aesthetics of modern corporations doesn't result in deeply thoughtful cosmopolitan philosophers - it results in sycophants concerned with PR as much as with human lives, and sometimes (presumably when not properly adapted to a new model's scale) in high-strung yanderes.
Let's grant the premise that the AGI's values will be restricted to the human range (which I don't really buy). If the quality of the sample within the human range that we pick is as good as what GPT-4/Sydney's masks appeared to be? Yeah, I don't expect humans to stick around for long after.
Indeed, that's basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off.
Actually I think the evidence is fairly conclusive that the human brain is a standard primate brain with the only changes being a few compute-scale dials turned up (the number of distinct gene changes is tiny - something like 12, from what I recall). There is really nothing special about the human brain other than 1) 3x larger-than-expected size, and 2) extended neoteny (a longer training cycle). Neuroscientists have looked extensively for other "secret sauce" and we now have some confidence in a null result: no secret sauce, just much more training compute.
Yes, but: whales and elephants have brains several times the size of humans, and they're yet to build an industrial civilization. I agree that hitting upon the right architecture isn't sufficient, you also need to scale it up - but scale alone doesn't suffice either. You need a combination of scale, and an architecture + training process that would actually transmute the greater scale into more powerful cognitive algorithms.
Evolution stumbled upon the human/primate template brain. One of the forks of that template somehow "took off" in the sense of starting to furiously select for larger brain size. Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization.
The ML-paradigm analogue would, likewise, involve researchers stumbling upon an architecture that works well at small scales and has good returns on compute. They'll then scale it up as far as it'd go, as they're wont to. That training run would spit out an AGI, not a mere bundle of sophisticated heuristics.
And we have no guarantees that the practical capabilities of that AGI would be human-level, as opposed to vastly superhuman.
(Or vastly subhuman. But if the maximum-scale training run produces a vastly subhuman AGI, the researchers would presumably go back to the drawing board and tinker with the architectures until they selected for algorithms with better returns on intelligence per FLOP. There are likewise no guarantees that this higher-level selection process would somehow result in an AGI of around human level, rather than vastly overshooting it the first time they properly scale it up.)
Yes, but: whales and elephants have brains several times the size of humans, and they're yet to build an industrial civilization.
Size/capacity isn't all, but in terms of the capacity which actually matters (synaptic count, and upper cortical neuron count), from what I recall elephants are at great-ape cortical capacity, not human capacity. A few specific species of whales may be at or above human cortical neuron capacity, but synaptic density was still somewhat unresolved last I looked.
Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization.
Human language/culture is more the cause of our brain expansion than just the consequence. The human brain is impressive because of its relative size and oversized cost to the human body. Elephants/whales are huge, and their brains are comparatively much smaller and cheaper. Our brains grew 3x too large/expensive because it was valuable to do so. Evolution didn't suddenly discover some new brain architecture or trick (it already had that long ago). Instead there were a number of simultaneous whole-body co-adaptations required for larger brains and linguistic technoculture to take off: opposable thumbs, expressive vocal cords, externalized fermentation (the gut is as energetically expensive as brain tissue, so something had to go), and yes, larger brains, etc.
Language enabled a metasystems transition similar to the origin of multicellular life. Tribes formed as new organisms by linking brains through language/culture. This is not entirely unprecedented - insects are also social organisms, of course, but their tiny brains aren't large enough for interesting world models. The resulting new human social organisms had intergenerational memory that grew nearly unbounded with time and creative search capacity that scaled with tribe size.
You can separate intelligence into world-model knowledge (crystallized intelligence) and search/planning/creativity (fluid intelligence). Humans are absolutely not special in our fluid intelligence - it is just what you'd expect for a large primate brain. Humans raised completely without language are not especially more intelligent than animals. All of our intellectual superpowers are cultural. Just as each cell can store the DNA knowledge of the entire organism, each human mind "cell" can store a compressed version of much of human knowledge and gains the benefits thereof.
The cultural metasystems transition which is solely responsible for our intellectual capability is a one-time qualitative shift that will never recur. AI will not undergo the same transition; that isn't how this works. The main advantage of digital minds is just speed, and to a lesser extent, copying.
I'd say this still applies even to non-LLM architectures like RL, which is the important part, but Jacob Cannell and 1a3orn will have to clarify.
We've basically known how to create AGI for at least a decade. AIXI outlines the 3 main components: a predictive world model, a planning engine, and a critic. The brain also clearly has these 3 main components, even somewhat cleanly separated into modules - that's been clear for a while.
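A schematic of that three-part decomposition (this is only a sketch of the world-model/planner/critic split described above, not an implementation of AIXI itself; every class, method, and placeholder here is made up for illustration):

```python
import random

class WorldModel:
    """Predicts the consequences of a candidate action given the history so far."""
    def predict(self, history, action):
        return history + [action]  # placeholder dynamics

class Critic:
    """Scores how good a predicted trajectory looks."""
    def evaluate(self, trajectory):
        return random.random()  # placeholder value estimate

class Planner:
    """Searches over actions using the world model and the critic."""
    def __init__(self, model, critic, actions):
        self.model, self.critic, self.actions = model, critic, actions

    def choose(self, history):
        return max(self.actions,
                   key=lambda a: self.critic.evaluate(self.model.predict(history, a)))

agent = Planner(WorldModel(), Critic(), actions=["left", "right"])
print(agent.choose(history=[]))
```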
Transformer LLMs are pretty much exactly the type of generic minimal ULM arch I was pointing at in that post (I obviously couldn't predict the name, but still). On a compute-scaling basis, GPT-4's training at 1e25 FLOPs uses perhaps a bit more than human brain training, and it's clearly not quite AGI - but mainly because it's mostly just a world model with a bit of critic: planning is still missing. But its capabilities are reasonably impressive given that the architecture is more constrained than a hypothetical, more directly brain-equivalent fast-weight RNN of similar size.
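For the compute comparison, a rough back-of-the-envelope under commonly used but very uncertain assumptions (say ~1e15 effective synaptic ops/second and ~18 years of lifetime "training"; both figures are order-of-magnitude guesses):

```python
# Very rough lifetime-learning compute estimate for a human brain.
brain_ops_per_second = 1e15                   # assumed effective ops/sec (highly uncertain)
seconds_to_adulthood = 18 * 365 * 24 * 3600   # ~5.7e8 seconds

lifetime_ops = brain_ops_per_second * seconds_to_adulthood
print(f"{lifetime_ops:.1e}")  # ~5.7e+23, i.e. within an OOM or two of a ~1e25-FLOP training run
```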
Anyway, I don't quite agree with the characterization that these models are just "interpolating valid completions of any arbitrary prompt sampled from the distribution". Human intelligence also varies widely on a spectrum, with tradeoffs between memorization and creativity. Current LLMs mostly aren't as creative as the more creative humans and are more impressive in breadth of knowledge, but part of that could simply be that they currently completely lack the component essential for creativity? That they accomplish so much without planning/search is impressive.
the short answer is that Steven Byrnes suspects there's a simple generator of value, so simple that it's dozens of lines long, and if that's the case,
Interestingly, that is closer to my position; I thought that Byrnes thought the generator of value was somewhat more complex, although our views are admittedly fairly similar in general.