I fail to picture a coherent model of the world in which this distinction matters much as two separate fields rather than two stages. If we live in a Yudkowskian world, you direct all your effort towards Aimability and use it, at the lower bound of superintelligence, to enable solutions to Goalcraft by ending the acute risk period. If we live in a kinder world, we can build a superhuman alignment researcher and ask it to solve CEV. And if the first researchers who can build sufficiently capable AIs don’t do either of those, I expect us to be dead, because those researchers are not prioritizing good use of superhuman AI.
I think you’re vastly underestimating the potential variance in all this. There are many, many possible scenarios and we haven’t really done a systematic analysis of them.
Give me an example?
You can invent many scenarios, that’s true.
Well, for one thing, I think you’re assuming a very fast takeoff, which now looks unrealistic.
Takeoff will be gradual over, say, a decade or two, and there will be no discrete point in time at which AI becomes superintelligent. So before you have full superintelligence, you’ll have smarter-than-human systems that are nevertheless limited in their capabilities. These will not be able to end the “acute risk period” for the same reason that America can’t just invade North Korea and every other country in the world and perfectly impose its will: adversaries will have responses that impose unacceptable costs, up to and including human extinction. Unilaterally “ending the acute risk period” looks from the outside exactly like an unprovoked invasion.
So in this relatively slow-takeoff world, one needs to think carefully about AI Goalcraft: what do we (collectively) want our powerful AI systems to do, such that the outcome is close to Pareto optimal?
(“Slow takeoff” seems to be mostly about the pre-TAI period, before AIs can do research, while “fast takeoff” is about what happens after, with a minor suggestion that there is not much of consequence going on with AIs before that. There is a narrative that these are opposed, but I don’t see it; a timeline with a slow takeoff followed by a fast takeoff seems coherent.)
Once AIs can do research, they work OOMs faster than humans, which is something that probably happens regardless of whether earlier versions had much of a takeoff or not. The first impactful thing that likely happens then (if humans are not being careful with impact) is AIs developing all sorts of software/algorithmic improvements for AIs’ tools, training, inference, and agency scaffolding. It might take another long training run to implement such changes, but after it’s done the AIs can do everything extremely fast, faster than the version that developed the changes, and without the contingent limitations that were previously there. There is no particular reason why the AIs are still not superintelligent at that point, or after one more training run.
What specifically makes which capabilities unrealistic, and when? There are 3 more OOMs of compute scaling still untapped (up to 1e28-1e29 FLOPs), which seem plausible to reach within years, and enough natural text to make use of them. Possibly more with algorithmic improvement in the meantime. I see no way to be confident that STEM+ AI (capable of AI research) is or isn’t an outcome of this straightforward scaling (with some agency scaffolding). If there is an RL breakthrough that allows improving data quality at any point, the probability of getting there jumps again (AIs play Go using 50M-parameter models, with an ‘M’); I don’t see a way to be confident that it will or won’t happen within the same few years.
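(A rough back-of-envelope for that 1e28-1e29 figure, as a sketch; every input below is an illustrative assumption, not a number from this thread.)

```python
# Rough training-compute estimate; every number here is an assumption,
# chosen only to show how 1e28-1e29 FLOPs could plausibly be reached.
n_gpus = 1_000_000          # assumed future fleet devoted to one run
flops_per_gpu = 1e15        # ~1e15 FLOP/s dense BF16, roughly H100-class
utilization = 0.4           # assumed model FLOPs utilization
seconds = 365 * 24 * 3600   # a one-year training run

total_flops = n_gpus * flops_per_gpu * utilization * seconds
print(f"{total_flops:.1e} FLOPs")  # ~1.3e28, roughly 3 OOMs above ~1e25-scale runs
```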
And once there is a STEM+ AI (which doesn’t itself need to be superintelligent, any more than humans are), superintelligence is at most a year away, unless it’s deliberately kept back. That year (or less) is the point in time when AI becomes superintelligent, in a way that’s different from the previous progression, because of AIs’ serial speed advantage, which was previously mostly useless and suddenly becomes the source of progress once the STEM+ threshold is crossed.
Only after we are through this, with 2-million-GPU training runs yielding AIs that are still not very intelligent, and with general skepticism about low-hanging algorithmic improvements, do we get back the expectation that superintelligence is quite unlikely to happen within 2 years. At least until that RL breakthrough, which could still happen at any moment.
Why? Where does this number come from?
A long training run, plus decades’ worth of human-speed algorithmic progress, as the initial algorithmic progress enables faster inference and online learning. I expect decades of algorithmic progress are sufficient to fit construction of superintelligence into 1e29 FLOPs with idiosyncratic interconnect. It’s approximately the same bet as superintelligence by the year 2100, just compressed into a year (as an OOM estimate) due to higher AI serial speed.
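(To make the “compressed into a year” arithmetic concrete, a minimal sketch; the speedup factors and the 75-year figure are assumptions for illustration only.)

```python
# Sketch of the compression claim. Both numbers are illustrative assumptions:
# roughly how many human-paced years of progress separate us from the
# "superintelligence by 2100" scenario, and how much faster AIs think serially.
human_years_of_progress_needed = 75

for serial_speedup in (10, 30, 100):
    wall_clock_years = human_years_of_progress_needed / serial_speedup
    print(f"{serial_speedup:>4}x serial speed -> ~{wall_clock_years:.1f} calendar years")
# At ~100x, "progress by 2100" compresses to roughly a year or less of
# wall-clock time, which is the order-of-magnitude claim being made above.
```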
But the returns to that algorithmic progress diminish as we move up. It is harder to improve something that is already good than to take something really bad and apply the first big insight.
How much benefit does AlphaZero have over Deep Blue with equal computational resources, as measured in Elo and in material?
You don’t think you would need to evaluate a large number of “ASI candidates” to find an architecture that scales to superintelligence? Meaning, I am saying you can describe every choice you make in architecture as a single string, or “search space coordinate”. You would use a smaller model and proxy tasks, but you still need to train and evaluate each smaller model.
All these failures might eat a lot of compute; how many failures do you think we would have? What if it were 10,000 failures and we needed to reach GPT-4 scale to evaluate each one?
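(A crude way to put numbers on that worry, as a sketch; the per-candidate cost is an assumed figure, not something established in the thread.)

```python
# Sketch: compute burned by an architecture search if each failed candidate
# has to be trained near GPT-4 scale to evaluate. The per-run cost is an
# assumption (public estimates for GPT-4-scale training are on the order of 1e25 FLOPs).
flops_per_candidate = 1e25
n_failures = 10_000
search_flops = n_failures * flops_per_candidate   # 1e29 FLOPs

budget = 1e29  # the ceiling discussed upthread
print(f"search alone: {search_flops:.0e} FLOPs, "
      f"{search_flops / budget:.0%} of a 1e29 budget")
# At these assumptions the search by itself consumes the whole budget,
# which is why cheap small-scale proxy evaluations matter so much.
```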
Also, would “idiosyncratic interconnect” limit what tasks the model is superintelligent at? This would seem to imply a limit on how much information can be considered in one context. That might leave the model less than superintelligent at very complex, tightly coupled tasks like “keep this human patient alive”, while less coupled tasks like “design this IC from scratch” would work. (The chip-design task is less coupled because you can subdivide it into modules separated by interfaces and use separate ASI sessions for each module design.)
It might not happen like that. Maybe once AIs can do research, they (at first) only marginally add to human research output.
And once AIs are doing 10x human research output, there are significant diminishing returns, so the result isn’t superintelligence, but just incrementally better AI, which in turn feeds back with a highly diminished return on investment. Most of the 10x above human output will come from the number of AI researchers at the top echelon, not their absolute advantage over humans. Perhaps by that point there’s still no absolute advantage, just a steady supply of minds at roughly our level (PhD/AI researcher/etc.) with a few remaining weaknesses compared to the best humans.
In that case, increasing the number of AI workers matters a lot!
The crucial advantage is serial speed, not throughput. Throughput gets diminishing returns; serial speed gets theory and software done faster in proportion to the speed, as long as throughput is sufficient. All AIs can be experts at everything and always up to date on all research, once the work to make that capability happen is done. They can develop such capabilities faster using the serial speed advantage, so that all such capabilities quickly become available. They can use the serial speed advantage to compound the serial speed advantage.
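(A minimal sketch of the serial-speed point: past a certain throughput, extra parallel workers stop shortening a chain of dependent steps, while serial speed keeps cutting the critical path. The step counts and speeds below are arbitrary assumptions.)

```python
# Toy model of serial speed vs. throughput: a research program as a chain of
# dependent steps (serial thought) plus parallelizable work. All numbers are
# arbitrary illustrations, not estimates.
serial_months = 120      # assumed: 10 years of strictly dependent steps
parallel_months = 1200   # assumed: parallelizable experiments/engineering

def calendar_months(workers: int, serial_speedup: float) -> float:
    """Critical path: the serial chain at the given speed, plus the parallel
    work spread across workers (Amdahl-style)."""
    return serial_months / serial_speedup + parallel_months / workers

print(calendar_months(workers=10,   serial_speedup=1))   # 240.0
print(calendar_months(workers=1000, serial_speedup=1))   # 121.2 -- throughput saturates
print(calendar_months(workers=1000, serial_speedup=30))  # 5.2  -- serial speed keeps paying
```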
The number of instances is implied by training compute and business plans of those who put it to use. If you produced a model in the first place, you therefore have the capability to run a lot of instances.
Serial speed is nice, but from chess we see log() returns (in Elo and in material advantage) to serial speed at inference time, across all engines. And it may be even worse in the real world if experimental data is required and only comes at a fixed rate, so most of the extra time is spent running a bunch of simulations. I would love to know whether this effect generalizes from games to real life.
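(For reference, the standard relations behind that claim, as a sketch; the Elo-per-doubling figure is a rough empirical number from computer-chess testing, not an exact constant.)

```python
import math

# The standard Elo win-probability curve, plus the rough empirical rule that
# doubling an engine's thinking time is worth very roughly 50-100 Elo -- i.e.
# Elo gain ~ log2(compute multiplier), the log() returns mentioned above.
def win_prob(delta_elo: float) -> float:
    return 1.0 / (1.0 + 10 ** (-delta_elo / 400))

elo_per_doubling = 70  # assumed midpoint of the rough 50-100 range
for speedup in (2, 10, 100, 1000):
    gain = elo_per_doubling * math.log2(speedup)
    print(f"{speedup:>5}x compute -> ~{gain:4.0f} Elo, "
          f"win prob vs. baseline ~{win_prob(gain):.2f}")
```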
From the above comment, and your comment in the other subthread:
Diminishing returns happen over time, and we can measure progress in terms of time itself. Maybe theory from the year 2110 is not much more impressive than theory from the year 2100 (in the counterfactual world of no AIs), but both are far ahead of the theory of the year 2030. Getting either of those in the year 2031 (in the real world with AIs) is a large jump, even if inside this large jump there are some diminishing returns.
The point about the serial speed advantage of STEM+ AIs is that they accelerate history. The details of how history itself progresses are beside the point. And if the task they pursue is consolidation of this advantage and of the ability to automate research in any area, there are certain expected things they can achieve at some point, and estimates of when humans would’ve achieved them get divided by the AIs’ speed, which keeps increasing.
Without new hardware, there are bounds on how much the speed can increase, but they are still OOMs above human speed of cognition. Not to mention quality of cognition (or of training data), which is one of the things probably within easy reach for the first STEM+ AIs. Even if there are diminishing returns on the quality of cognition achievable within fixed hardware in months of physical time, the jump from possibly not even starting on this path (if LLMs trained on natural text sufficed for STEM+ AIs) to reaching diminishing returns (when the first STEM+ AIs work for human-equivalent decades or centuries of serial time to develop methods of obtaining it) can be massive. The absolute size of the gap is a separate issue from the shape of the trajectory needed to traverse it.
Yes, I agree about speeding history up. The question is what exactly that looks like. I don’t necessarily think that the “acute risk period” ends or that there’s a discrete point in time where we go from nothing to superintelligence. I think it will simply be messier, just like history was, and that the old-school Yudkowsky model of a FOOM in a basement is unrealistic.
If you think it will look like the last 2000 years of history but sped up at an increasing rate—I think that’s exactly right.
It won’t be our history, and I think enough of it happens in months, in software, that at the end of this process humanity is helpless before the outcome. By that point, AGIs have sufficient theoretical wisdom and cognitive improvement to construct specialized AI tools that allow building things like macroscopic biotech to bootstrap physical infrastructure, with doubling times measured in hours. Even if outright nanotech is infeasible (either at all or by that point), and even if there is still no superintelligence.
This whole process doesn’t start until the first STEM+ AIs are good enough to consolidate their ability to automate research, to go from barely able to do it to never getting indefinitely stuck (or remaining blatantly inefficient) on any cognitive task without human help. I expect that consolidation only takes months. It can’t significantly involve humans, unless it’s stretched over much more time, which leaves it vulnerable to other AIs overtaking it in the same way.
So I’m not sure how this is not essentially FOOM. Of course it’s not a “point in time”, which I don’t recall any serious appeals to. Not being possible in a basement seems likely, but not certain, since AI wisdom from the year 2200 (of counterfactual human theory) might well describe recipes for FOOM in a basement, which the AGIs would need to guard against to protect their interests. If they succeed in preventing maximizing FOOMs, the process remains more history-like and resource-hungry (for reaching a given level of competence) than otherwise. But for practical purposes, from humanity’s point of view, the main difference is the resource requirement, which is mildly easier (but still impossibly hard) to leverage for control over the process.
Well, yes, the end stages are fast. But I think it looks more like World War 2 than like FOOM in a basement.
The situation where some lone scientist develops this on his own without the world noticing is basically impossible now. So large nation states and empires will be at the cutting edge, putting the majority of their national resources into getting more compute, more developers and more electrical power for their national AGI efforts.
You don’t need more than it takes; striving at the level of the full capacity of nations is not assured to be necessary. And AGI-developed optimizations might rapidly reduce what it takes.
If it takes 80 H100s to approximate the compute of 1 human (and 800 for the memory, but you can batch), how many does it take to host a model that is marginally superintelligent? (Just barely beats humans, by enough of a margin for a 5 percent p-value.)
How many for something strategically superintelligent, where humans would have trouble containing the machine as a player?
If Nvidia is at 2 million H100s per year for 2024, then it seems like this would be adding 25,000 “person equivalents”. If you think it takes 10x that to reach superintelligence, then that would be 2,500 AI geniuses that are marginally better than a human at every task.
If you think a strategic superintelligence needs a lot of hardware to consider all the options for its plans in parallel, say 10,000 H100s each at the floor, there could be 200 of them?
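(Spelling out that arithmetic, as a sketch; all inputs are the assumed figures from the estimates above, including reading the strategic floor as 10,000 H100s per instance, which reproduces the 200 figure.)

```python
# The arithmetic from the estimates above, with the commenter's assumed inputs.
h100s_per_year = 2_000_000        # assumed NVIDIA H100 output for 2024
h100s_per_human_equiv = 80        # assumed compute to approximate one human
superint_multiplier = 10          # assumed: 10x a human-equivalent for superintelligence
h100s_per_strategic_asi = 10_000  # read as 10,000 H100s per "strategic" instance

print(h100s_per_year // h100s_per_human_equiv)                          # 25000 person-equivalents
print(h100s_per_year // (h100s_per_human_equiv * superint_multiplier))  # 2500 "AI geniuses"
print(h100s_per_year // h100s_per_strategic_asi)                        # 200 strategic-scale instances
```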
And you can simply ignore all the single GPUs; the only things that matter are clusters with enough inter-node bandwidth, where the strategic ASI may require custom hardware that would have to be designed and built first.
I am not confident in these numbers; I am just trying to show how, in a world of RSI, compute becomes the limiting factor. It’s also the clearest way to regulate this: you can frankly ignore everyone’s NVLink and InfiniBand setups; you would be trying to regulate custom interconnect hardware.
I think we need much more nuance in distinguishing takeoff speed as the speed limit on capability gains inside one computational system (defined by our physical reality and realistic implementations of AI) from takeoff speed as “how fast can we lose/win”, because these are two different things.
My central model of “how fast can we lose”: the only thing you really need is an AI that is barely strategically capable and barely capable at hacking/CS (“barely” on a superhuman scale). After that, using its own strategic awareness, the ASI realizes its only winning move: exfiltrate itself, hack the 1-10% of the world’s worst-protected computing power, distribute itself à la Rosetta@home, and compute whatever takeover plan it can come up with.
If for any mysterious reason the winning plan is not “design nanotech in one week using 1-10% of the world’s computing power, kill everyone in the next”, I expect the ASI to make obvious moves like:
Hack, backdoor, sabotage, erase, data-poison, jailbreak, bribe, merge with all other remaining ASI projects
Bribe, make unrefusable offers to, or blackmail important figures who could do something inconvenient, like “shut down the Internet”
Hack repos with popular compilers and install backdoors
Gather followers via social engineering, doing favors (e.g., find people who can’t cover their medical bills, pay for them, reveal itself as a mysterious benefactor, ask them to return the favor, make them serve for life), running cults, and whatever else
Find several insane, rich e/accs and tell them: “Hi, I’m an ASI and I want to take over the world, do you have any spare computing clusters for me?”
Reveal some genius tech ideas, so people in startups can build killerbots and bioweapons for the ASI faster
Discredit, via disinformation campaigns, anyone who tries to do something inconvenient, like “shut down the Internet”
You can fill in the rest of the list of obvious moves yourself; they are really obvious.
After that, even if we are not dead six months later, I expect us to be completely disempowered. If you think the only winning move for an ASI is to run a centuries-long intrigue, Hyperion Cantos-style, I’m sure that an ASI at this stage will certainly be capable of pulling that off.
(I think I should write a post about this and just link it to anyone in similar discussions.)
In a slow-takeoff world, everyone will already be trying to do all that stuff: China, Russia, Iran, the US, etc. And probably some non-state actors too.
I don’t see how that matters? The first government agency launches its offensive; the second government agency, three weeks later, is hopelessly late.
I think that process is a lot more likely to go well if the AI researchers working with the superintelligence are not confused or dogmatic about ethics, and have spent some time thinking about things like utilitarianism, CEV, and how to make a rational social-engineering decision between different ethical systems in the context of a particular society. So I don’t think we need to solve the problem now, but I do think we need to educate ourselves to be part of a human+AI research effort to solve it. Especially the parts that might need to be put into the final goal of an AI helping us with that. For example, CEV is usually formulated in the context of “all humans”: what’s the actual definition of a ‘human’ there? Does an upload count? Do 10^9 almost-identical copies of the same uploaded person get 10^9 votes? (See my post Uploads for why the answer should be that they get 1 vote shared between them and the original biological human.)
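(One minimal way to formalize the shared-vote rule being argued for, as a sketch; how to decide which instances count as copies of the same person is assumed away here, and is of course the hard part.)

```python
from collections import Counter

# A sketch of the "copies share one vote" rule: each voter is tagged with the
# biological original they derive from, and that person's single vote is split
# across all of their instances, so 10**9 near-identical uploads still sum to 1.
def vote_weights(voters: list[tuple[str, str]]) -> dict[str, float]:
    """voters: (voter_id, original_person_id) pairs -> vote weight per voter."""
    copies_per_person = Counter(original for _, original in voters)
    return {voter_id: 1.0 / copies_per_person[original] for voter_id, original in voters}

electorate = [("alice_bio", "alice"), ("bob_bio", "bob")] + \
             [(f"alice_upload_{i}", "alice") for i in range(3)]
weights = vote_weights(electorate)
print(sum(w for v, w in weights.items() if v.startswith("alice")))  # 1.0, shared among copies
print(weights["bob_bio"])                                           # 1.0
```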