Listening to Eliezer walk through a hypothetical fast takeoff scenario left me with the following question: Why is it assumed that humans will almost surely fail the first time at their scheme for aligning a superintelligent AI, but the superintelligent AI will almost surely succeed the first time at its scheme for achieving its nonaligned outcome?
Speaking from experience, it’s hard to manufacture things in the real world. Doubly so for anything significantly advanced. What is the rationale for assuming that a nonaligned superintelligence won’t trip up at some stage in the hypothetical “manufacture nanobots” stage of its plan?
If I extend to the AGI’s competence at successfully manufacturing some agency-increasing tool the same assumption about first-attempt competence that is applied to humanity’s attempt to align an AGI, then the most likely scenario I get is that we’ll see the first unaligned AGI’s attempt at takeoff long before one actually succeeds in destroying humanity.
A superintelligent AI could fail to achieve its goals and still get us all killed. For example, it could design a virus that kills all humans, and then find out that there were some parts of the economy it did not understand, so it runs out of resources and dies without converting the galaxy to paperclips. Nonetheless, humans remain dead.
See, this is the perfect encapsulation of what I’m saying: it could design a virus, sure. But if it didn’t understand parts of the economy, a design is all it would be. Taking something from the design stage to a “physical, working product with validated processes that operate with sufficient consistency to achieve the desired outcome” is a vast, vast undertaking, one that requires intimate involvement with the physical world. Until that point is reached, it’s not a “kill all humans but fail to paperclip everyone” virus, it’s just a design concept. Nothing more. More and more I see those difficulties being glossed over by hypothetical scenarios that skip straight from the design stage to the finished product and presuppose that the implementation difficulties aren’t worth consideration, or that, if they are, they won’t serve as a valid impediment.
It’s hard to manufacture things, but it’s not that hard to do so in a way that pretty much can’t kill you. Just keep the computation that is you at somewhat of a distance from the physical experiment, and don’t make stuff that might consume the whole earth. Making a general intelligence is an extraordinary special case: if you’re actually doing it, it might self-improve and then kill you.
It is the very same rationale that stands behind assumptions like “why Stockfish won’t execute a losing set of moves”: it is just that good at chess. Or better: it is just that smart when it comes down to chess.
In this thought experiment, the way to go is not “I see that the AGI could likely fail at this step, therefore it will fail”, but to keep thinking and inventing better moves for the AGI to execute, moves that won’t be countered as easily. That is an important part of the “security mindset”, and probably a major reason why Eliezer speaks about the lack of pessimism in the field.
There are diminishing returns to thinking about moves, as opposed to performing the moves and seeing the results that the physics of the universe imposes on them.
Think of it like AlphaGo: if it could only ever train itself by playing Go against actual humans, it would never have become superintelligent at Go. Manufacturing is like that: you have to play with the actual world to understand the bottlenecks and challenges, not with a hypothetical, artificially created simulation of the world. That imposes rate-of-scaling limits that are currently being discounted.
Think of it like AlphaGo: if it could only ever train itself by playing Go against actual humans, it would never have become superintelligent at Go.
This is obviously untrue in both the model-free and the model-based RL senses.

There are something like 30 million human Go players who can each play a game in two hours. AlphaGo was trained on policy gradients from, as it happens, on the order of 30m games, so it could accumulate a similar order of games in under a day; the subset of pro games can be upweighted to provide most of the signal, and when they stop providing signal, well then, it must have reached superhuman level… (For perspective, the roughly 0.5m professional games used to imitation-train AlphaGo came from a single Go server, and not the most popular one; that is why AlphaGo Master ran its pro matches on a different, larger server.) Do this for a few days or weeks and you will likely have exactly that: a superhuman player, in a good deal less time than ‘never’, which is a rather long time.

More relevantly, since you are not making a claim about the AlphaGo architecture specifically but about all learning agents in general: with no exploration, MuZero can bootstrap its model-based self-play from somewhere in the neighborhood of hundreds or thousands of ‘real’ games (which should not be a surprise, as the rules of Go are simple), and it easily achieves superhuman play by self-play inside the learned model, with little need for any good human opponents at all; even if that estimate is three orders of magnitude off, it is still within a day’s worth of human gameplay. Or consider meta-learning sim2real systems like Dactyl, which are trained exclusively in silico on unrealistic simulations and adapt to reality within seconds.

So either way: the sample-inefficiency of DL robotics, of DL, or of R&D is more a fact about our compute-poverty than about any inherent necessity of interacting with the real world (interaction which is highly parallelizable, learnable offline, and needed in far smaller quantities than existing methods use).
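A rough back-of-envelope sketch of the game-accumulation arithmetic above, in Python. The player count, game length, and target corpus size are the comment’s own rough figures; the utilization fraction (what share of players are online at any moment) is an assumption added purely for illustration.

```python
# Back-of-envelope check of the game-accumulation arithmetic above.
# Player count, game length, and target corpus size are the rough figures
# quoted in the comment; the utilization fraction is an assumption.

players = 30_000_000        # ~30 million human Go players (rough figure)
hours_per_game = 2          # a human game takes about two hours
target_games = 30_000_000   # on the order of AlphaGo's ~30m training games
utilization = 0.10          # assumed fraction of players online at any moment

concurrent_games = players * utilization / 2        # two players per game
games_per_hour = concurrent_games / hours_per_game
hours_needed = target_games / games_per_hour

print(f"~{games_per_hour:,.0f} games/hour")
print(f"~{hours_needed:.0f} hours (~{hours_needed / 24:.1f} days) "
      f"to reach {target_games:,} games")
```

Under these assumptions the corpus accumulates in roughly 40 hours at 10% utilization, and in about 4 hours if every player were online at once, which is consistent with the “under a day” to “a few days or weeks” timescale claimed above.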
My lack of clarity when I think about these limits makes it hard for me to see how the end result would change if we could somehow “stop discounting” them.

It seems to me that we would have to be much more elaborate in describing the parameters of this thought experiment. In particular, we would have to agree on the deeds and real-world achievements the hypothetical AI has, so that we both agree to call it an AGI (for example, writing an interesting story and making illustrations, so that this particular research team now has a new revenue stream from selling it online: would that make the AI an AGI?). And on the security conditions (an air-gapped server room?). That would get us closer to understanding “the rationale”.

But then your question is not about an AGI but about a “superintelligent AI”, so we would have to do the elaborate describing again with new parameters. And that is what I expect Eliezer (alone and with other people) has done a lot of. And look what it did to him (this is a joke, but at the same time, not). So I will not be an active participant further.

It is not even about a single SAI in some box: competing teams, people running copies (legal and not) and changing code, corporate espionage, dirty code...