There’s no guarantee that boxing will ensure the safety of a soft takeoff. When your boxed AI starts to become drastically smarter than a human (10 times, 1,000 times, 1,000,000 times), the sheer enormity of its mind may slip beyond human understanding. All the while, a seemingly small dissonance between the AI’s goals and human values, or a small misunderstanding on our part of what goals we’ve imbued, could magnify into catastrophe as the power differential between humanity and the AI explodes post-transition.
If an AI goes through the intelligence explosion, its goals will be what orchestrates all resources (as Omohundro’s point 6 implies). If the goals of this AI do not align with human values, all we value will be lost.
If you want guarantees, find yourself another universe. “There’s no guarantee” of anything.
Your concept of a boxed AI seems very naive and uninformed. Of course a superintelligence a million times more powerful than a human would probably be beyond the capability of a human operator to manually debug. So what? Actual boxing setups would involve highly specialized machine checkers that assure various properties about the behavior of the intelligence and its runtime, in ways that truly can’t be faked.
And boxing, by the way, means giving the AI zero power. If there is a power differential, then really by definition it is out of the box.
Regarding your last point, it is in fact possible to build an AI that is not a utility maximizer.
And boxing, by the way, means giving the AI zero power.
No, hairyfigment’s answer was entirely appropriate. Zero power would mean zero effect; any kind of interaction with the universe means some level of power. In the future, perhaps you should say “nearly zero power” to avoid misunderstanding on the part of others, since taking you literally on the “zero” is apparently “legalistic”.
As to the issues with nearly zero power:
A superintelligence with nearly zero power could turn out to have a heck of a lot more power than you expect.
The incentives to tap more perceived utility by unboxing the AI or building other unboxed AIs will be huge.
Mind, I’m not arguing that there is anything wrong with boxing. What I’m arguing is that it’s wrong to rely only on boxing. I recommend you read some more material on AI boxing and Oracle AI. Don’t miss out on the references.
I have read all of the resources you linked to and their references, the sequences, and just about every post on the subject here on LessWrong. Most of what passes for thinking regarding AI boxing and oracles here is confused and/or fallacious.
A superintelligence with nearly zero power could turn out to have a heck of a lot more power than you expect.
It would be helpful if you could point to the specific argument which convinced you of this point. For the most part every argument I’ve seen along these lines either stacks the deck against the human operator(s), or completely ignores practical and reasonable boxing techniques.
The incentives to tap more perceived utility by unboxing the AI or building other unboxed AIs will be huge.
Again, I’d love to see a citation. Having a real AGI in a box is basically a ticket to unlimited wealth and power. Why would anybody risk losing control over that by unboxing? Seriously, someone who owns an AGI would be paranoid about keeping their relative advantage and would spend their time strengthening the box and investing in physical security.
Actual boxing setups would involve highly specialized machine checkers that assure various properties
A fact that is only relevant if those properties can capture the desired feature. You’ll recall that defining the desired feature is a major goal of MIRI.
And boxing, by the way, means giving the AI zero power.
No it doesn’t. Giving the AI zero power to affect our behavior, in the strict sense, would mean not running it (or not letting it produce even one bit of output and not expecting any).
Regarding your last point, it is in fact possible to build an AI that is not a utility maximizer.
Look, I know the obvious rejoinder doesn’t necessarily tell us that an arbitrary AI’s utility function will attach any value to conquering the world. But the converse part of the theorem does show that world-conquering functions can work. Utility maximization today seems like the best-formalized part of human general intelligence, especially the part that CEOs would like more of. You have not, as far as I’ve seen, shown that any other approach is remotely feasible, much less likely to happen first. (It doesn’t seem like you even want to focus on uploading.) And the parent makes a stronger claim, assuming you want to say that some credible route to AGI will produce different results despite being mathematically equivalent to some utility function.
A fact that is only relevant if those properties can capture the desired feature. You’ll recall that defining the desired feature is a major goal of MIRI.
No, that presumes what is being checked against is the friendly goal system. What I’m talking about is checking that, e.g., all actions being taken by the AI are in search of solutions to a compact goal description, also extracted from the machine in the form of a Bayesian concept net. Then both the goal set and stochastic samplings of representative mental processes are checked by humans for anomalous behavior (while a much larger subset is frequency-mined to determine what’s representative).
You’re not testing that the machine obeys some not-yet-figured-out friendly goal set, but that the extracted goals and computational traces are representative, and then manually inspecting those.
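A minimal Python sketch of the audit loop described above, under loud assumptions: it presumes the hard part (extracting goals and traces from the machine) is already solved, and that each trace is a sequence of steps tagged with the goal it claims to serve. The names `Step`, `mine_frequencies`, `audit`, `min_freq`, and `sample_size` are hypothetical and not taken from any real system. Steps serving an undeclared goal, or matching a pattern that is rare in the frequency-mined corpus, get flagged; a random sample of the remaining representative traces goes to human reviewers.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    goal: str       # hypothetical: the declared goal this step claims to serve
    operation: str  # hypothetical: coarse label for the computation performed


def mine_frequencies(traces):
    """Relative frequency of each (goal, operation) step over the whole corpus."""
    counts = Counter(step for trace in traces for step in trace)
    total = sum(counts.values())
    return {step: n / total for step, n in counts.items()}


def audit(traces, declared_goals, min_freq, sample_size, seed=0):
    """Return (flagged traces, random sample of representative traces) for humans."""
    freqs = mine_frequencies(traces)
    flagged, representative = [], []
    for trace in traces:
        # Any step claiming a goal outside the extracted goal set is suspicious.
        off_goal = any(step.goal not in declared_goals for step in trace)
        # Any step pattern that is rare across the corpus is not "representative".
        rare = any(freqs[step] < min_freq for step in trace)
        (flagged if off_goal or rare else representative).append(trace)
    rng = random.Random(seed)
    sample = rng.sample(representative, min(sample_size, len(representative)))
    return flagged, sample
```

The split is deliberately conservative: a trace only counts as representative if every one of its steps is both on-goal and common, so anything unusual is routed to a human rather than sampled away.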
Giving the AI zero power to affect our behavior, in the strict sense, would mean not running it (or not letting it produce even one bit of output and not expecting any).
That’s a legalistic definition which belongs only in philosophy debates.
Utility maximization today seems like the best-formalized part of human general intelligence
I disagree. Much of human behavior is not utility maximizing. Much of it is about fulfilling needs, which is often about eliminating conditions. You have hunger? You eliminate this condition by eating a reasonable amount of food. You do not maximize your lack of hunger by turning the whole planet into a food-generating system and force-feeding the products down your own throat.
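A toy Python contrast of the two behaviors in the hunger example. The functions and the linear "more food is better" utility are illustrative assumptions, not a model of how real motivation works: the need-driven agent stops as soon as the hunger condition is eliminated, while an agent maximizing an unbounded, increasing utility takes everything available.

```python
def satisficer(hunger, food_available, per_unit=1):
    """Eat only until the hunger condition is eliminated, then stop."""
    eaten = 0
    while hunger > 0 and eaten < food_available:
        eaten += 1
        hunger -= per_unit
    return eaten


def maximizer(food_available, utility=lambda units: units):
    """Choose the amount of food that maximizes an increasing utility function."""
    return max(range(food_available + 1), key=utility)
```

With 1,000 units on offer and a hunger of 3, `satisficer(3, 1000)` eats 3 units, while `maximizer(1000)` takes all 1,000.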
Anyway, in my own understanding, general intelligence has to do with concept formation and System 1/System 2 learned behavior. There’s not much about utility maximization there.
It doesn’t seem like you even want to focus on uploading.
Do you count intelligence augmentation as uploading? Because that’s my path through the singularity.
despite being mathematically equivalent to some utility function
Gah, no no no. Not every program is equivalent to a utility maximizer. Not if utility and utility maximization are to have any meaning at all. Sure, you can take any program and call it a utility maximizer by finding some super contrived function which is maximized by the program. But if that goal system is more complex than the program that supposedly maximizes it, then all you’ve done is demonstrate the principle of overfitting a curve.
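A minimal Python sketch of the "contrived function" construction, assuming a deterministic program that maps observations to actions (all names here are illustrative). The induced utility awards 1 exactly when the action matches what the program would do, so an argmax agent over it reproduces the program perfectly. But the utility function contains the whole program inside it, which is the overfitting point: the description is never simpler than the thing it describes.

```python
def induced_utility(policy):
    """Build a utility function that any given policy trivially maximizes."""
    def utility(observation, action):
        # The "goal" is literally: do whatever the program would have done.
        return 1.0 if action == policy(observation) else 0.0
    return utility


def argmax_agent(utility, actions):
    """An agent that picks the highest-utility action for each observation."""
    def act(observation):
        return max(actions, key=lambda a: utility(observation, a))
    return act


# An arbitrary program with no obvious "goal".
def arbitrary_policy(obs):
    return "left" if obs % 3 == 0 else "right"


agent = argmax_agent(induced_utility(arbitrary_policy), ["left", "right", "stay"])
```

For every observation, `agent(obs) == arbitrary_policy(obs)`, yet calling `agent` a utility maximizer explains nothing beyond what `arbitrary_policy` already said.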