Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.
Do you think it would be possible to design an intelligence which could do this more reliably?
I don’t get it. Design a Friendly AI that can better judge whether it’s worth the risk of botching the design of a Friendly AI?
ETA: I suppose your point applies to some of XiXiDu’s concerns but not others?
A lens that sees its flaws.
I don’t understand. Is the claim here that you can build a “decide whether the risk of a botched Friendly AI is worth taking” machine, and that the risk of botching such a machine is much less than the risk of botching a Friendly AI?
A FAI that includes such a “Should I run?” heuristic could pose a lesser risk than a FAI without it. If this heuristic works better than human judgment about whether to run a FAI, it should be used instead of human judgment.
This is the same principle as for the AI’s decisions themselves, where we don’t ask the AI’s designers for object-level moral judgments, or encode specific object-level moral judgments into the AI. Not running an AI would then be equivalent to hardcoding the designers’ answer of “No” to the question “Should the AI run?” into the AI, instead of encoding the question and letting the AI itself answer it (assuming we can expect it to answer the question more reliably than the programmers can).
If we botched the FAI, wouldn’t we also probably have botched its ability to decide whether it should run?
Yes, and if it tosses a coin, it has a 50% chance of being right. The question is calibration: how much trust such measures should buy compared to their absence, given what is known about a given design.
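One toy way to make that calibration question concrete, with purely hypothetical numbers (a sketch, not anything proposed in the exchange above): compare the chance of launching a flawed design with and without a partially reliable internal “Should I run?” check.

```python
# Hypothetical calibration sketch: how much does an imperfect "Should I run?"
# check reduce the chance of launching a flawed design? All numbers are assumptions.

p_flawed = 0.9          # assumed prior probability that the design is botched
check_reliability = 0.5  # a coin flip: the check gives the correct verdict half the time

# Without the check: a flawed design simply gets launched.
p_bad_launch_without = p_flawed

# With the check: a flawed design is launched only if the check wrongly approves it.
p_bad_launch_with = p_flawed * (1 - check_reliability)

# Cost of the check: a sound design may be wrongly refused.
p_good_design_blocked = (1 - p_flawed) * (1 - check_reliability)

print(f"P(bad launch) without check: {p_bad_launch_without:.2f}")
print(f"P(bad launch) with check:    {p_bad_launch_with:.2f}")
print(f"P(good design blocked):      {p_good_design_blocked:.2f}")
```

In this toy model even a coin-flip check halves the chance of a bad launch, at the cost of sometimes blocking a sound design; how much better than a coin flip such a check can actually be, given what is known about the design, is the calibration question.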
The machine Steve proposes might not carry as much risk of creating a “living hell”: attempting to get the human utility function right, but missing in such a way that humans are still alive, just living very unpleasantly. To me, this seems by far the biggest of XiXiDu’s concerns.
Here are a few premises:
Complex systems can fail in complex ways.
Destruction is easier than creation.
Expected utility maximization is rational and feasible.
We should be extremely careful not to implement a half-baked friendly AI.
If you believe that self-improving AI is inevitable and that creating friendly AI is more difficult than creating unfriendly AI, then launching an AI that simply destroys everything as quickly as possible has higher expected utility than doing nothing or trying to implement an AI that is not completely friendly.
The negative utility of an AI that transforms the universe into an inanimate state is simply the forgone positive utility of a perfectly friendly AI. But most outcomes are expected to yield an unfriendly AI, or a not-quite-friendly AI, which would actively increase negative utility by possibly keeping living beings alive indefinitely under abhorrent circumstances. To prevent those scenarios one can try to solve friendly AI, which will most likely fail (or even increase the chances of a negative singularity), or try to launch a destructive singleton with simple goals to prevent further suffering and the evolution of life elsewhere in the universe. Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities, with a high probability of success.
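To make the structure of this argument concrete, here is a minimal expected-utility sketch in Python. Every probability and utility in it is a made-up placeholder chosen purely for illustration; none of these numbers come from the discussion above.

```python
# Sketch of the expected-utility comparison in the argument above.
# All numbers are illustrative assumptions, not claims about real probabilities.

U_FRIENDLY = 1.0     # a perfectly friendly AI
U_INANIMATE = -1.0   # a dead universe: only the friendly outcome is forgone
U_HELLISH = -100.0   # a not-quite-friendly AI keeping beings alive under abhorrent circumstances

def expected_utility(lottery):
    """lottery is a list of (probability, utility) pairs whose probabilities sum to 1."""
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9
    return sum(p * u for p, u in lottery)

# Hypothetical outcome distributions for each option (pure assumptions).
options = {
    "attempt friendly AI":          [(0.05, U_FRIENDLY), (0.55, U_INANIMATE), (0.40, U_HELLISH)],
    "launch destructive singleton": [(0.00, U_FRIENDLY), (0.95, U_INANIMATE), (0.05, U_HELLISH)],
    "do nothing (others build AI)": [(0.02, U_FRIENDLY), (0.48, U_INANIMATE), (0.50, U_HELLISH)],
}

for name, lottery in options.items():
    print(f"{name}: EU = {expected_utility(lottery):+.2f}")
```

Under assumptions like these, where a hellish outcome is weighted far below mere extinction, the destructive-singleton option comes out with the highest expected utility, which is exactly the conclusion being argued; with different numbers the ranking changes.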
Assuming your argument is correct, wouldn’t it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.
I’m skeptical that friendly AI is as difficult as all that because, to take an example, humans are generally considered pretty “wicked” by traditional writers and armchair philosophers, but lately we haven’t been murdering each other or deliberately going out of our way to make each other’s lives miserable very often. For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I’m just human. Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that? There were a lot of things in history and science that seemed mystically complex but turned out to be formalizable in compressed ways, such as the mathematics of Darwinian population genetics. Who would have imagined that the “Secrets of Life and Creation” would be revealed like that? But they were. Could “sufficient goodness that we can be convinced the agent won’t put us through hell” also have a compact description that seems clearly tractable in retrospect?
There might be countless planets that are about to undergo an evolutionary arms race for the next few billion years, resulting in a lot of suffering. It is very unlikely that there is a single source of life that is at exactly the right stage of evolution, with exactly the right mind design, not only to lead worthwhile lives but also to get its AI technology exactly right and not turn everything into a living hell.
If you assign negative utility to suffering (something that is likely to be universally accepted), then given that you are an expected utility maximizer, ending all life should be a serious consideration. Because 1) agents that are a product of evolution have complex values, 2) to satisfy complex values you need to meet complex circumstances, 3) complex systems can fail in complex ways, and 4) any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways.
To name just one example of where things could go horribly wrong: humans are by their very nature interested in domination and sex. Our aversion to sexual exploitation depends largely on the memeplex of our cultural and societal circumstances. If you knew more, were smarter, and could think faster, you might very well realize that such an aversion is an unnecessary remnant that you can easily extinguish to open up new pathways to gain utility. The idea that Gandhi would not agree to have his brain modified into that of a baby-eater is incredibly naive. Given the technology, people will alter their preferences and personalities. Many people actually perceive their moral reservations to be limiting. It only takes a certain amount of insight to overcome such limitations.
You simply can’t be sure that the future won’t hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to turn out barely acceptable.
Maybe not, but betting on the possibility that goodness can be easily achieved is like pulling a random AI from mind design space hoping that it turns out to be friendly.
Similarly, it is easier to make piles of rubble than skyscrapers. Yet—amazingly—there are plenty of skyscrapers out there. Obviously something funny is going on...
Hang on, though. That’s still normally better than not existing at all! Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as “below zero”. Most plausible futures just aren’t likely to be that bad for the creatures in them.
The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy. That’s the case for most people. We just tend to remember the good moments more than the rest of our lives.
It is generally easier to arrive at bad world states than good world states, because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you’re going to lose the fight against general decay. Any temporary success is just a statistical fluke.
No, I’m not!
Yet most creatures would rather live than die—and they show that by choosing to live. Dying is an option—they choose not to take it.
It sounds as though by now there should be nothing left but dust and decay! Evidently something is wrong with this reasoning. Evolution produces marvellous wonders—as well as entropy. Your existence is an enormous statistical fluke—but you still exist. There’s no need to be “down” about it.
For some people, this is a solved problem.
Where “success” refers to obliterating yourself and all your descendants. That’s not how most Darwinian creatures usually define success. Natural selection does build creatures that want to die—but only rarely and by mistake.