There’s a value, call it “weak friendliness”, that I view as a prerequisite to politics: it’s a function that humans already implement successfully, and is the one that says “I don’t want to be wire-headed, drugged into a stupor, a victim of nuclear winter, or see Earth turned into paperclips”.
A hands-off AI overlord can prevent all of that, while still letting humanity squabble over gay rights and which religion is correct.
And, well, the whole point of an AI is that it’s smarter than us, and thus has a chance of solving harder problems.
[weak friendliness is] a function that humans already implement successfully
I’m not sure this is true in any useful sense. Louis XIV probably agrees with me that “I don’t want to be wire-headed, drugged into a stupor, a victim of nuclear winter, or see Earth turned into paperclips.”
But I think it is pretty clear that the Sun King was not implementing my moral preferences, and I am not implementing his. Either one of us is not “weak friendly,” or “weak friendly” is barely powerful enough to answer really easy moral questions like “should I commit mass murder for no reason at all?” (Hint: no.)
If weak friendly morality is really that weak, then I have no confidence that a weak-FAI would be able to make a strong-FAI, or even would want to. In other words, I suspect that what most people mean by weak friendly is highly generalized applause lights that widely diverging values could agree with without any actual agreement on which actions are more moral.
I think a lower bound on weak friendliness is whether or not entities living within the society consider their lives worthwhile. Of course this opens up debate about house elves and such but it’s a useful starting point.
That (along with this semi-recent exchange) reminds me of a stupid idea I had for a group decision process a while back.
Party A dislikes the status quo. To change it, they declare to the sysop that they would rather die than accept it.
The sysop accepts this and publicly announces a provisionally scheduled change.
Party B objects to the change and declares that they’d rather die than accept A’s change.
If neither party backs down, a coin is flipped and the “winner” is asked to kill the loser in order for their preference to be realized; face-to-face to make it as difficult as possible, thereby maximizing the chances of one party or the other backing down.
If the parties consist of multiple individuals, the estimated weakest-willed person on the majority side has to kill (or convince to forfeit) the weakest person on the minority side; then the next-weakest, until the minority side is eliminated. If they can’t or won’t, then they’re out of the fight, and replaced with the next-weakest person, et cetera until the minority is eliminated or the majority becomes the minority.
Basically, formalized war, but done the opposite way from the strawman version in A Taste of Armageddon: making actual killing more difficult rather than easier.
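For concreteness, here is a minimal Python sketch of the multi-party resolution rule, under made-up assumptions: the will-strength numbers and the fifty-fifty chance of going through with a killing are placeholders of mine, not part of the proposal.

```python
import random

def resolve(majority, minority):
    """Toy simulation of the protocol sketched above.

    Each side is a list of (name, will_strength) pairs; the estimated
    weakest-willed members are made to confront each other first."""
    majority = sorted(majority, key=lambda m: m[1])
    minority = sorted(minority, key=lambda m: m[1])
    while majority and minority:
        # Face-to-face confrontation: the majority's weakest must kill
        # (or convince to forfeit) the minority's weakest, or drop out.
        if random.random() < 0.5:   # placeholder odds of going through with it
            minority.pop(0)         # minority member eliminated or forfeits
        else:
            majority.pop(0)         # the would-be killer is out of the fight
        if len(majority) < len(minority):
            majority, minority = minority, majority  # majority has become the minority
    return majority or minority     # the side whose preference is realized

# Illustrative run with made-up will-strength numbers:
print(resolve([("a", 3), ("b", 5), ("c", 7)], [("x", 2), ("y", 9)]))
```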
A few reasons it’s stupid:
People will tolerate conditions much worse than death (for themselves, or for others unable to self-advocate) rather than violate the taboo against killing or against “threatening” suicide.
The system may make bad social organizations worse by removing the most socially enlightened and active people first.
People have values outside themselves, so they’ll stay alive and try to work for change rather than dying pointlessly and leaving things to presumably get worse and worse from their perspective.
Prompting people to kill or die for their values will galvanize them and make reconciliation less likely.
Real policy questions aren’t binary, and how a question is framed or what order questions are considered in will probably strongly affect the outcome and who lives or dies, which will further affect future outcomes.
A side might win after initially taking casualties, or even be vindicated a long time after their initial battle. They’d want their people back, but keeping backups of people killed in a battle would make “killing” them much easier psychologically. It might also put them at risk of being restored in a dystopia that no longer respects their right to die. (Of course, people might still be reconstructed from records and others’ memories even if they weren’t stored anywhere in their entirety.)
The system assumes that there’s a well-defined notion of an individual by which groups can be counted, and that individuals can’t be created at will to try to outnumber opponents (possibly relevant: 1, 2, 3, 4).
People will immediately reject the system, so the first thing anyone “votes” for will be to abolish it, regardless of how much worse the result might be.
If there’s an afterlife (e.g. under the simulation hypothesis), we might just be passing the buck.
I’m not sure it’s a good idea to even publicly discuss things like this.
Actually, I think I’m now remembering a better (or better-sounding) idea that occurred to me later: rather than something as extreme as deletion, let people “vote” by agreeing to be deinstantiated, giving up the resources that would have been spent instantiating them. It might be essentially the same as death if they stayed that way til the end of the universe, but it wouldn’t be as ugly. Maybe they could be periodically awakened if someone wants to try to persuade them to change or withdraw their vote.
That would hopefully keep people from voting selfishly or without thorough consideration. On the other hand, it might insulate them from the consequences of poor policies.
Also, how to count votes is still a problem; where would “the resources that would have been spent instantiating them” come from? Is this a socialist world where everyone is entitled to a certain income, and if so, what happens when population outstrips resources? Or, in a laissez-faire world where people can run out of money and be deinstantiated, the idea amounts to plain old selling of votes to the rich, like we have now.
Basically, both my ideas seem to require a eutopia already in place, or at least a genuine 100% monopoly on force. I think that might be my point. Or maybe it’s that a simple-sounding, socially acceptable idea like “If someone would rather die than tolerate the status quo, that’s bad, and the status quo should be changed” isn’t socially acceptable once you actually go into details and/or strip away the human assumptions.
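A toy illustration of that last concern, with entirely made-up numbers: if "votes" are weighted by the instantiation resources each voter stakes, the tally tracks wealth rather than headcount.

```python
def tally(votes):
    """Sum the staked instantiation resources behind each option and
    return the option with the largest total."""
    totals = {}
    for voter, choice, staked_resources in votes:
        totals[choice] = totals.get(choice, 0) + staked_resources
    return max(totals, key=totals.get)

# One wealthy voter outweighs two poorer ones (figures are illustrative only):
votes = [
    ("rich_voter", "keep status quo", 1000),
    ("poor_voter_1", "change", 10),
    ("poor_voter_2", "change", 10),
]
print(tally(votes))  # -> "keep status quo"
```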
Can this be set up in a round robin fashion with sets of mutually exclusive values such that everyone who is willing to kill for their values kills each other?
Maybe if the winning side’s values mandated their own deaths. But then it would be pointless for the sysop to respond to their threat of suicide to begin with, so I don’t know. I’m not sure if there’s something you’re getting at that I’m not seeing.
“I’m not going to live there. There’s no place for me there… any more than there is for you. Malcolm… I’m a monster. What I do is evil. I have no illusions about it, but it must be done.”
The Operative, from Serenity. (On the off-chance that somebody isn’t familiar with that quote.)
I’m thinking if you do the matchups correctly you only wind up with one such person at the end, whom all the others secretly precommit to killing.
...maybe this shouldn’t be discussed publicly.
I don’t think the system works in the first place without a monopoly on lethal force. You could work within the system by “voting” for his death, but then his friends (if any) get a chance to join in the vote, and their friends, til you pretty much have a new war going. (That’s another flaw in the system I could have mentioned.)
I think the vast majority of the population would agree that genocide and mass murder are bad, same as wireheading and turning the Earth into paperclips. A single exception isn’t terribly noteworthy—I’m sure there are at least a few pro-wireheading people out there, and I’m sure at least a few people have gotten enraged enough at humanity to think paperclips would be a better use of the space.
If you have a reason to suspect that “mass murder” is a common preference, that’s another matter.
Mass murder is an easy question.
Is the Sun King (who doesn’t particularly desire pointless mass murder) more moral than I am? Much harder, and your articulation of “weak Friendliness” seems incapable of even trying to answer. And that doesn’t even get into actual moral problems society faces every day (e.g. what is the most moral taxation scheme?).
If weak-FAI can’t solve those types of problems, or even suggest useful directions to look, why should we believe it is a step on the path to strong-FAI?
That’s my point. I’m not sure where the confusion is, here. Why would you call it useless to prevent wireheading, UFAI, and nuclear winter, just because it can’t also do your taxes?
If it’s easier to solve the big problems first, wouldn’t we want to do that? And then afterwards we can take our sweet time figuring out abortion and gay marriage and tax codes, because a failure there doesn’t end the species.
For reasons related to Hidden Complexity of Wishes, I don’t think weak-FAI actually is likely to prevent “wireheading, UFAI, and nuclear winter.” At best, it prohibits the most obvious implementations of those problems. And it is terribly unlikely to be helpful in creating strong-FAI.
And your original claim was that common human preferences already implement weak-FAI preferences. I think the more likely reason why we haven’t had the disasters you reference is that for most of human history, we lacked the capacity to cause those problems. As actual society shows, the hidden complexity of wishes makes implementing social consensus hopeless, much less whatever smaller set of preferences constitutes weak-FAI preferences.
My basic point was that we shouldn’t worry about politics, at least not yet, because politics is a wonderful example of all the hard questions in CEV, and we haven’t even worked out the easy questions like how to prevent nuclear winter. My second point was that humans do seem to have a much clearer CEV when it comes to “prevent nuclear winter”, even if it’s still not unanimous.
Implicit in that should have been the idea that CEV is still ridiculously difficult. Just like intelligence, it’s something humans seem to have and use despite being unable to program for it.
So, then, summarized, I’m saying that we should perhaps work out the easy problems first, before we go throwing ourselves against harder problems like politics.
There’s not a clear dividing line between “easy” moral questions and hard moral questions. The Cold War, which massively increased the risk of nuclear winter, was a rational expression of Great Power relations between two powers.
Until we have mutually acceptable ways of resolving disputes when both parties are rationally protecting their interests, we can’t actually solve the easy problems either.
from you:
and from me:
So, um, we agree, huzzah? :)
I think the vast majority of the population would agree that genocide and mass murder are bad
Sure, genocide is bad. That’s why the Greens — who are corrupting our precious Blue bodily fluids to exterminate pure-blooded Blues, and stealing Blue jobs so that Blues will die in poverty — must all be killed!
We usually call that the ‘sysop AI’ proposal, I think.
There’s a bootstrapping problem inherent to handing AI the friendliness problem to solve.
Edit: Unless you’re suggesting we use a Weakly Friendly AI to solve the hard problem of Strong Friendliness?
Your edit pretty much captures my point, yes :) If nothing else, a Weak Friendly AI should eliminate a ton of the trivial distractions like war and famine, and I’d expect that humans have a much more unified volition when we’re not constantly worried about scarcity and violence. There’s not a lot of current political problems I’d see being relevant in a post-AI, post-scarcity, post-violence world.
The problem is that we have to guarantee that the AI doesn’t do something really bad while trying to stop these problems; what if it decides it really needs more resources suddenly, or needs to spy on everyone, even briefly? And it seems (to me at least) that stopping it from having bad side effects is pretty close, if not equivalent to, Strong Friendliness.
I should have made that more clear: I still think Weak-Friendliness is a very difficult problem. My point is simply that we only need an AI that solves the big problems, not an AI that can do our taxes. My second point was that humans seem to already implement weak-friendliness, barring a few historical exceptions, whereas so far we’ve completely failed at implementing strong-friendliness.
I’m using Weak vs Strong here in the sense of Weak being a “SysOP” style AI that just handles catastrophes, whereas Strong is the “ushers in the Singularity” sort that usually gets talked about here, and can do your taxes :)
This… may be an amazing idea. I’m noodling on it.
Edit: Completely misread the parent.
I know this wasn’t the spirit of your post, but I wouldn’t refer to war and famine as “trivial distractions”.
Wait, if you’re regarding the elimination of war, famine and disease as consolation prizes for creating a wFAI, what are people expecting from a sFAI?
God. Either with or without the ability to bend the currently known laws of physics.
No, really.
Really. That really is what people are expecting of a strong FAI. Compared with us, it will be omniscient, omnipotent, and omnibenevolent. Unlike currently believed-in Gods, there will be no problem of evil because it will remove all evil from the world. It will do what the Epicurean argument demands of any God worthy of the name.
Are you telling me that if a wFAI were capable of eliminating war, famine and disease, it wouldn’t be developed first?
Well, I don’t take seriously any of these speculations about God-like vs. merely angel-like creations. They’re just a distraction from the task of actually building them, which no-one knows how to do anyway.
But still, if a wFAI was capable of eliminating those things, why be picky and try for sFAI?
Because we have no idea how hard it is to specify either. If, along the way, it turns out to be easy to specify wFAI and risky to specify sFAI, then the reasonable course is clear. Doubly so since a wFAI would almost certainly be useful in helping specify a sFAI.
Seeing as human values are a minuscule target, it seems probable that specifying wFAI is harder than sFAI, though.
“Specify”? What do you mean?
Specifications, à la programming.
Why would it be harder? One could tell the wFAI to improve factors that are strongly correlated with human values, such as food stability, resources that cure preventable diseases (such as diarrhea, which, as we know, kills way more people than it should), and security from natural disasters.
Because if you screw up specifying human values you don’t get wFAI; you just die (hopefully).
It’s not optimizing human values; it’s optimizing circumstances that are strongly correlated with human values. It would be a logistics kind of thing.
Have you ever played corrupt a wish?
No, but I’m guessing I’m about to.
“I wish for a list of possibilities for sequences of actions, any of whose execution would satisfy the following conditions.
Within twenty years, for Nigeria to have standards of living such that it would receive the same rating as Finland on [Placeholder UN Scale of People’s-Lives-Not-Being-Awful].”
The courses of action would be evaluated by a think-tank until they decided that one was acceptable, and then the wFAI would be given the go-ahead.
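As a procedural sketch, the proposal amounts to something like the loop below. Everything here is hypothetical: `propose_plans`, `Reviewer.approves`, and the goal string are illustrative stand-ins of mine, and (as the replies below point out) nothing about this structure makes the underlying wish safe.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Reviewer:
    """A member of the hypothetical think-tank; `approves` stands in for
    whatever judgment procedure a real reviewer would use."""
    approves: Callable[[str], bool]

def plan_with_human_veto(propose_plans, goal, reviewers):
    """The wFAI only proposes candidate courses of action; nothing is
    executed unless every reviewer signs off on one of them."""
    for plan in propose_plans(goal):
        if all(r.approves(plan) for r in reviewers):
            return plan   # the think-tank gives the go-ahead
    return None           # no candidate was acceptable; nothing runs

# Illustrative use with stand-in pieces (not a real oracle):
dummy_oracle = lambda goal: [f"plan A for {goal}", f"plan B for {goal}"]
board = [Reviewer(approves=lambda plan: "B" in plan)]
print(plan_with_human_veto(dummy_oracle, "raise Nigeria's living standards", board))
```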
The AI optimizes only for that and doesn’t generate a list of non-obvious side effects. You implement one of them and something horrible happens to Finland and/or countries besides Nigeria.
or
In order to generate said list I simulate Nigeria millions of times at a resolution such that entities within the simulation pass the Turing test. Most of the simulations involve horrible outcomes for all involved.
or
I generate such a list including many sequences of actions that lead to a small group being able to take over Nigeria and/or Finland and/or the world (or that generate some other power differential that screws up international relations).
or
In order to execute such an action I need more computing power, and you forgot to specify which actions are acceptable for obtaining it.
or
The wFAI is much cleverer than a single human thinking about this for 2 minutes and can screw things up in ways that are as opaque to you as human actions are to a dog.
In general, specifying an oracle/tool AI is not safe: http://lesswrong.com/lw/cze/reply_to_holden_on_tool_ai/
Even more generally, our ability to build an AI that is friendly will have nothing to do with our ability to generate clauses in English that sound reasonable.