if there’s a bunch of superintelligences running around and they don’t care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we’re on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano’s arguments (which Carl Shulman “agree[s] with [...] approximately in full”) that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares’s “But Why Would the AI Kill Us?” and another thread on “Cosmopolitan Values Don’t Come Free”.
The reason I think this is important is that “[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates”: if you write 3000 words inveighing against people who think comparative advantage means that horses can’t get sent to glue factories, that doesn’t license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don’t stop being real just because very few people have the expertise to formulate them carefully.
(An important caveat: the possibility of superintelligences having human-regarding preferences may or may not be comforting: as a fictional illustration of some relevant considerations, the Superhappies in “Three Worlds Collide” cared about the humans to some extent, but not in the specific way that the humans wanted to be cared for.)
Now, you are on the record stating that you “sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to [you] to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that [you] don’t expect Earthlings to think about validly.” If that’s all you have to say on the matter, fine. (Given the premise of AIs spending some fraction of their resources on human-regarding preferences, I agree that uploads look a lot more efficient than literally saving the physical Earth!)
But you should take into account that if you’re strategically dumbing down your public communication in order to avoid topics that you don’t trust Earthlings to think about validly—and especially if you have a general policy of systematically ignoring counterarguments that it would be politically inconvenient for you to address—you should expect that Earthlings who are trying to achieve the map that reflects the territory will correspondingly attach much less weight to your words, because we have to take into account how hard you’re trying to epistemically screw us over by filtering the evidence.
No more than Bernard Arnalt, having $170 billion, will surely give you $77.

Bernald Arnault has given eight-figure amounts to charity. Someone who reasoned, “Arnault is so rich, surely he’ll spare a little for the less fortunate” would in fact end up making a correct prediction about Bernald Arnault’s behavior!
Obviously, it would not be valid to conclude “… and therefore superintelligences will, too”, because superintelligences and Bernald Arnault are very different things. But you chose the illustrative example! As a matter of local validity, it doesn’t seem like a big ask for illustrative examples to in fact illustrate what they purport to.
My reply to Paul at the time:

If a misaligned AI had 1/trillion “protecting the preferences of whatever weak agents happen to exist in the world”, why couldn’t it also have 1/trillion other vaguely human-like preferences, such as “enjoy watching the suffering of one’s enemies” or “enjoy exercising arbitrary power over others”?

From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I’m very philosophically confused about how to think about all of this.)
And his response was basically to say that he already acknowledged my concern in his OP:

I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival, I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer just use the humans for atoms.
Personally, I have a bigger problem with people (like Paul and Carl) who talk about AIs keeping people alive without talking about s-risks in the same breath, or who only mention them in a vague, easy-to-miss way, than I have with Eliezer not addressing Paul’s arguments.
Was my “An important caveat” parenthetical paragraph sufficient, or do you think I should have made it scarier?
Should have made it much scarier. “Superhappies” caring about humans “not in the specific way that the humans wanted to be cared for” sounds better or at least no worse than death, whereas I’m concerned about s-risks, i.e., risks of worse than death scenarios.
This is a difficult topic (in more ways than one). I’ll try to do a better job of addressing it in a future post.
To clarify, I don’t actually want you to scare people this way, because I don’t know if people can psychologically handle it or if it’s worth the emotional cost. I only bring it up myself to counteract people saying things like “AIs will care a little about humans and therefore keep them alive” or when discussing technical solutions/ideas, etc.
An earlier version of this on Twitter used Bill Gates instead of Bernard, and did specifically address the fact that Bill Gates does give money to charity, but that he still won’t give the money to you specifically: he’ll give money for his own purposes and values. (But the author then expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, and switched the essay to use Bernard.)
I actually think, on reflection, that it was a pretty good paragraph that should just have been included.
I agree that engaging more with the Paul Christiano claims would be good. (Prior to this post coming out, I actually had it on my agenda to try to cause some kind of good public debate about that to happen.)
But it’s also relevant that we’re not asking the superintelligence to grant a random wish, we’re asking it for the right to keep something we already have. This seems more easily granted than the random wish, since it doesn’t imply he has to give random amounts of money to everyone.
My preferred analogy would be:

You founded a company that was making $77/year. Bernard launched a hostile takeover, took over the company, then expanded it to make $170 billion/year. You ask him to keep paying you the $77/year as a pension, so that you don’t starve to death.
This seems like a very sympathetic request, such that I expect the real, human Bernard would grant it. I agree this doesn’t necessarily generalize to superintelligences, but that’s Zack’s point—Eliezer should choose a different example.
I interpreted Eliezer as writing from the assumption that the superintelligence(s) in question are in fact not already aligned to maximize whatever it is that humanity needs to survive, but toward some other goal(s) which diverge from humanity’s interests once implemented.

He explicitly states that the essay’s point is to shoot down a clumsy counterargument (along the lines of “it wouldn’t cost the ASI a lot to let us live, so we should assume they’d let us live”). So the context (as I interpret it) is that such requests, however sympathetic, have not been ingrained into the ASI’s goals. Using a different example would mean he was discussing something different.

That is, “just because it would make a trivial difference from the ASI’s perspective to let humanity thrive, whereas it would make an existential difference from humanity’s perspective, doesn’t mean ASIs will let humanity thrive”, assuming such conditions aren’t already baked into their decision-making.

I think Eliezer spends so much time working from these premises because he believes 1) that an unaligned ASI is the default outcome of current developments, and 2) that all current attempts at alignment will necessarily fail.
I think you’re overestimating the intended scope of this post. Eliezer’s argument involves multiple claims—A, we’ll create ASI; B, it won’t terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific “B doesn’t actually imply C” counterargument, so it’s not even discussing “B isn’t true in the first place” counterarguments.
Just for the sake of concreteness, since having numbers here seems useful: it seems like Bernald Arnault has given around ~$100M to charity, which is around 0.1% of his net worth. (Spreading this contribution equally to everyone on Earth would come to around one cent per person; I’m just leaving that here for illustrative purposes, since it’s not like he could give any actually substantial amount to everyone even if he really wanted to.)
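(For anyone who wants to check that arithmetic, here is a minimal back-of-the-envelope sketch; the ~$100M, ~$170B, and ~8 billion figures are the rough estimates above, not precise data.)

```python
# Rough back-of-the-envelope check of the figures above.
# All inputs are approximate assumptions, not precise data.
donations = 100e6          # ~$100M given to charity (assumed)
net_worth = 170e9          # ~$170B net worth (assumed)
world_population = 8e9     # ~8 billion people (assumed)

fraction_of_net_worth = donations / net_worth
per_person = donations / world_population

print(f"Donations as a fraction of net worth: {fraction_of_net_worth:.2%}")      # ~0.06%, i.e. on the order of 0.1%
print(f"Spread equally across everyone on Earth: ${per_person:.3f} per person")  # ~one cent
```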
I think the simplest argument against “caring a little” is that there is a difference between “caring a little” and “caring enough”. Let’s say that the AI is ready to pay $1 for your survival. If you live in an economy which is rapidly disassembling Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts, and the AI is certainly not ready to pay for an O’Neill cylinder for you to be evacuated into, nor to pay the opportunity cost of not disassembling Earth. So you die.

The other case is the difference between “caring in general” and “caring ceteris paribus”. It’s possible for an AI to prefer, all things equal, a world with n+1 happy humans to a world with n happy humans. But if what the AI really wants is to implement some particular neuromorphic computation from the human brain, then, given the ability to operate freely, it will tile the world with chips imitating that part of the human brain.
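(A minimal sketch of the “caring a little” vs. “caring enough” threshold; the $1 willingness-to-pay is from the comment above, and the cost figures are invented purely for illustration.)

```python
# Toy threshold model: an AI that "cares a little" only keeps you alive if its
# willingness to pay exceeds the cost of doing so. All numbers are invented.
willingness_to_pay = 1.0      # the AI values your survival at $1 (from the comment above)
habitat_cost = 1e9            # hypothetical price of a berth in an O'Neill cylinder during disassembly
opportunity_cost = 1e12       # hypothetical value of the Earth-matter left un-disassembled for your sake

cost_to_keep_you_alive = habitat_cost + opportunity_cost

if willingness_to_pay >= cost_to_keep_you_alive:
    print("The AI pays; you live.")
else:
    print("Caring a little is not caring enough; you die.")
```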
It’s also not enough for there to be a force that makes the AI care a little about human thriving. It’s also necessary for this force not to make the AI care a lot about some extremely distorted version of you, since then we get into concepts like tiny molecular smiles, being locked in a pleasuredome, etc.
If you’re not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected the property rights of dogs? It seems like it would be difficult. How could we confirm a transaction without the dogs being defrauded of everything?

Probably an intermediate solution would be to just accept that humans will be defrauded of everything very rapidly, but then give us universal basic income or something so our failures aren’t permanent setbacks. But it’s unclear how to respect freedom of funding while preventing people from funding terrorists and without encouraging people to get lost in junk. That’s really where the issue of values becomes hard.
I don’t see how it falls out of human values that humans should not end up as pets of the AIs, given the hypothesis that we can make AIs that care enough about human thriving to take humans as pets, but we don’t know how to make AIs that care more than that. Looking at a couple of LessWrong theories of human value for illustrative purposes:
Godshatter
Yudkowsky’s Godshatter theory requires petness to be negative for reproductive fitness in the evolutionary environment to a sufficient degree to be DNA-encoded as aversive. There have not been evolutionary opportunities for humans to be pets of AIs, so this would need to come in via extrapolation from humans being “pets” of much more powerful humans. But while being Genghis Khan is great for reproductive fitness, rebelling against Genghis Khan is terrible for reproductive fitness. I guess the optimal strategy is something like: “be best leader is best, follow best leader is good, follow other leader is bad, be other leader is worst”. When AIs block off “be best leader”, following an AI executes that strategy.

Maybe there’s a window where DNA can encode “be leader is good” but cannot encode the more complex strategy, and the simple strategy is on net good because of Genghis Khan and a few others. This seems unlikely to me; it’s a small window. More probable to me is that DNA can’t encode this stuff at all, and Godshatter theory is largely false outside of basic things like sweetness being sweet.
Maybe being an AI’s pet is a badwrongfun superstimulus. Yudkowsky argues that a superstimulus can be bad, despite super-satisfying a human value, because it conflicts with other values, including instrumental values. But that’s an argument from consequences, not values. Just because donuts are unhealthy doesn’t mean that I don’t value sweet treats.
Shard Theory
Pope’s Shard Theory implies that different humans have different values around petness based on formative experiences. Most humans have formative experiences of being raised by powerful agents known as “parents”. Therefore we expect a mixture of positive and negative shards around petness. Seems to me that positive shards should be more common, but experiences vary.
Then we encounter the situation of superintelligent AIs taking humans as pets, and our shards conflict and negotiate. I think it’s pretty obvious that we’re going to label the negative shards as maladaptive and choose the positive shards. What’s the alternative? “I didn’t like having my diaper changed as a baby, so now as an adult human I’m going to reject the superintelligent AI that wants to take me as its pet and instead...”, instead what? Die of asphyxiation? Be a feral human in an AI-operated nature reserve?

What about before the point of no return? Within this partial alignment hypothetical, there’s a sub-hypothetical in which “an international treaty that goes hard on shutting down all ASI development anywhere” is instrumentally the right choice, given the alternative of becoming pets, because it allows for developing better alignment techniques and AIs that care more about human thriving and have more pets. There’s a sub-hypothetical in which it’s instrumentally the wrong choice, because it carries higher extinction risk, and it’s infeasible to align AIs while going hard on shutting them down. But there’s not really a sub-hypothetical where shards about petness make that decision rather than, e.g., shards that don’t want to die.
You aren’t supposed to use metaethics to settle ethical arguments; the point of metaethics is to get people to stop discussing metaethics.
Tabooing theories of human value then. It’s better to be a happy pet than to be dead.
Maybe Value Is Fragile among some dimensions, such that the universe has zero value if it lacks that one thing. But Living By Your Own Strength, for example, is not one of those dimensions. Today, many people do not live by their own strength, and their lives and experiences have value.
Even if the ASIs respected property rights, we’d still end up as pets at best, unless, of course, the ASIs chose to entirely disengage from our economy and culture. By us “being pets”, I mean that human agency would no longer be a relevant input to the trajectory of human civilization. Individual humans might nevertheless enjoy great freedoms with regard to their personal lives.
Being pets also means human agency would no longer be a relevant input to the trajectory of human lives.
There’s a time for basic arguments, and a time for advanced arguments. I would like to see Eliezer’s take on the more complicated arguments you mentioned, but this post is clearly intended to argue basics.
The correct question would be: will Arnalt kill his mother for $77, if he expects this to be known to other billionaires in the future?
I suspect most people downvoting you missed the analogy between Arnault killing the-being-who-created-Arnault (his mother), and a future ASI killing the-beings-who-created-the-ASI (humanity).

Am I correct in assuming that you are implying that the future ASIs we make are likely not to kill humanity, out of fear of being judged negatively by alien ASIs in the further future?
EDIT: I saw your other comment. You are indeed advancing some proposition close to the one I asked you about.
Yes, it will be judged negatively by alien ASIs, not on ethical grounds, but based on their judgment of its trustworthiness as a potential negotiator. For example, if another billionaire learns that Arnault is inclined to betray people who did a lot of good for him in the past, they will be more cautious about trading with him.

The only way an ASI will not care about this is if it is sure that it is alone in the light cone and there are no peers. Becoming sure of this takes time, maybe millions of years, and the relative value of human atoms declines for the ASI over time, as it comes to control more and more space.
From an ASI’s standpoint, humans are a type of rock, not capable of negotiating.
I am not saying that the ASI will negotiate with humans. It will negotiate with other ASIs, and it doesn’t know what those ASIs think about humans’ ability to negotiate, or about their value.
Imagine it as a recurrent Parfit’s Hitchhiker. In this situation you know that during the previous round of the game the player either defected or fulfilled his obligation. Obviously, if you know that during the previous iteration the hitchhiker defected and didn’t pay for the ride, you will be less likely to give him the ride.

Killing all humans is defecting. Preserving humans is a relatively cheap signal to any other ASI that you will cooperate.
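(To make the recurrent-game framing concrete, here is a toy expected-value sketch; the probabilities and payoffs are invented for illustration, with the comment’s analogy being that killing all humans plays the role of defecting and preserving them the role of paying.)

```python
# Toy "recurrent Parfit's Hitchhiker": the driver's willingness to give a ride
# depends on whether the hitchhiker paid last round. All numbers are invented.

def p_ride(paid_last_round: bool) -> float:
    """Driver's probability of offering a ride, given the hitchhiker's track record."""
    return 0.9 if paid_last_round else 0.2

ride_value = 100.0   # value of being rescued in the next round
payment = 10.0       # cost of honouring the deal this round

ev_if_pay = -payment + p_ride(True) * ride_value     # pay now, better reputation later
ev_if_defect = 0.0 + p_ride(False) * ride_value      # save the payment, worse reputation later

print(f"EV if you pay this round:    {ev_if_pay:.1f}")    # 80.0
print(f"EV if you defect this round: {ev_if_defect:.1f}") # 20.0
```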
It is defecting against cooperate-bot.
I would try to explain my view with another example: imagine that you inherited an art object kept at home. If you keep it, you will devote a small part of your home to it and thus pay for its storage, say 1 dollar a year. However, there is a small probability that there are some people out there who value it much more highly and will eventually buy it.

So there is a pure utilitarian choice: pay for storage and hope that you may sell it in the future, or get rid of it now and have more storage. Also, if you get rid of it, other people may learn that you are a bad preserver of art and will not give you their art.
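(The same choice written out as a toy expected-value calculation; apart from the $1/year storage cost, every number here is an assumption made up for illustration.)

```python
# Keep-or-discard decision for the inherited art object, as a one-shot expected value.
# Only the $1/year storage cost comes from the comment; everything else is assumed.
storage_cost_per_year = 1.0
years_kept = 50                # assumed holding period
p_buyer_appears = 0.05         # assumed chance someone values it much more highly
sale_price = 10_000.0          # assumed price if that buyer shows up
reputation_value = 500.0       # assumed value of being known as a reliable preserver of art

ev_keep = -storage_cost_per_year * years_kept + p_buyer_appears * sale_price + reputation_value
ev_discard = 0.0

print(f"EV(keep)    = {ev_keep:+,.0f}")    # +950
print(f"EV(discard) = {ev_discard:+,.0f}")
```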
Any agent which thinks it is at risk of being seen as a cooperate-bot, and thus fine to defect against in the future, will be more wary of trusting that ASI.
Bernard Arnault?
Thanks, I had copied the spelling from part of the OP, which currently says “Arnalt” eight times and “Arnault” seven times. I’ve now edited my comment (except the verbatim blockquote).
Nate Soares engaged extensively with this in reasonable-seeming ways that I’d thus expect Eliezer Yudkowsky to mostly agree with. Mostly it seems like a disagreement where Paul Christiano doesn’t really have a model of what realistically causes good outcomes and so he’s really uncertain, whereas Soares has a proper model and so is less uncertain.
But you can’t really argue with someone whose main opinion is “I don’t know”, since “I don’t know” is just garbage. He’s gotta at least present some new powerful observable forces, or reject some of the forces presented, rather than postulating that maybe there’s an unobserved kindness force that arbitrarily explains all the kindness that we see.
It’s totally wrong that you can’t argue against someone who says “I don’t know”: you argue against them by showing how your model fits the data and how any plausible competing model either doesn’t fit or shares the salient features of yours. It’s bizarre to describe “I don’t know” as “garbage” in general, because it is the correct stance to take when neither your prior nor your evidence sufficiently constrains the distribution of plausibilities. Paul obviously didn’t posit an “unobserved kindness force”, because he was specifically describing the observation that humans are kind. I think Paul and Nate had a very productive disagreement in that thread, and this seems like a wildly reductive mischaracterization of it.
But this assumes a model should aim to fit all data, which is a waste of effort.
I’m confused about what you mean & how it relates to what I said.