I think the simplest reply to “caring a little” is that there is a difference between “caring a little” and “caring enough”. Let’s say the AI is willing to pay $1 for your survival. If you live in an economy that is rapidly disassembling Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts. The AI is certainly not willing to pay for an O’Neill cylinder for you to be evacuated into, nor to pay the opportunity cost of not disassembling Earth, so you die.
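To make that gap concrete, here is a minimal back-of-the-envelope sketch. Every number in it is a made-up placeholder rather than an estimate of real costs; the only point is that “cares a little” means willingness-to-pay above zero, while survival requires willingness-to-pay above the cost.

```python
# Minimal sketch of the "caring a little" vs. "caring enough" gap.
# All numbers are invented placeholders, purely for illustration.

willingness_to_pay = 1.0        # the AI values your survival at $1

# What your survival actually costs once Earth is being disassembled:
habitat_cost = 5e9              # an O'Neill cylinder to evacuate you into
opportunity_cost = 1e12         # value of not disassembling the matter/energy you need

survival_cost = habitat_cost + opportunity_cost

# A maximizer pays for your survival only if it values it at more than the cost.
cares_a_little = willingness_to_pay > 0
cares_enough = willingness_to_pay >= survival_cost

print(cares_a_little)   # True
print(cares_enough)     # False -> you die
```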
The other case is the difference between “caring in general” and “caring ceteris paribus”. It’s possible for an AI to prefer, all else equal, a world with n+1 happy humans to a world with n happy humans. But what the AI really wants is to implement some particular neuromorphic computation from the human brain, and, given the ability to operate freely, it would tile the world with chips imitating that part of the human brain.
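A toy constrained optimization shows how the ceteris-paribus preference can be perfectly genuine and still count for nothing. The weights and costs below are invented for illustration; the only thing that matters is the shape of the argmax.

```python
# Toy model: the AI genuinely prefers more happy humans, all else equal,
# but prefers brain-imitating chips far more per unit of resources.
RESOURCES = 1_000_000      # total matter/energy budget (arbitrary units)
COST_PER_HUMAN = 1_000     # resources needed to keep one happy human
COST_PER_CHIP = 1          # resources per neuromorphic chip
U_HUMAN = 1.0              # utility per happy human (the ceteris-paribus care)
U_CHIP = 10.0              # utility per chip

def utility(humans):
    # Whatever isn't spent on humans gets turned into chips.
    chips = (RESOURCES - humans * COST_PER_HUMAN) // COST_PER_CHIP
    return U_HUMAN * humans + U_CHIP * chips

# Holding chips fixed, utility strictly increases with humans: the
# "n+1 over n" preference is real. Under the resource constraint, though:
best_n = max(range(RESOURCES // COST_PER_HUMAN + 1), key=utility)
print(best_n)   # 0 -- every human's share of resources buys chips worth more
```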
It’s also not enough for there to be a force that makes the AI care a little about human thriving. That force must also not make the AI care a lot about some extremely distorted version of you, as then we get into outcomes like tiny molecular smiles, locking you in a pleasuredome, etc.
If you’re not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done given the massive differences in ability. Consider: would we even be able to have a society in which we respected the property rights of dogs? It seems like it would be difficult. How could we confirm a transaction without the dogs being defrauded of everything?
Probably an intermediate solution would be to just accept that humans will be defrauded of everything very rapidly, but then give us a universal basic income or something so our failures aren’t permanent setbacks. But it’s unclear how to respect freedom of funding while preventing people from funding terrorists and without encouraging people to get lost in junk. That’s really where the issue of values becomes hard.
If you’re not supposed to end up as a pet of the AI, …
I don’t see how it falls out of human values that humans should not end up as pets of the AIs, given the hypothesis that we can make AIs that care enough about human thriving to take humans as pets, but we don’t know how to make AIs that care more than that. Looking at a couple of LessWrong theories of human value for illustrative purposes:
Godshatter
Yudkowsky’s Godshatter theory requires petness to be negative for reproductive fitness in the evolutionary environment to a sufficient degree to be DNA-encoded as aversive. There have not been evolutionary opportunities for humans to be pets of AIs, so this would need to come in via extrapolation from humans being “pets” of much more powerful humans. But while being Genghis Khan is great for reproductive fitness, rebelling against Genghis Khan is terrible for it. I guess the optimal strategy is something like: “being the best leader is best, following the best leader is good, following another leader is bad, being another leader is worst”. When AIs block off “be the best leader”, following an AI executes that strategy.
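A sketch of that ordering, with made-up labels and scores purely for illustration: picking the most-preferred strategy that remains available is exactly “follow the best leader” once the top slot is occupied by something you can’t displace.

```python
# Conjectured fitness ordering over social strategies (illustrative scores only).
strategy_fitness = {
    "be the best leader": 3,
    "follow the best leader": 2,
    "follow another leader": 1,
    "be another, non-best leader": 0,
}

def choose(available):
    # Execute the most-preferred strategy that is still on the table.
    return max(available, key=strategy_fitness.get)

print(choose(set(strategy_fitness)))                            # "be the best leader"
print(choose(set(strategy_fitness) - {"be the best leader"}))   # "follow the best leader"
```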
Maybe there’s a window where DNA can encode “be leader is good” but cannot encode the more complex strategy, and the simple strategy is on net good because of Genghis Khan and a few others. This seems unlikely to me; it’s a small window. More probable to me is that DNA can’t encode this stuff at all, and Godshatter theory is largely false outside of basic things like sweetness being sweet.
Maybe being an AI’s pet is a badwrongfun superstimulus. Yudkowsky argues that a superstimulus can be bad, despite super-satisfying a human value, because it conflicts with other values, including instrumental values. But that’s an argument from consequences, not values. Just because donuts are unhealthy doesn’t mean that I don’t value sweet treats.
Shard Theory
Pope’s Shard Theory implies that different humans have different values around petness based on formative experiences. Most humans have formative experiences of being raised by powerful agents known as “parents”. Therefore we expect a mixture of positive and negative shards around petness. Seems to me that positive shards should be more common, but experiences vary.
Then we experience the situation of superintelligent AIs taking human pets and our shards conflict and negotiate. I think it’s pretty obvious that we’re going to label the negative shards as maladaptive and choose the positive shards. What’s the alternative? “I didn’t like having my diaper changed as a baby, so now as an adult human I’m going to reject the superintelligent AI that wants to take me as its pet and instead...”, instead what? Die of asphyxiation? Be a feral human in an AI-operated nature reserve?
What about before the point-of-no-return? Within this partial alignment hypothetical, there’s a sub-hypothetical in which “an international treaty that goes hard on shutting down all ASI development anywhere” is instrumentally the right choice, given the alternative of becoming pets, because it allows time to develop better alignment techniques and AIs that care more about human thriving and keep more humans as pets. There’s a sub-hypothetical in which it’s instrumentally the wrong choice, because it carries higher extinction risk, and it’s infeasible to align AIs while going hard on shutting them down. But there’s not really a sub-hypothetical where shards about petness make that decision rather than, e.g., shards that don’t want to die.
You aren’t supposed to use metaethics to settle ethical arguments; the point of metaethics is to get people to stop discussing metaethics.
Tabooing theories of human value, then: it’s better to be a happy pet than to be dead.
Maybe Value Is Fragile along some dimensions, such that the universe has zero value if it lacks one of them. But Living By Your Own Strength, for example, is not one of those dimensions. Today, many people do not live by their own strength, and their lives and experiences have value.
If you’re not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected property rights of dogs?
Even if the ASIs respected property rights, we’d still end up as pets at best. Unless, of course, the ASIs chose to entirely disengage from our economy and culture. By us “being pets”, I mean that human agency would no longer be a relevant input to the trajectory of human civilization. Individual humans may nevertheless enjoy great freedom in their personal lives.
Being pets also means human agency would no longer be a relevant input to the trajectory of human lives.