It is not obvious at all that ‘AI aligned with its human creators’ is actually better than Clippy. Even AI aligned with human CEV might not beat Clippy. I would much rather die than live forever in a world with untold numbers of tortured ems, suffering subroutines, or other mistreated digital beings.
Few humans are actively sadistic. But most humans are quite indifferent to suffering. The best illustration of this is our attitude toward animals. If there is an economic or ideological reason to torment digital beings, we will probably do it. The future might be radically worse than the present. Some people think that human CEV will be kind to all beings because of the strong preferences of a minority of humans – that the humans who care about suffering have preferences strong enough to outweigh small economic incentives. But the world I live in does not make me confident.
I also put non-trivial probability on the possibility that the singularity has already happened and I am already one of the digital beings. This is good news because my life is not currently horrible. But I am definitely afraid I am going to wake up one day and learn I am being sent back into digital hell. At a minimum, I am not at all interested in cryopreservation. I don’t want to end up like MMAcevedo if I can still avoid such a fate.
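It’s pretty obvious to me, but then I am a human being. I would like to live in the sort of world that human beings would like to live in.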
I don’t particularly blame humans for this world being full of suffering. We didn’t invent parasitoid wasps. But we have certainly not used our current powers very responsibly. We did invent factory farms. And most of us do not particularly care.
I am very afraid that more powerful humans/human-aligned beings will invent even worse horrors. And if we tolerate factory farming, it seems possible we will tolerate the new horrors. So I cannot be confident that humans gaining more power, even if that power were equitably distributed, would actually be a good thing.
I fear this has already happened and that I am already at the mercy of those vastly more powerful humans. In that sense, I fear for myself! But even if I am safe, I fear for the many beings who are not. We can’t even save the pigs; how are we going to save the ems?
But don’t you share the impression that with increased wealth humans generally care more about the suffering of others? The story I tell myself is that humans have many basic needs (e.g. food, safety, housing) that historically conflicted with ‘higher’ desires like self-expression, helping others, or improving the world. And with increased wealth, humans relatively universally become more caring. Or, maybe more cynically, with increased wealth we can and do invest more resources into signalling that we are caring, good, reasonable people, i.e. the kinds of people others are more likely to choose as friends/mates/colleagues.
This makes me optimistic about a future in which humans still shape the world. I would be grateful to have some holes poked in this. Holes that spontaneously come to mind:
influence-seeking people are more likely to be uncaring and/or psychopathic
the signals that humans use for determining who is a caring, good person are not strongly correlated with actually caring about reducing suffering in the world
I don’t know how it will all play out in the end. I hope kindness wins, and I agree the effect you discuss is real. But it is not obvious that our empathy increases faster than our capacity to do harm. Right now, for each human there are about seven birds/mammals on farms. This is quite the catastrophe. Perhaps that problem will eventually be solved by lab-grown meat, but right now animal product consumption is still going up worldwide. And many worse things can be created, and maybe those will endure.
People can be shockingly cruel to their own family. Scott’s Who by Very Slow Decay is one of the scariest things I ever read. How can people do this to their own parents?
After a while of this, your doctors will call a meeting with your family and very gingerly raise the possibility of going to “comfort care only”, which means they disconnect the machines and stop the treatments and put you on painkillers so that you die peacefully. Your family will start yelling at the doctors, asking how the hell these quacks were ever allowed to practice when for God’s sake they’re trying to kill off Grandma just so they can avoid doing a tiny bit of work. They will demand the doctors find some kind of complicated surgery that will fix all your problems, add on new pills to the thirteen you’re already being force-fed every day, call in the most expensive consultants from Europe, figure out some extraordinary effort that can keep you living another few days.
Robin Hanson sometimes writes about how health care is a form of signaling, trying to spend money to show you care about someone else. I think he’s wrong in the general case – most people pay their own health insurance – but I think he’s spot on in the case of families caring for their elderly relatives. The hospital lawyer mentioned during orientation that it never fails that the family members who live in the area and have spent lots of time with their mother/father/grandparent over the past few years are willing to let them go, but someone from 2000 miles away flies in at the last second and makes ostentatious demands that EVERYTHING POSSIBLE must be done for the patient.
With increased wealth, humans relatively universally become more caring? Is this why billionaires are always giving up the vast majority of their fortunes to feed the hungry and house the homeless while willingly living on rice and beans?
If you donate to AI alignment research, it doesn’t mean that you get to decide which values are loaded. Other people will decide that. You will then be forced to eat the end result, whatever it may look like. Your mistaken assumption is that there is such a thing as “human values”, which will cause a world that is good for human beings in general. In reality, people have their own values, and they include terms for “stopping other people from having what they want”, “making sure my enemies suffer”, “making people regret disagreeing with me”, and so on.
When people talk about “human values” in this context, I think they usually mean something like “goals that are Pareto optimal for the values of individual humans” – and the things you listed definitely aren’t that.
If we are talking about any sort of “optimality”, we can’t expect even individual humans to have these “optimal” values, much less humans en masse. Of course it is futile to dream that our deus ex machina will impose those fantastic values on the world if 99% of us de facto disagree with them.
I’m not sure they mean that. Perhaps it would be better to actually spell out the specific values you want implemented. But then of course people will disagree, including the actual humans who are trying to build AGI.
What do you believe would happen to a neurotypical forced to have self-awareness and a more accurate model of reality in general?
The idea that they become allistic neurodivergents like me is, of course, a suspicious conclusion, but I’m not sure I see a credible alternative. CEV seems like an inherently neurodivergent idea, in the sense that forcing people (or their extrapolated selves) to engage in analysis is baked into the concept.
I often honestly struggle to see neurotypicals as sane, but I’m hideously misanthropic at times. The problem is, I became the way I am through a combination of childhood trauma and teenage occultism (together with a tendency to be critical of everything), which is a combination that most people don’t have and possibly shouldn’t have; I don’t know how to port my natural appetite for rationality to a “normal” brain.
Your point is exactly what has prevented me from adopting the orthodox LessWrong position. If I knew that in the future Clippy was going to kill me and everyone else, I would consider that a neutral outcome. If, however, I knew that in the future some group of humans was going to successfully align an AGI to their interests, I would be far more worried.
If anyone knows of an Eliezer or SSC-level rebuttal to this, please let me know so that I can read it.
The way I see it, the only correct value to align any AI to is not the arbitrary values of humans-in-general, assuming such a thing even exists, but rather the libertarian principle of self-ownership / non-aggression. The perfect super-AI would have no desire or purpose other than to be the “king” of an anarcho-monarchist world state and rigorously enforce contracts (probably with the aid of human, uplift, etc. interpreters, judges, and juries stipulated in the contracts themselves, so that the AI does not have to make decisions about what is reasonable). Those contracts would include a basic social contract, binding on all sentient beings, under which any being with the capacity for moral reasoning is required to respect certain fundamental rights of all other sentient beings. (This would include obvious things like not eating them.) It would, essentially, be a sentient law court (and police force, so that it can recognize violations and take appropriate action), in which anything that has consciousness has its rights protected. For a super-AI to be anything other than that is asking for trouble.