Human beings are not aligned and will possibly never be aligned without changing what humans are. If it’s possible to build an AI as capable as a human in all ways that matter, why would it be possible to align such an AI?
Because we’re building the AI from the ground up and can change what the AI is via our design choices. Humans’ goal functions were basically set by genetic accident, which is why humans so often work at cross purposes.
Assuming humans can’t be “aligned”, it would also make sense to allocate resources toward preventing any one of them from becoming much more powerful than all the rest of us.
Define “not aligned”? For instance, there are plenty of humans who, given the choice, would rather not kill every single person alive.
Not aligned on values, beliefs, and moral intuitions. Plenty of humans would not kill all people alive if given the choice, but there are some who would; I think the existence of doomsday cults that have tried to precipitate Armageddon supports this claim.
Ah, so you mean that humans are not perfectly aligned with each other? I was going by the definition of “aligned” in Eliezer’s “AGI Ruin” post, which was:
I am not talking about ideal or perfect goals of ‘provable’ alignment, nor total alignment of superintelligences on exact human values, nor getting AIs to produce satisfactory arguments about moral dilemmas which sorta-reasonable humans disagree about, nor attaining an absolute certainty of an AI not killing everyone. When I say that alignment is difficult, I mean that in practice, using the techniques we actually have, “please don’t disassemble literally everyone with probability roughly 1” is an overly large ask that we are not on course to get.
Likewise, in an earlier paper I mentioned that by an AGI that “respects human values”, we don’t mean to imply that current human values would be ideal or static. We just mean that we hope to at least figure out how to build an AGI that does not, say, destroy all of humanity, cause vast amounts of unnecessary suffering, or forcibly reprogram everyone’s brains according to its own wishes.
A lot of discussion about alignment takes this as the minimum goal. Figuring out what to do with humans having differing values and beliefs would be great, but if we could even get the AGI to not get us into outcomes that the vast majority of humans would agree are horrible, that’d be enormously better than the opposite. And there do seem to exist humans who are aligned in this sense of “would not do things that the vast majority of other humans would find horrible, if put in control of the whole world”; even if some would, the fact that some wouldn’t suggests that it’s also possible for some AIs not to do it.
Most of what people call morality is conflict mediation: techniques for taking the conflicting desires of various parties and producing better outcomes for them than war.
That’s how I’ve always thought of the alignment problem: the creation of a very, very good compromise that almost all of humanity will enjoy.
There’s no obvious best solution to value aggregation/cooperative bargaining, but there are a couple of approaches that are obviously better than just having an arms race, rushing the work, and producing something awful that’s nowhere near the average human preference.
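As a toy illustration of why no single aggregation rule is obviously best (a hedged sketch, not something from this thread): the short Python snippet below compares two standard rules, a utilitarian sum and a Nash-bargaining product, on the same made-up preference data. The outcome names, agents, and utility numbers are all hypothetical; the only point is that different reasonable rules can endorse different compromises.

```python
# Hypothetical utilities each agent assigns to each candidate outcome,
# measured as gains over that agent's disagreement point.
utilities = {
    "outcome_A": {"agent_1": 9.0, "agent_2": 1.0, "agent_3": 1.0},
    "outcome_B": {"agent_1": 3.0, "agent_2": 3.0, "agent_3": 3.0},
}

def utilitarian_score(outcome: dict) -> float:
    """Sum of utilities: maximizes total welfare, may ignore big losers."""
    return sum(outcome.values())

def nash_score(outcome: dict) -> float:
    """Product of gains over the disagreement point: rewards broad buy-in."""
    product = 1.0
    for gain in outcome.values():
        product *= gain
    return product

for name, rule in [("utilitarian", utilitarian_score), ("Nash", nash_score)]:
    best = max(utilities, key=lambda o: rule(utilities[o]))
    print(f"{name} rule picks {best}")

# utilitarian rule picks outcome_A (total 11 vs 9);
# Nash rule picks outcome_B (product 27 vs 9) -- same data, different compromise.
```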
Indeed, humans are significantly non-aligned. For an ASI to be non-catastrophic, it would likely have to be substantially more aligned than humans are. This seems less than impossible, because the AI can be built from the get-go to be aligned, rather than being a bunch of barely-coherent odds and ends thrown together by natural selection.
Of course, reaching that level of alignment remains a very hard task, hence the whole AI alignment problem.
I’m not quite sure what this means. As I understand it humans are not aligned with evolution’s implicit goal of “maximizing genetic fitness” but humans are (definitionally) aligned with human values. And e.g. many humans are aligned with core values like “treat others with dignity”.
Importantly, capability and alignment are sort of orthogonal. The consequences of misaligned AI get worse the more capable it is, but it seems possible to have aligned superhuman AI, as well as horribly misaligned weak AI.
It is not definitionally true that individual humans are aligned with overall human values or with other individual humans’ values. Further, it is proverbial (and quite possibly actually true as well) that getting a lot of power tends to make humans less aligned with those things. “Power corrupts; absolute power corrupts absolutely.”
I don’t know whether it’s true, but it sure seems like it might be, that the great majority of humans, if you gave them vast amounts of power, would end up doing disastrous things with it. On the other hand, probably only a tiny minority would actually wipe out the human race or torture almost everyone or commit other such atrocities, which makes humans more aligned than e.g. Eliezer expects AIs to be in the absence of dramatic progress in the field of AI alignment.
I think a substantial part of human alignment is that humans need other humans in order to maintain their power. We have plenty of examples of humans being fine with torturing or killing millions of other humans when they have the power to do so, but torturing or killing almost all humans in their sphere of control is essentially suicide. This means that purely instrumentally, human goals have required that large numbers of humans continue to exist and function moderately well.
A superintelligent AI is primarily a threat due to the near certainty that it can devise means of maintaining power that are independent of human existence. Humans, by definition, can’t do that, and that constraint has nothing to do with alignment.
Okay, so… does anyone have any examples of anything at all, even fictional or theoretical, that is “aligned”? Other than tautological examples like “FAI” or “God”.
This. Combine this fact with the non-trivial chance that moral values are subjective, not objective, and there is little good reason to be doing alignment.
While human moral values are subjective, there is a sufficiently large shared core that you can target when aligning an AI. Values held by a majority (e.g. caring for other humans, enjoying certain fun things) are essentially shared, and values held by smaller groups can also be catered to.
If humans were sampled from the entire space of possible values, then yes, we (maybe) couldn’t build an AI aligned to humanity; but we only occupy a relatively small region of that space and have a lot of shared values.
So do you think that instead we should just be trying to not make an AGI at all?
Not really; I do want to make an AGI, primarily because I very much want a singularity, which represents hope to me, and I have very different priors than Eliezer or MIRI about how doomed we are.
So you think that, since morals are subjective, there is no reason to try to make an effort to control what happens after the singularity? I really don’t see how that follows.