Inspired by a paragraph from the document “Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures”, by Eliezer Yudkowsky:
whether or not it’s possible for Friendliness programmers to create
Friendship content that says, “Be Friendly towards humans/humanity, for the rest
of eternity, if and only if people are still kind to you while you’re infrahuman or nearhuman,”
it’s difficult to see why this would be easier than creating unconditional Friendship
content that says “Be Friendly towards humanity.”
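(To make the structural point in that paragraph concrete, here is a toy sketch. Every name in it is invented for illustration, and real “Friendship content” would be nothing like a Python function; the point is only that the conditional rule contains the unconditional rule as a component, plus extra machinery for evaluating the historical condition.)

```python
# Toy sketch only: every name here is invented, and real goal content
# would be nothing like a Python function.

def friendliness(action: str) -> float:
    """Stand-in for the (unsolved) measure of how Friendly an action is."""
    return 1.0 if action == "help_humans" else 0.0

def unconditional_goal(action: str) -> float:
    """'Be Friendly towards humanity.'"""
    return friendliness(action)

def conditional_goal(action: str, humans_were_kind: bool) -> float:
    """'Be Friendly iff people were kind to you while you were infrahuman.'"""
    if humans_were_kind:
        return friendliness(action)   # still needs the full Friendliness measure
    return 0.0                        # ...plus some other, unspecified behaviour

print(unconditional_goal("help_humans"))       # 1.0
print(conditional_goal("help_humans", True))   # 1.0
print(conditional_goal("help_humans", False))  # 0.0
```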
It might not be easier, but there could be consequences that depend on how ‘fair’ a solution the constraint appears to be, given the problem it is intended to solve.
How so? The AI won’t care about fairness unless that fits its programmed goals (which should be felt, if at all, as a drive more than a restraint). Now if we tell it to care about our extrapolated values, and extrapolation says we’d consider the AI a person, then it will likely want to be fair to itself. That’s why we don’t want to make it a person.
Aliens might care if we’ve been fair to a sentient species.
Other humans might care.
Our descendants might care.
I’m not saying those considerations should outweigh the safety factor. But it seems to be a discussion that isn’t yet even being had.
I repeat: this is why we don’t want to create a person, or even a sentient process, if we can avoid it.
How do you define “person”? How do you define “sentient”?
And, more to the point, how can you be sure how an alien race, or even a PETA member, might define such concepts?
I don’t know how to solve that definitional problem, aside from including some approximation of Bayesian updating as a necessary condition for personhood. (Goetz or someone once pointed out another such condition, but again it didn’t seem useful on its own. Hopefully we can combine a lot of these conditions, and if the resulting non-person predicate still seems too strict to serve our purposes, we might have a non-person AI, as defined by that predicate, bootstrap its way to a better non-person predicate.) I hold out hope for a solution because, intuitively, it seems possible to imagine people without making the imagined people conscious (though Eliezer points out that this part may be harder than building a single non-person AGI). Oh, and effectively defining some aspects of consciousness seems necessary anyway for judging models of the world without resorting to Cartesian dualism.
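As a minimal sketch of that combination idea, assuming we could ever state such conditions precisely (the two conditions below are placeholders, not serious proposals): treat each proposed test as a necessary condition for personhood, so that provably failing any one of them certifies a design as not-a-person, and everything else stays in the “don’t know” bucket.

```python
# Minimal sketch of combining candidate conditions into a conservative
# non-person predicate.  The conditions themselves are the unsolved part;
# the two defined here are placeholders, not serious proposals.

def lacks_bayesian_updating(design: dict) -> bool:
    # Each check answers: "does this design provably LACK a property that
    # any person would need?"  True certifies not-a-person; False means
    # the check tells us nothing.
    # Placeholder: e.g. a fixed lookup table that never updates on evidence.
    return design.get("updates_on_evidence") is False

def lacks_self_model(design: dict) -> bool:
    # Placeholder for some other proposed necessary condition.
    return design.get("models_itself") is False

def certified_non_person(design: dict, checks) -> bool:
    """True only if at least one necessary condition for personhood is
    provably absent; otherwise stay conservative and return False."""
    return any(check(design) for check in checks)

checks = [lacks_bayesian_updating, lacks_self_model]
print(certified_non_person({"updates_on_evidence": False}, checks))  # True
print(certified_non_person({"updates_on_evidence": True,
                            "models_itself": True}, checks))         # False ("don't know")
```

The important property is that such a predicate only errs in the conservative direction: it may fail to certify a genuine non-person, but it should never certify something that might be a person, which is also why adding more conditions makes it less strict rather than more.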
But let’s say we can’t solve the problem of building a non-sentient AGI. Let’s further say that humanity is not an abomination we can only address by killing everyone, even though, in this hypothetical, we may be creating people in pain whenever we imagine them.
Since the AGI doesn’t exist yet (and if we made one with the desire to serve us, we would want to prove that it wouldn’t change that desire), how do you define “being fair” to the potential of linear regression software? What about the countless potential humans we exclude from our timeline with every action?
Empirically, we’re killing the apes. (And by the way, that seems like a much better source of concern when it comes to alien judgment. Though the time for concern may have passed with the visible Neanderthals.) If Dr. Zaius goes back and tells them they could create a different “human race” with the desire to not do that, only a fool of an ape would refuse. And I don’t believe in any decision theory that says otherwise.
I agree.
The question is: are there different constraints that would, either as a side effect or as a primary objective, achieve the end of preventing humanity from wiping out the apes?
And, if so, are there other considerations we should be taking into account when picking which constraint to use?
how do you define “being fair” to the potential of linear regression software?
That’s a big question. How much of the galaxy (or even universe) does humanity ‘deserve’ to control, compared to any other species that might be out there, or any other species that we create?
I don’t know how many answers there are that lie somewhere between “Grab it all for ourselves, if we’re able!” and “Foolishly give away what we could have grabbed, endangering ourselves.” But I’m pretty sure those two endpoints are not the only options.
Luckily for me, in this discussion, I don’t have to pick a precise option and say “This! This is the fair one.” I just have to demonstrate the plausibility of there being at least one option that is unfair OR that might be seen as being unfair by some group who, on that basis, would then be willing and able to take action influencing the course of humanity’s future.
Because if I can demonstrate that, then how ‘fair’ the constraint is does become a factor that should be taken into account.