No, I’m pointing out that a purely negative definition isn’t actually a useful definition that describes the thing the label is supposed to be pointing at. How does one work toward a negative? We can say a few things it isn’t—what is it?
Yudkowsky says:

The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.
That isn’t a “purely negative” definition in the first place.
Even if it was—would you object to the definition of “hole” on similar grounds?
What exactly is wrong with defining some things in terms of what they are not?
If I say a “safe car” is one that doesn’t kill or hurt people, that seems just fine to me.
The word “artificial” there makes it look like it means more than it does. And humans are just as much made of atoms. Let’s try it without that:
The term “friendly intelligence” refers to the production of human-benefiting, non-human-harming actions in intelligences that have advanced to the point of making real-world plans in pursuit of goals.
It’s described only in terms of its effects, and then only vaguely. We have no idea what it would actually be. The CEV plan doesn’t spell out what it would actually be either; it just includes a technological magic step where that gets worked out.
This may be better than nothing, but it’s not enough to establish that it’s talking about anything that’s actually understood, even in the vaguest terms.
For an analogy, what would a gorilla-friendly human-level intelligence be like? How would you reasonably make sure it wasn’t harmful to the future of gorillas? (Humans out of the box do pretty badly at this.) What steps would the human take to ascertain the CEV of gorillas, assuming tremendous technological resources?
We can’t answer the “how can you do this?” questions today. If we could, we would be done.
It’s true that CEV is an 8-year-old, moon-onna-stick wishlist, apparently created without much thought about how to implement it. C’est la vie.