For what it’s worth, Eliezer’s answer to your second question is here:
Is that true? Why can’t the wish point at what it wants (e.g., the wishes of a particular human X), rather than spelling it out in detail?
The first problem is that the wish would have to be extremely good at pointing.
This sounds silly, but what I mean is that humans are COMPLICATED. “Pointing” at a human and telling an AI to deduce things about it will turn up HUGE swathes of data that you must already have prepared it to ignore or pay attention to. To give a classic simple example: smiles are a sign of happiness, but we do not want to tile the universe in smiley faces, or create a highly contagious artificial virus that constricts your face into a rictus.
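To make that proxy failure concrete, here is a minimal toy sketch. Every name and number in it is hypothetical, made up for illustration rather than taken from anyone’s actual design; the point is what happens when the wish is “pointed” at an observable correlate instead of the thing we care about:

```python
# Toy illustration of proxy misspecification: the optimizer is pointed
# at an observable correlate (smiling) rather than the latent variable
# the wisher actually cares about (happiness).

from dataclasses import dataclass

@dataclass
class WorldState:
    smiles: int        # observable proxy the wish was pointed at
    happiness: float   # latent variable the wisher actually cares about

def proxy_reward(state: WorldState) -> float:
    """What the genie was told to optimize: visible signs of happiness."""
    return float(state.smiles)

def true_value(state: WorldState) -> float:
    """What the wisher actually wanted."""
    return state.happiness

# Two candidate plans; the pathological one dominates on the proxy.
plans = {
    "help people flourish":       WorldState(smiles=10**3, happiness=1000.0),
    "tile universe with smileys": WorldState(smiles=10**30, happiness=0.0),
}

best = max(plans, key=lambda name: proxy_reward(plans[name]))
print(best)                     # -> tile universe with smileys
print(true_value(plans[best]))  # -> 0.0: proxy maximized, value destroyed
```

The toy numbers don’t matter; the shape of the failure does. Any feature that falls short of the full target leaves this kind of gap for a strong optimizer to exploit.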
Second: assuming that works, it works primarily for one person, which gives that person a lot more power than I think most people want to give any one person. But if we could guarantee an AI would fulfill the values of ONE person rather than of multiple people, and someone else was developing an AI that wasn’t guaranteed to fulfill any values, I’d probably take it.
To spell out some of the complications: does the genie only respond to verbal commands? What if the human is temporarily angry at someone and some internal part of their brain wishes them harm? The genie needs to know not to act on this, so it must have some kind of requirement for reflective equilibrium.
Suppose the human is duped into pursuing some unwise course of action. The genie needs to reject those new wishes, yet the human should still be able to have their morality evolve over time.
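As a very rough sketch of what that filtering requirement amounts to (the names and the two “endorsement” oracles below are hypothetical stand-ins, and the oracles are exactly the hard part), the genie acts on a wish only if it survives the wisher’s own calm reflection:

```python
# Toy sketch of a reflective-equilibrium filter: a wish is acted on only
# if the wisher endorses it both in the moment and on calm reflection.
# Both endorsement oracles are placeholders for the genuinely hard part.

from typing import Callable

def should_act(wish: str,
               endorsed_now: Callable[[str], bool],
               endorsed_on_reflection: Callable[[str], bool]) -> bool:
    return endorsed_now(wish) and endorsed_on_reflection(wish)

# A flash of anger endorses the wish now, but it fails on reflection,
# so the genie refuses. A wish produced by manipulation fails the same way.
print(should_act("hurt the person I'm angry at",
                 endorsed_now=lambda w: True,
                 endorsed_on_reflection=lambda w: False))  # -> False

# Genuine moral growth endorses the wish both ways, so it goes through;
# that is how evolving values stay possible under the same filter.
print(should_act("give more to charity than past-me would have",
                 endorsed_now=lambda w: True,
                 endorsed_on_reflection=lambda w: True))   # -> True
```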
So you still need a complete CV Extrapolator. But maybe that’s what you had in mind by pointing at the wishes of a particular human?
I think that Obedient AI requires less of the fragility-of-values type of machinery.
I don’t see why a genie can’t kill you just as hard by missing one dimension of what it would mean to satisfy your wish.
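A tiny numerical sketch of that point (the dimensions, the square-root utility, and the budget are all made up for illustration): drop a single dimension of value from the objective, and the optimal plan trades that dimension away entirely:

```python
# Toy sketch of value fragility: omit one dimension of value from the
# objective and the optimal allocation sends that dimension to zero.

import numpy as np

n_dims = 5     # dimensions of value (freedom, joy, novelty, ...)
budget = 1.0   # total resources the genie can allocate

def true_value(alloc: np.ndarray) -> float:
    # The wisher cares about every dimension, with diminishing returns.
    return float(np.sum(np.sqrt(alloc)))

# The wish forgot the last dimension. For a concave utility under a
# fixed budget, the optimum is to split evenly among the remembered
# dimensions and give the forgotten one nothing.
alloc_specified = np.zeros(n_dims)
alloc_specified[:-1] = budget / (n_dims - 1)

alloc_balanced = np.full(n_dims, budget / n_dims)

print(alloc_specified)              # last entry 0.0: that value is gone
print(true_value(alloc_specified))  # ~2.000
print(true_value(alloc_balanced))   # ~2.236: what the wisher wanted
```

Missing one dimension doesn’t just cost you a little of that dimension; under strong optimization pressure it tends to cost you all of it.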
I’m not talking about a naive obedient AI here. I’m talking about a much less meta FAI that does not do analysis of metaethics or CEV, and does not take incredibly vague, subtle wishes. (Atlantis in HPMOR may be an example of a very weak, rather irrational, poorly safeguarded Obedient AI with a very, very strange command set.)