And if not, why would you assume it would do that, then?
Do you want to have no choice in the kind of life you live in utopia?
And if not, why do you assume FAI would leave you no choice?
I suppose you might think the extrapolated version of your values or of humanity’s values would want these things. But I strongly disagree. I don’t think it is the case at all.
If the utopia you are imagining sounds horrible to you, let’s not do that. Let’s figure out how to build a better utopia.
Does anyone understand what would happen if you gave an AGI human values and told it to extrapolate from there?
There are many possible ways to extrapolate from human values. How do we figure out which one we prefer? I’m not sure this problem has been completely solved yet (have you read Eliezer’s 2004 paper on CEV? It makes an attempt at solving it). But I do think it’s kind of obvious that most humans would prefer not to be wireheaded? That we have other desires than to be reduced to a thoughtless entity feeling nothing but pleasure? That we, in fact, would find the idea of it repulsive and horrible?
Is there a CEV from human values that results in choice or life that has an analogue to a good life right now?
I believe, almost certainly, that the answer is yes.
Thanks for the answer! As you suspected, I don’t think wireheading is a good thing, but after reading about infinite ethics and the repugnant conclusion I’m not entirely sure that there exists a stable, mathematically expressible form of ethics we could give to an AGI. Obviously I think it’s possible if you specify exactly what you want and tell the AGI not to extrapolate. However, I feel that realistically it’s going to take our ethics to its logical end, and there exists no ethical theory that really expresses how utility should be valued without causing paradoxes or problems we can’t solve. Unless we manage to build AGI using an evolutionary method to mimic human evolution, I believe that any training or theory given to it would subtly fail.
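For what it’s worth, the repugnant conclusion mentioned above is easy to make concrete. Below is a minimal sketch assuming total utilitarianism (welfare aggregated by simple summation) as the rule an AGI might extrapolate; the population sizes and welfare numbers are invented purely for illustration and come from nothing in the discussion itself.

```python
# Toy illustration of the repugnant conclusion under total utilitarianism.
# All numbers are made up for illustration; this is not a model of any
# actual proposal for specifying an AGI's values.

def total_welfare(population_size: int, welfare_per_person: float) -> float:
    """Aggregate welfare by simple summation (total utilitarianism)."""
    return population_size * welfare_per_person

# World A: a modest population living excellent lives.
world_a = total_welfare(population_size=1_000_000, welfare_per_person=100.0)

# World Z: a vastly larger population whose lives are barely worth living.
world_z = total_welfare(population_size=10**9, welfare_per_person=0.5)

print(world_a)            # 100000000.0
print(world_z)            # 500000000.0
print(world_z > world_a)  # True: summing welfare ranks World Z above World A,
                          # which is Parfit's "repugnant" verdict.
```

The point of the sketch is only that a simple aggregation rule, taken to its logical end, endorses outcomes most people find repugnant; it says nothing about whether a more careful extrapolation procedure would avoid this.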