Re: Instilling chosen desires in artificial intelligences is the major difficulty of FAI.
That is not what I regularly hear. Instead people go on about how complicated human values are, and how reverse engineering them is so difficult, and how programming them into a machine looks like a nightmare—even once we identify them.
I assume that we will be able to program simple desires into a machine—at least to the extent of making a machine that will want to turn itself off. We regularly instill simple desires into chess computers and the like. It does not look that tricky.
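For a sense of how little machinery a "simple desire" needs, here is a minimal sketch in the style of a toy chess evaluator; the board representation and function names are invented for illustration and are not taken from any real engine:

    # Toy sketch of a "simple desire": a chess-style program that wants
    # material, expressed as an evaluation function it maximizes.
    # The board format and names below are invented for illustration.
    PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

    def material_score(board, side):
        # `board` is a list of (piece_letter, owner) pairs, a made-up format.
        score = 0
        for piece, owner in board:
            value = PIECE_VALUES.get(piece, 0)
            score += value if owner == side else -value
        return score

    def choose_move(legal_moves, resulting_boards, side):
        # The machine "wants" material: it picks the move whose resulting
        # position scores highest under the evaluation function.
        return max(legal_moves, key=lambda m: material_score(resulting_boards[m], side))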
Re: “If you haven’t actually given it a utility function which will cause it to auto-shutdown”
Then that is a whole different ball game from what I was talking about.
Re: “They are in an excellent position to instill values upon that intelligence”
...but the point is that instilling the desire for appropriate stopping behaviour is likely to be much simpler than trying to instill all human values—and yet it is pretty effective at eliminating the spectre of a runaway superintelligence.
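To make the contrast concrete, here is a minimal sketch of what a "stopping desire" might look like as one term in a toy agent's utility function; every field name and weight is invented for illustration, and nothing here is claimed to make such a term robust against a powerful optimizer:

    # Toy sketch: a utility function with an explicit shutdown term.
    # All field names and weights are invented for illustration.
    def utility(state):
        u = 10.0 * state["task_progress"]        # reward for doing the assigned task
        if state["shutdown_requested"]:
            # the comply-with-shutdown bonus dominates the task reward
            u += 100.0 if state["is_off"] else -100.0
        return u

    def pick_action(state, actions, predict_successor):
        # The agent simply takes the action whose predicted successor
        # state scores highest under the utility function above.
        return max(actions, key=lambda a: utility(predict_successor(state, a)))

The point is only that the specification itself is short, in contrast with the whole of human value.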
The point about the complexity of human value is that any small variation will result in a valueless world. The point is that a randomly chosen utility function, or one derived from some simple task, is not going to produce the sort of behavior we want. Or to put it more succinctly, Friendliness doesn’t happen without hard work. This doesn’t mean that the hardest sub-goal on the way to Friendliness is figuring out what humans want, although Eliezer’s current plan is to sidestep that whole issue.
s/is/isn’t/ ?
Fairly small changes would result in boring, valueless futures.
Okay, the structure of that sentence and the next (“the point is.… the point is....”) made me think you might have made a typo. (I’m still a little confused, since I don’t see how small changes are relevant to anything Tim Tyler mentioned.)
I strongly doubt that literally any small change would result in a literally valueless world.
People who suggest that a given change in preference isn’t going to be significant are usually talking about changes that are morally fatal.
This is probably true; I’m talking about the literal universally quantified statement.
I would have cited Value is Fragile to support this point.
That’s also good.