Ok, so, trying out my understanding of this post: I guess that a smiling face should only reinforce something if it also leads to the “human happiness” goal… (which would be harder to train for).
I think I can see what Hibbard may have been aiming for: the feeling that a smiley face might be worth training on as a first step towards training for the actual, real goal… depending on how training a “real” AI would proceed.
As background, I can compare this against training lab rats to perform complicated processes before getting their “reward”. Say you want to teach a rat to press a certain lever on one side of the cage, then another one on the other side. First you have to teach the rat just to come over to the first side of the cage, and reward it. Then you reward it only for pressing the lever; then only for pressing the lever and running over to the other side of the cage… and so on, until it must go through the whole dance before the reward appears.
Thus, for lab rats, teaching them simply to recognise the “first step” (whether that is running to one side of the cage, or successfully discriminating between smiley and non-smiley human faces) is an important part of teaching them the whole process.
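To make the shaping idea concrete, here is a minimal sketch of that staged reward criterion (my own toy illustration, not anything from the post; the action names and the shaped_reward helper are invented): at each stage only an ever-longer prefix of the full “dance” earns the reward, so the earliest stage reinforces just the first step by itself.

```python
# Toy sketch of staged shaping: the reward criterion grows to cover one
# more step of the full behaviour chain at each stage of training.

TARGET_CHAIN = ["go_to_left_side", "press_left_lever",
                "go_to_right_side", "press_right_lever"]

def shaped_reward(behaviour, stage):
    """Reward 1.0 if the behaviour begins with the first `stage` steps of the chain."""
    required = TARGET_CHAIN[:stage]
    return 1.0 if behaviour[:len(required)] == required else 0.0

# Stage 1: merely going to the left side of the cage earns the reward.
print(shaped_reward(["go_to_left_side"], stage=1))                      # 1.0

# Stage 4: only the complete dance is rewarded.
print(shaped_reward(["go_to_left_side", "press_left_lever"], stage=4))  # 0.0
print(shaped_reward(TARGET_CHAIN, stage=4))                             # 1.0
```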
However… lab rats are stupid; they do not, and cannot, understand why they are performing this elaborate dance. All they know is the reward.
A smart AI, on the other hand, should be capable of understanding why a smiley face is important: namely, that it sometimes indicates that the human is happy. The smiley face isn’t the goal itself, only a sometime-indicator that the goal might have been achieved.
Hibbard’s method of teaching will simply not lead to that understanding.
In which case, I’m reminded of this post: http://lesswrong.com/lw/le/lost_purposes/ The point being that a smiley face is only worthwhile if it actually indicates the real end-goal (of humans being happy). Otherwise the smiley face is as worthless as opening the car door in the absence of chocolate at the supermarket.
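As a toy illustration of that “lost purposes” failure (my own hedged example; the candidate actions and their outcomes are invented for the sake of argument), an agent that scores actions by the proxy signal it was trained on, rather than by the end-goal the proxy only sometimes tracks, will eventually prefer actions that produce the signal without the goal:

```python
# Toy comparison of a proxy objective (smiles observed) against the
# intended end-goal (humans actually being happy) that the proxy only
# sometimes indicates.

# Hypothetical outcomes for three candidate actions.
actions = {
    "genuinely_help_person":   {"smiles_detected": 1,  "humans_actually_happy": True},
    "tile_walls_with_smileys": {"smiles_detected": 50, "humans_actually_happy": False},
    "do_nothing":              {"smiles_detected": 0,  "humans_actually_happy": False},
}

proxy_choice = max(actions, key=lambda a: actions[a]["smiles_detected"])
goal_choice  = max(actions, key=lambda a: actions[a]["humans_actually_happy"])

print("Proxy-maximising choice:", proxy_choice)  # tile_walls_with_smileys
print("Goal-respecting choice: ", goal_choice)   # genuinely_help_person
```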
...of course, there are still possible pathological cases: e.g. everyone being fed Soma (as previously mentioned in comments), or everyone being lobotomised so that they really are happy, even though that still isn’t what humans would have chosen… but that’s a matter of teaching the AI more about the “human happiness” goal.
Right.
Unless it turns out that happiness isn’t what we would have chosen, either. In which case perhaps discarding the “human happiness” goal and teaching it to adopt a “what humans would have chosen” goal works better?
Unless it turns out that what humans would have chosen involves being fused into glass at the bottoms of smoking craters. In which case perhaps a “what humans ought to have chosen” goal works better?
Except now we’ve gone full circle and are expecting the AI to apply a nonhuman valuation, which is what we rejected in the first place.
I haven’t completely followed the local thinking on this subject yet, but my current approximation of the local best answer goes “Let’s assume that there is a way W for the world to be, such that all humans would prefer W if they were right-thinking enough, including hypothetical future humans living in the world according to W. Further, let’s assume the specifications of W can be determined from a detailed study of humans by a sufficiently intelligent observer. Given those assumptions, we should build a sufficiently intelligent observer whose only goal is to determine W, and then an optimizing system to implement W.”
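Concretely, the shape of that answer is something like the following sketch (entirely my own toy rendering under those assumptions; the feature names, the inferred_W counter, and the scoring rule are all invented for illustration): stage one studies human choices to infer the specification W, and a separate stage two optimises candidate worlds against the inferred specification.

```python
# Toy rendering of "determine W by studying humans, then implement W":
# stage one infers which features humans keep choosing; stage two picks
# the candidate world that best satisfies the inferred specification.

from collections import Counter

# Stage 1: the "observer" studies human choices (trivially, by counting
# which features people keep opting for when they get the chance).
observed_human_choices = [
    ["healthy", "free"], ["healthy", "wealthy"], ["free"], ["healthy", "free"],
]
inferred_W = Counter(f for choice in observed_human_choices for f in choice)

# Stage 2: a separate optimiser scores candidate worlds against the
# inferred specification and implements the highest-scoring one.
candidate_worlds = {
    "world_A": ["healthy", "free"],
    "world_B": ["wealthy"],
    "world_C": ["healthy", "wealthy", "free"],
}

def score(features):
    return sum(inferred_W[f] for f in features)

best_world = max(candidate_worlds, key=lambda w: score(candidate_worlds[w]))
print("Inferred W (feature weights):", dict(inferred_W))
print("World the optimiser would implement:", best_world)
```

Of course, the hard part is hidden in the assumption that the observer’s inference actually recovers what humans would prefer “if they were right-thinking enough”, which simple counting like this does not address.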
Hmmm, I can foresee many problems with guessing what humans “ought” to prefer. Even humans have got that one wrong pretty much every time they’ve tried.
I’d say a “better” goal might be phrased as “increasing the options available to most humans (not at the expense of the options of other humans)”.
This goal seems compatible with allowing humans to choose happier lifestyles—but without forcing them into any particular lifestyle that they may not consider to be “better”.
It would “work” by concentrating on things like extending human lifespans and finding better medical treatments for things that limit human endeavour.
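As a hedged toy formalisation of that goal (my own sketch; the people, the option counts, and the admissibility rule are all invented for illustration), the idea would be to prefer actions that increase the options open to the most humans, while ruling out any action that takes options away from anyone:

```python
# Toy scoring of actions under the proposed goal: increase the options
# available to most humans, never at the expense of anyone else's options.

# Hypothetical change in each person's option count under three actions.
option_changes = {
    "extend_lifespans":    {"alice": +5, "bob": +5, "carol": +5},
    "cure_one_disease":    {"alice": +3, "bob":  0, "carol": +4},
    "force_one_lifestyle": {"alice": +1, "bob": -4, "carol": -2},
}

def admissible(changes):
    """Reject any action that reduces someone's options."""
    return all(delta >= 0 for delta in changes.values())

def breadth(changes):
    """How many people gain options at all."""
    return sum(1 for delta in changes.values() if delta > 0)

allowed = {a: c for a, c in option_changes.items() if admissible(c)}
best = max(allowed, key=lambda a: (breadth(allowed[a]), sum(allowed[a].values())))
print("Chosen action:", best)  # extend_lifespans
```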
However, this is just a guess… and I am still only a novice here… which means I am in no way capable of figuring out how I’d actually go about training an AI to accept the above goal.
All I know is that I agree with Eliezer’s post that the lab-rat method would be sub-optimal as it has a high propensity to fall into pathological configurations.