Unless it turns out that happiness isn’t what we would have chosen, either. In which case perhaps discarding the “human happiness” goal and teaching it to adopt a “what humans would have chosen” goal works better?
Unless it turns out that what humans would have chosen involves being fused into glass at the bottoms of smoking craters. In which case perhaps a “what humans ought to have chosen” goal works better?
Except now we’ve gone full circle and are expecting the AI to apply a nonhuman valuation, which is what we rejected in the first place.
I haven’t completely followed the local thinking on this subject yet, but my current approximation of the local best answer goes “Let’s assume that there is a way W for the world to be, such that all humans would prefer W if they were right-thinking enough, including hypothetical future humans living in the world according to W. Further, let’s assume the specifications of W can be determined from a detailed study of humans by a sufficiently intelligent observer. Given those assumptions, we should build a sufficiently intelligent observer whose only goal is to determine W, and then an optimizing system to implement W.”
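To make the shape of that concrete (and only the shape; every name and the toy scoring rule below are my own illustrative inventions, not anyone’s actual proposal), the two-stage “determine W, then implement W” structure might look roughly like:

```python
# Toy illustration only: a two-stage "determine W, then implement W" sketch.
# All names and the scoring rule are hypothetical stand-ins, not a real value-learning design.

from typing import Callable, List

def infer_world_spec(human_observations: List[dict]) -> Callable[[dict], float]:
    """Stage 1: the 'sufficiently intelligent observer' studies humans and
    returns a scoring function over world-states (a stand-in for W)."""
    def score(world_state: dict) -> float:
        # Placeholder rule: count how often observed humans endorsed a similar state.
        return sum(1.0 for obs in human_observations
                   if obs.get("endorsed_state") == world_state)
    return score

def optimize_toward(score: Callable[[dict], float],
                    candidate_worlds: List[dict]) -> dict:
    """Stage 2: the optimizing system picks the candidate world-state
    that best matches the inferred specification."""
    return max(candidate_worlds, key=score)
```

The real difficulty, of course, lives entirely inside stage 1; the sketch just shows where the two assumptions above get cashed out.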
Hmmm, I can foresee many problems with guessing what humans “ought” to prefer. Even humans have got that one wrong pretty much every time they’ve tried.
I’d say a “better” goal might be cast as “increasing the options available to most humans (but not at the expense of the options of other humans)”.
This goal seems compatible with allowing humans to choose happier lifestyles—but without forcing them into any particular lifestyle that they may not consider to be “better”.
It would “work” by concentrating on things like extending human lifespans and finding better medical treatments for things that limit human endeavour.
However, this is just a guess… and I am still only a novice here… which means I am in no way capable of figuring out how I’d actually go about training an AI to accept the above goal.
All I know is that I agree with Eliezer’s post that the lab-rat method would be sub-optimal as it has a high propensity to fall into pathological configurations.
Right.