Thanks for the links. By the way, do we have a definition of “human value” that we agree on?
>do we have a definition of “human value” that we agree on?
Of course not; that would make things far too easy! :-)
Though in https://www.lesswrong.com/posts/weHuX2qkTxgAXBw8t/defining-the-ways-human-values-are-messy , I define human values as preferences (which is a lot clearer), with the distinction between values and more normal preferences being due to a human meta-preference.
Ok, what about preferences? Is it correct to call a preference “a probability distribution of expected human choices”? For example, my preference at breakfast is a 70 percent chance of taking coffee and a 30 percent chance of taking tea.
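As a minimal sketch of what that reading would mean (Python, using the 70/30 numbers from the example; the representation itself is just an illustration):

```python
import random

# Reading a "preference" literally as a probability distribution over choices:
# 70% coffee, 30% tea at breakfast (numbers from the example above).
breakfast_preference = {"coffee": 0.7, "tea": 0.3}

def sample_choice(preference):
    """Sample one breakfast choice from the distribution."""
    options, probs = zip(*preference.items())
    return random.choices(options, weights=probs, k=1)[0]

print(sample_choice(breakfast_preference))  # e.g. 'coffee'
```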
>Is it correct to call a preference “a probability distribution of expected human choices”
No, because the assumption of irrationality means that preferences don’t match up with choices. Preferences are rankings of possible worlds/rewards/outcomes on an ordinal and cardinal scale. The challenge is to infer these preferences from human behaviour.
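One common way to make that split concrete is a noisy-rational (Boltzmann) choice model: the preference is a cardinal ranking (a utility over outcomes), and observed choices only reflect it imperfectly. A rough sketch, with illustrative utilities and a noise parameter that are not taken from this discussion:

```python
import math

# Preference = cardinal ranking of outcomes (a utility), not a distribution over choices.
utilities = {"coffee": 1.0, "tea": 0.2}  # illustrative numbers only

def choice_probabilities(utils, beta):
    """Boltzmann-rational choice: large beta = nearly rational, beta = 0 = random."""
    weights = {o: math.exp(beta * u) for o, u in utils.items()}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}

# The same underlying preference yields different choice frequencies
# depending on how noisy (irrational) the chooser is:
for beta in (0.0, 1.0, 5.0):
    print(beta, choice_probabilities(utilities, beta))
```

Inferring the preference then means working backwards from observed choices to the utilities, which is exactly the hard part once noise is allowed.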
If preferences were equal to choices, then predicting preferences would just be predicting future choices, which may be a relatively simple task of extrapolating past behaviour, and it could be done without assuming the existence of two parts of the human mind: constant preferences and noise.
>If preferences were equal to choices
Unless you are arguing that humans are fully rational in every decision they ever make, this is not the case.
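To illustrate both points: pure extrapolation of past behaviour needs no preference/noise split at all, but the resulting frequencies do not pin down a preference, since very different (utility, noise) pairs reproduce them. A sketch with hypothetical observed choices:

```python
from collections import Counter

# Pure extrapolation: predict future choices from past frequencies,
# with no decomposition into preferences and noise.
past_choices = ["coffee"] * 7 + ["tea"] * 3  # hypothetical observations

def extrapolate(choices):
    counts = Counter(choices)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

print(extrapolate(past_choices))  # {'coffee': 0.7, 'tea': 0.3}

# But under a Boltzmann model, P(coffee) depends only on the product
# beta * (u_coffee - u_tea), so a weak preference chosen almost rationally and
# a strong preference chosen very noisily give the same 70/30 frequencies:
# the observed choices alone do not determine the underlying preference.
```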
Yes, but this happens only because of the way we define preferences, imho. We define preferences as the purely rational part, then compare this definition with actual humans, and see that there is also another, irrational part.
For example, in the same way we could say that every human being is six feet tall, plus or minus some noise variable. This may be a useful way to describe humans, but it has obvious limitations.
What I suggest is to look at why we decided that humans have values or preferences at all. It is an idea that appeared somewhere in 20th-century psychology or philosophy, and it is only one of several ways to describe human behaviour.
I want to construct/extract/extrapolate/define human preferences (or make a human reward/utility function), in order to have something we can give an AI as a goal. Whether we count this as defining or extrapolating doesn’t really matter; it’s the result that’s important.
One of the things that gives me hope is that actual humans overlap considerably in their judgement of what is rational and irrational. Almost everyone agrees that the anchoring bias is a bias, not a preference; almost everyone agrees that people are less rational when drunk (with the caveat that drunkenness can also suppress certain other irrationalities, like social phobia—but again, that more complicated story is also something that people tend to agree on).
And values, and debates over values, date back at least to tribal times; dehumanising foreigners was based a lot around their strange values and untrustworthiness.
I understand, and I think it is an important project.
I will try to write something in the next couple of months in which I will explore another approach: is it possible to describe positive AI-human relations without extracting or extrapolating values at all? For now I have a gut feeling that it could be an interesting point of view, but I am not ready to formalize it.
Good luck with that! I’m skeptical of that approach, but it would be lovely if it could be worked out...