Yes, but this happens only because of the way we define preferences, imho. We define preferences as the purely rational part, then compare this definition with actual humans, and see that there is also another, irrational part.
Example: in the same way, we could say that every human being is six feet tall, plus or minus some noise variable. This may be a useful way to describe humans, but it has obvious limitations.
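To make the analogy explicit (the notation here is mine, just a sketch): the height model says height_i = 6 ft + noise_i, and the preference model says behaviour_i = rational_preference_i + irrational_residual_i. In both cases, everything the model fails to capture gets swept into a residual term, and the question is whether that residual is really just noise or something meaningful in its own right.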
What I suggest is to look at why we decided that humans have values or preferences at all. It is an idea that appeared somewhere in 20th-century psychology or philosophy, and it is only one of several ways to describe human behaviour.
I want to construct/extract/extrapolate/define human preferences (or make a human reward/utility function) in order to have something we can give an AI as a goal. Whether we count this as defining or extrapolating doesn’t really matter; it’s the result that’s important.
One of the things that gives me hope is that actual humans overlap considerably in their judgement of what is rational and irrational. Almost everyone agrees that the anchoring bias is a bias, not a preference; almost everyone agrees that people are less rational when drunk (with the caveat that drunkenness can also suppress certain other irrationalities, like social phobia; but again, that more complicated story is also something that people tend to agree on).
And values, and debates over values, date back at least to tribal times; dehumanising foreigners was based largely on their strange values and untrustworthiness.
I understand, and I think it is an important project.
I will try to write something in the next couple of months in which I will test another approach: is it possible to describe positive AI-human relations without extracting or extrapolating values at all? For now I have a gut feeling that it could be an interesting point of view, but I am not ready to formalize it.
Good luck with that! I’m skeptical of that approach, but it would be lovely if it could be worked out...