I want to construct/extract/extrapolate/define human preferences (or make a human reward/utility function), in order to have something we can give AI as a goal. Whether we count this as defining or extrapolating doesn’t really matter; it’s the result that’s important.
One of the things that gives me hope is that actual humans overlap considerably in their judgement of what is rational and irrational. Almost everyone agrees that the anchoring bias is a bias, not a preference; almost everyone agrees that people are less rational when drunk (with the caveat that drunkenness can also suppress certain other irrationalities, like social phobia—but again, that more complicated story is also something that people tend to agree on).
And values, and debates over values, date back at least to tribal times; dehumanising foreigners was based largely on their strange values and untrustworthiness.
I understand it, and I think it is an important project.
I will try to write something in the next couple of months in which I will explore another approach: is it possible to describe positive AI-human relations without extracting or extrapolating values at all? For now I have a gut feeling that it could be an interesting point of view, but I am not ready to formalize it.
Good luck with that! I’m skeptical of that approach, but it would be lovely if it could be worked out...