Now that I think about it, it’s a pretty big PR problem if I have to start every explanation of my value learning scheme with “humans don’t have actual preferences so the AI is just going to try to learn something adequate.” Maybe I should figure out a system of jargon such that I can say, in jargon, that the AI is learning people’s actual preferences, and it will correspond to what laypeople actually want from value learning.
I’m not sure whether such jargon would make actual technical thinking harder, though.
“humans don’t have actual preferences so the AI is just going to try to learn something adequate.”
Try something like: humans don’t have actual consistent preferences, so the AI is going to try to find a good approximation that covers all the contradictions and uncertainties in human preferences.
This is one of the reasons I refer to synthesising (or constructing) the UH, rather than learning it.
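To make “a good approximation that covers the contradictions” a bit more concrete, here is a minimal sketch, and only a sketch: it is not the actual synthesis procedure, and the data and function names are made up for illustration. It fits a single utility function to deliberately cyclic pairwise preferences with a Bradley-Terry style logistic model, so the contradictions get absorbed as uncertainty instead of causing an error.

```python
# Minimal illustration (not the actual UH-synthesis proposal): fit one
# utility function to inconsistent pairwise preferences. Contradictory
# judgements don't break the fit; they just pull the utilities together.

import numpy as np

# Hypothetical data: "option a preferred to option b" judgements,
# deliberately containing a cycle (0 > 1, 1 > 2, 2 > 0).
preferences = [(0, 1), (1, 2), (2, 0), (0, 1), (1, 2)]
n_options = 3

def synthesise_utilities(prefs, n, lr=0.1, steps=2000):
    """Gradient ascent on the Bradley-Terry log-likelihood.

    Each judgement (a, b) is treated as noisy evidence that u[a] > u[b];
    inconsistent data yields a compromise utility rather than a failure."""
    u = np.zeros(n)
    for _ in range(steps):
        grad = np.zeros(n)
        for a, b in prefs:
            p = 1.0 / (1.0 + np.exp(-(u[a] - u[b])))  # P(a preferred to b)
            grad[a] += 1.0 - p
            grad[b] -= 1.0 - p
        u += lr * grad
    return u - u.mean()  # utilities are only defined up to a constant

print(synthesise_utilities(preferences, n_options))
# Produces a compromise ranking (0 above 1 above 2) despite the cycle:
# the contradictions show up as smaller utility gaps, not as an error.
```

The point of the toy example is just the shape of the move: instead of demanding that the human data already encode a consistent preference ordering, the procedure constructs a utility function that trades off the conflicting evidence.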