Thanks for the pointer to the paper, saved for later! I think this task of crafting machine-readable representations of human values is a thorny step in any CEV-like/value-loading proposal which doesn’t involve the AI inferring them itself IRL-style.
I was considering sifting through the literature to form a model of the ways people have tried to do this, in an abstract sense. Like, some approaches aim at a fixed normative framework. Others involve an uncertain seed which is collapsed to a likely framework. Others extrapolate from a fixed initial framework to an uncertain distribution over the places it might drift towards. Does this happen to ring a bell for any other references?