Thanks! For the M1 vs M2, I agree these could reach different outcomes, but would either one be dramatically wrong? There are many “free variables” in the process; the aim is that any reasonable setting of them comes out ok.
I’ll work on learning partial preferences.
“Just bring me tea, without killing my cat and tiling the universe with teapots.” [...] and underdefined – at least based on my self-observation. Thus, again, I would prefer collectively codified human norms (laws) over an extrapolated model of my utility function.
It might be underdefined in some general sense; I understand the feeling, as I sometimes get it too. But in practice, it seems like it should ground out to “obey human orders about tea, or do something the human strongly prefers to that”. Humans like their orders being obeyed, and presumably like getting what they order; so to disobey, you’d need to be very sure that there’s a clearly better option for the human.
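As a rough illustration of how that grounding rule could look, here is a minimal sketch in Python; the margin, the `estimated_human_value` function, and all names are my own assumptions for illustration, not anything from the agenda:

```python
# Hypothetical sketch: default to obeying the order, and only override it when
# the estimated human preference for an alternative clearly beats obedience.

OVERRIDE_MARGIN = 0.9  # assumed: how "very sure" the agent has to be

def choose_action(order, alternatives, estimated_human_value):
    """Pick the ordered action unless some alternative is strongly preferred.

    `estimated_human_value(action)` is an assumed model of how much the human
    values the outcome of `action`, already counting their preference for
    having orders obeyed.
    """
    best, best_value = order, estimated_human_value(order)
    for alt in alternatives:
        value = estimated_human_value(alt)
        # Disobeying requires a large, explicit margin in the human's favour.
        if value > best_value + OVERRIDE_MARGIN:
            best, best_value = alt, value
    return best

# Made-up numbers: "nicer tea" is only slightly better, so the order stands;
# "tea + fix everything" clears the margin, so it overrides the order.
values = {"bring tea": 1.0, "nicer tea": 1.2, "tea + fix everything": 5.0}
print(choose_action("bring tea", ["nicer tea", "tea + fix everything"], values.get))
```

The point is just that obedience is the default, and deviation needs a large, explicit preference margin in the human’s favour.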
Of course, it might end up having a sexy server serve pleasantly drugged tea ^_^
One more thing: your model assumes that mental models of situations actually pre-exist. However, imagine a preference between tea and coffee. Before I am asked, I don’t have any model and don’t have any preference. So I will generate some random model, like a large coffee and a small tea, and then make a choice. However, the mental model I generate depends on the framing of the question.
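To make the framing point concrete, here is a toy sketch (my own assumption, not from the post): the “mental model” is constructed only when the question arrives, with its details anchored by the question’s wording and the rest filled in arbitrarily.

```python
import random

def generate_mental_model(question, seed=None):
    """Toy sketch: build a tea-vs-coffee model on demand.

    The model does not exist before the question; its details depend on the
    question's framing, with remaining gaps filled in arbitrarily.
    """
    rng = random.Random(seed)
    model = {"tea": rng.choice(["small", "large"]),
             "coffee": rng.choice(["small", "large"])}
    # Framing effect: mentioning a "large coffee" anchors that detail.
    if "large coffee" in question:
        model["coffee"] = "large"
        model["tea"] = "small"
    return model

# Two framings of "the same" question can produce different models,
# and hence different choices:
print(generate_mental_model("Would you like a large coffee or some tea?"))
print(generate_mental_model("Tea or coffee?"))
```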
In some sense, we are here passing the buck of complexity from “values” to “mental models”, which are assumed to be stable and actually existing entities. However, we still don’t know what a separate “mental model” is, where it is located in the brain, or how it is actually encoded in neurons.
The human might have some taste preferences that could decide between tea and coffee, general hedonic preferences that might also do the job, and meta-preferences about how they should deal with future choices.
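A minimal sketch of how those partial preferences might combine into a tea-vs-coffee decision (the weights, scores, and grouping are invented for illustration; the agenda itself does not specify this aggregation):

```python
# Hypothetical sketch: aggregate partial preferences the human already has
# (taste, general hedonic preferences, meta-preferences) into a single choice.

partial_preferences = {
    # each entry: ({option: score in [-1, 1]}, assumed weight)
    "taste":    ({"tea": 0.2, "coffee": 0.5}, 1.0),
    "hedonic":  ({"tea": 0.4, "coffee": 0.4}, 0.5),
    # meta-preference, e.g. "prefer the less stimulating drink in the evening"
    "meta":     ({"tea": 0.3, "coffee": -0.2}, 0.8),
}

def aggregate(prefs):
    totals = {}
    for scores, weight in prefs.values():
        for option, score in scores.items():
            totals[option] = totals.get(option, 0.0) + weight * score
    return max(totals, key=totals.get)

print(aggregate(partial_preferences))  # -> "tea" under these made-up numbers
```

Under these made-up numbers the taste and meta-preferences end up deciding, even though no pre-existing tea-vs-coffee preference was needed.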
Part of the research agenda (“grounding symbols”) is about trying to determine where these models are located.