A question like “what would this human do in a situation where there is a cat in a room?” has a unique answer that reflects reality: if that kind of situation were actually run, something would have to happen.
Sure, if we start from highly abstract values and then try to make them more concrete, we might lose our way. If we can turn philosophies into feelings but do not know how to turn feelings into chemistry, then there is a level of representation that might not be sufficient. But we know there is one level that is sufficient to describe action, and that all the levels are somehow (maybe in an unknown way) connected (mostly stacked on top of each other). So this incompatibility of representation cannot be fundamental: if it were, there would be a gap between the levels and the thing would no longer be connected.
So there is no question of the form “presented with this stimulus, how would the human react?” that is in principle unanswerable. If preferences are expressed as responses to choice situations, that is a subcategory of reaction. Even if preferences are expressed as responses to philosophy prompts, they would still be a subcategory.
One could say that it is not super clarifying that when a two-human system is presented with the philosophical stimulus “Is candy worth $4?”, you get one human who says “yes” and another who says “no”. But this is just a squiggle in the function. The function is merely being really inconvenient when you cannot use the approximation of thinking of just one “average human” whom all humans reflect very closely. But nothing promised us that the function would not turn out to depend on the time of day, or on verbal short-term memory, or on television broadcast data.
Maybe you are saying something like “genetic fitness doesn’t exist” because some animals are fit when they are small and some are fit when they are large, so there is no consistent account of whether smallness is good or not. Then “the human utility function doesn’t exist” because human A over here dares to have different opinions and strategies than human B over there, and they do not end up mimicking each other. But just as an animal lives or dies, a human will zig or zag. And it cannot be that the zigging fails to be a function of the world state (with some quantum effects assumed away as non-significant (and even then maybe not)). What it can fail to be is a function of the world state as we understand it, or as our computer system models it, or as captured in the variables we are using. But then the question is whether we can make do with just these variables, not whether there is anything to model at all.
In this language, it could be rephrased:
If you think you have a sufficiently wide set of variables to come up with any needed solution function, you don’t. You have too few variables.
But the “function” in this sense is how the computer system models reality (or the attitudinal modes it can take towards reality). Part of how we know the setup is inadequate is that there is an entity outside the system that is not reflected in it. That is, the system can only zig or zag when we needed zog, which it cannot do. What it will keep missing is the way that reality actually dances. Maybe in some small bubbles we can have representations that totally capture things in the senses we care about. But there is a fact of the matter to the inquiry: for any sense we might care about, there is a slice of the whole thing that is sufficient for it. To express zog you need these features; to express zeg you need those other ones.
Human will is quite complex, so we can reasonably expect to spend quite a lot of time undermodelling it. But that is a very different thing from it being unmodellable.
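As a minimal toy sketch of that distinction (the features and reactions below are invented purely for illustration, not anyone's actual proposal): the true behaviour is a perfectly definite function of the world state, yet a model built on too few variables keeps missing one of the reactions. That is undermodelling, not unmodellability:

```python
# Toy illustration (hypothetical names): behaviour is a definite function of the
# full world state, but a model that only sees a slice of it keeps "missing zog".

def true_behaviour(world_state):
    """The actual reaction: depends on mood as well as the observable feature."""
    if world_state["cat_in_room"] and world_state["mood"] == "playful":
        return "zog"  # the reaction the restricted model cannot express
    return "zig" if world_state["cat_in_room"] else "zag"

def restricted_model(observed):
    """Best model buildable from too few variables (no 'mood' feature)."""
    return "zig" if observed["cat_in_room"] else "zag"

situations = [
    {"cat_in_room": True,  "mood": "playful"},
    {"cat_in_room": True,  "mood": "calm"},
    {"cat_in_room": False, "mood": "playful"},
]

for s in situations:
    actual = true_behaviour(s)
    predicted = restricted_model({"cat_in_room": s["cat_in_room"]})
    print(s, "actual:", actual, "predicted:", predicted)

# The mismatch on the first situation reflects too few variables,
# not evidence that there is nothing to model.
```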
A question like “what would this human do in a situation where there is a cat in a room?” has a unique answer that reflects reality: if that kind of situation were actually run, something would have to happen.
It’s not about what the human would do in a given situation. It’s about values – not everything we do reflects our values. Eating meat when you’d rather be vegetarian, smoking when you’d rather not, etc. How do you distinguish biases from fundamental intuitions? How do you infer values from mere observations of behavior? There are a bunch of problems described in this sequence. Not to mention stuff I discuss here about how values may remain under-defined even if we specify a suitable reflection procedure and have people undergo that procedure.
Ineffective values do not need to be considered for a utility function, as they do not affect what gets striven for. If you say “I will choose B” and still choose A, you are still choosing A. You are not required to be aware of your own utility function.
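A minimal sketch of that reading (the observations and function names are invented for illustration): the utility function is read off from the choices actually made, and stated intentions simply never enter the tally.

```python
from collections import Counter

# Hypothetical observations: what the person said they would pick vs. what they picked.
observations = [
    {"stated": "B", "chosen": "A"},
    {"stated": "B", "chosen": "A"},
    {"stated": "A", "chosen": "A"},
    {"stated": "B", "chosen": "B"},
]

def revealed_choices(obs):
    """Tally only the choices actually made; statements are ignored entirely."""
    return Counter(o["chosen"] for o in obs)

print(revealed_choices(observations))  # Counter({'A': 3, 'B': 1})
# On this reading, the "ineffective" stated preference for B leaves no trace
# in what actually gets striven for.
```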
That is a lot of material to go through en masse, so I will need some sharper pointers of relevance to actually engage.
Ineffective values do not need to be considered for a utility function, as they do not affect what gets striven for. If you say “I will choose B” and still choose A, you are still choosing A. You are not required to be aware of your own utility function.
Uff, a future where humans get more of what they’re striving for but without adjusting for biases and ineffectual values? Why would you care about saving our species, then?
It sounds like people are using “utility function” in different ways in this thread.
I do think that there is a lot of confusion, and that definitional groundwork would probably bear fruit.
If one is trying to “save” some fictitious homo economicus that significantly differs from actual humans, then it is not really humans being saved.
A worldview where humans-as-is are too broken to be worth salvaging is rather bleak. I see the transition away from biases as something that can be modelled: start with a utility function with biases, then describe a utility function “without biases” that captures how the behaviour should be, and argue about what kind of tweaks we need to make to the gears to get from the first white box to the target white box. Part of this is getting the “broken state of humans” modelled accurately. If we can get a computer to follow that, we would hit aligned exactly-medium-AI. Then we can ramp up the virtuosity of the behaviour (by providing a more laudable utility function).
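One way to make that picture concrete (a rough sketch with invented toy numbers, not a claim about how such modelling is actually done): treat the first white box as a utility function plus a bias term fitted to behaviour as-is, and the target white box as the same structure with the bias term dialled down.

```python
# Rough sketch with invented names and numbers: model behaviour as utility plus
# bias, fit the "broken state of humans" first, then shrink the bias term.

def base_utility(option):
    """What the person values on reflection (toy numbers)."""
    return {"salad": 3.0, "candy": 1.0}[option]

def bias(option):
    """Systematic pull away from the reflective valuation (toy numbers)."""
    return {"salad": 0.0, "candy": 4.0}[option]

def choice(options, bias_weight):
    """Pick the option maximising utility plus weighted bias."""
    return max(options, key=lambda o: base_utility(o) + bias_weight * bias(o))

options = ["salad", "candy"]
print("first white box (bias_weight=1):", choice(options, 1.0))   # candy
print("target white box (bias_weight=0):", choice(options, 0.0))  # salad

# Exactly-medium-AI would be the bias_weight=1 model followed faithfully;
# "ramping up the virtuosity" corresponds to moving bias_weight toward 0.
```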
There seems to be an approach where we just describe the “ideal behaviour utility function” and try to get computers to follow that, without any human having the capability to know or to follow such a utility function. First make it laudable, and then make it reminiscent of humans (hopefully making it human-approvable).
The exactly-medium-AI function is not problematically ambiguous. “Ideal reasoning behaviour” is subject to significant and hard-to-reconcile differences of opinion. “Human utility function” refers to the exactly-medium-AI function, only run on carbon.
I would benefit from and appreciate it if anyone bothered to fish out conflicting or inconsistent uses of the concept.