Ineffective values do not need to be considered for a utility function, as they do not affect what actually gets strived for. If you say “I will choose B” and still choose A, you are still choosing A. You are not required to be aware of your own utility function.
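To make the revealed-preference point concrete, here is a minimal toy sketch in Python (my own illustration, not anything from the thread; the names `revealed_utility` and `choice_log` are invented for the example): the utility function is read off what the agent actually picks, so values that never change a choice simply drop out.

```python
# Toy sketch of "utility is what you actually strive for":
# stated-but-ineffective values never show up in the inferred utility.

from collections import Counter

def revealed_utility(choice_log):
    """Assign higher utility to options chosen more often, ignoring
    anything the agent merely *said* it would choose."""
    counts = Counter(actually_chosen for _stated, actually_chosen in choice_log)
    total = sum(counts.values())
    return {option: n / total for option, n in counts.items()}

# The agent keeps saying "B" but keeps choosing A.
log = [("B", "A"), ("B", "A"), ("B", "A")]
print(revealed_utility(log))  # {'A': 1.0} -- the stated value for B is ineffective
```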
Uff, a future where humans get more of what they’re striving for but without adjusting for biases and ineffectual values? Why would you care about saving our species, then?
It sounds like people are using “utility function” in different ways in this thread.
I do think that there is a lot of confusion, and definitional groundwork would probably bear fruit.
If one is trying to “save” some fictitious homo economicus that significantly differs from actual humans, then it is not really humans being saved.
A worldview where humans-as-is are too broken to bother salvaging is rather bleak. I see the transition away from biases as something that can be modelled: start with a utility function that includes the biases, then describe a utility function “without biases” that captures how the behaviour should be, and argue about what kind of tweaks to the gears get us from the first white box to the target white box. Part of this is getting the “broken state of humans” modelled accurately. If we can get a computer to follow that, we would hit an aligned exactly-medium-AI. Then we can ramp up the virtuosity of the behaviour (by providing a more laudable utility function).
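One way to picture the two white boxes is as a shared value function with explicit bias terms that get tweaked toward zero. The following is a toy Python sketch with made-up numbers and names, just to show the shape of the idea, not an actual alignment mechanism.

```python
# Hedged toy sketch of the "two white boxes" picture (invented names and
# values): current human behaviour = base values plus explicit bias gears,
# the target = the same values with the bias gears removed.

BASE_VALUES = {"health": 1.0, "leisure": 0.6, "savings": 0.8}

# Explicit "gears" for the broken state of humans, e.g. a discounting
# quirk that undervalues long-term options.
BIASES = {"savings": -0.5}

def biased_utility(option):
    """White box #1: humans-as-is, biases included (exactly-medium behaviour)."""
    return BASE_VALUES[option] + BIASES.get(option, 0.0)

def target_utility(option):
    """White box #2: the same values with the bias gears tweaked to zero."""
    return BASE_VALUES[option]

for option in BASE_VALUES:
    print(option, biased_utility(option), "->", target_utility(option))
```

In this framing, getting a computer to follow `biased_utility` faithfully would be the aligned exactly-medium-AI step; ramping up the virtuosity then means swapping in something closer to `target_utility`.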
There also seems to be an approach where we just describe the “ideal behaviour utility function” and try to get the computers to follow that, without any human having the capability to know or follow such a utility function themselves. First make it laudable, then make it reminiscent of humans (hopefully making it human-approvable).
The exactly-medium-AI function is not problematically ambiguous. “Ideal reasoning behaviour” is subject to significant and hard-to-reconcile differences of opinion. “Human utility function” refers to exactly-medium-AI, only run on carbon.
I would benefit from, and appreciate, anyone bothering to fish out conflicting or inconsistent uses of the concept.