I’ve finally been able to put words to some things I’ve been pondering for a while, and a Google search on the most sensible terms (to me) turned up nothing. I’m looking to see whether there’s already a body of writing on these topics under different terms, since my ignorance of it would just lead me to reinvent the wheel. If these are NOT discussed topics for some reason, I’ll post my thoughts, because I think they could be critically important to the development of Friendly AI.
implicit utility function (‘survive’ is an implicit utility function because regardless of what your explicit utility function is, you can’t progress it if you’re dead)
conflicted utility function (a utility function that requires your death for optimal value is conflicted, as in the famous Pig That Wants to be Eaten)
dynamic utility function (a static utility function is a major effectiveness handicap, probably a fatal one on a long enough time scale)
meta utility function (a utility function that takes the existence of itself into account)
What you label “implicit utility function” sounds like instrumental goals to me. Some of that is also covered under Basic AI Drives.
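To make that concrete, here is a minimal sketch (my own toy example, not anything from the Basic AI Drives paper): whatever terminal utility function you hand an expected-utility maximizer, the safer action scores higher, because a dead agent collects no further utility. The goal names, probabilities, and payoffs are purely illustrative assumptions.

```python
# Toy illustration (assumed numbers): survival falls out as an instrumental goal
# because a dead agent collects no further utility, regardless of which
# terminal utility function it is pursuing.

def expected_future_utility(utility_per_step, survival_prob, horizon=10):
    """Sum of per-step utility, weighted by the chance of still being alive."""
    total, alive = 0.0, 1.0
    for _ in range(horizon):
        alive *= survival_prob
        total += alive * utility_per_step
    return total

# Two hypothetical terminal goals with very different payoffs per step.
terminal_goals = {"make_paperclips": 1.0, "prove_theorems": 5.0}

for goal, per_step in terminal_goals.items():
    risky = expected_future_utility(per_step, survival_prob=0.5)
    safe = expected_future_utility(per_step, survival_prob=0.99)
    print(f"{goal}: risky={risky:.2f}, safe={safe:.2f} -> the safer action wins")
```

Whatever per-step payoff you plug in, the action with the higher survival probability dominates, which is the sense in which “survive” is implicit rather than part of the explicit utility function.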
I’m not familiar with the pig that wants to be eaten, but I’m not sure I would describe that as a conflicted utility function. If one has a utility function that places maximum utility on an outcome that requires their death, then there is no conflict; that is simply the optimal choice. I do think humans who believe they have such a utility function are usually mistaken, but that is a much more involved discussion.
I’m not sure what the point of a dynamic utility function is. Your values really shouldn’t change. I suspect you may be focusing on instrumental goals, which can and should change, and treating them as part of the utility function when they are not.
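Here is a rough sketch of the distinction I’m drawing, with made-up goals and world states: the terminal utility function stays fixed, while the instrumental goals get re-derived as circumstances change.

```python
# Illustrative sketch (assumed goal names and state keys): the terminal utility
# function is fixed, while instrumental goals are recomputed as context shifts.

def terminal_utility(world_state):
    """Fixed valuation of outcomes -- this is the part that shouldn't change."""
    return world_state.get("papers_published", 0)

def choose_instrumental_goal(world_state):
    """Instrumental goals are re-derived from the fixed utility as context shifts."""
    if world_state.get("funding", 0) < 1:
        return "secure_funding"      # needed now, but not valued for its own sake
    if not world_state.get("has_lab_access", False):
        return "get_lab_access"
    return "run_experiments"

states = [
    {"funding": 0},
    {"funding": 3},
    {"funding": 3, "has_lab_access": True, "papers_published": 2},
]
for state in states:
    print(state, "->", choose_instrumental_goal(state),
          "| terminal utility so far:", terminal_utility(state))
```

The instrumental goal changes across the three states, but nothing about the utility function itself has changed, which is why I don’t see what a “dynamic utility function” would add.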