Please don’t use “utility function” in this context.
I probably blatantly reveal my ignorance by asking this, but do only agents who know what they want have a utility-function? An AGI undergoing recursive self-improvement can’t possibly know what exactly it is going to “want” later on (some (sub)goals may turn out to be impossible while world states previously believed to be impossible might turn out to be possible), yet what it will want is implied by its given utility-function and the “nature of reality” (environmental circumstances).
What you believe you want is different from what you actually want, or what you should want, or what you would like if it happened, or what you should want to happen irrespective of your own experience...
You believe that what you want is actually different from what you want. You appear to know that what you believe you want is different from what you actually want. Proof by contradiction that what you believe you want is what you actually want?
Your utility-function seems to assign high utility to world states where it is optimized according to new information. In other words, you believe that your utility-function should be undergoing recursive self-improvement.
I think Nesov’s saying that you have a utility function, but you don’t explicitly know it to the degree that you can make statements about its content. Or at least, it would be more accurate to use the best-fitting colloquial term and leave the term of art “utility function” to its technical meaning.
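To make that distinction concrete, here is a minimal sketch of my own (the class and names are made up for illustration, not anything from the thread): an agent whose choices are generated by a scoring rule it never represents explicitly still behaves consistently with *some* utility function, even though it couldn’t tell you what that function says.

```python
import random


class OpaqueAgent:
    """Chooses among options via an internal heuristic it cannot introspect."""

    def __init__(self, seed: int = 0):
        # Hidden weights standing in for messy, unarticulated preferences.
        rng = random.Random(seed)
        self._hidden_weights = [rng.random() for _ in range(3)]

    def _score(self, option):
        # Plays the role of an implicit utility function; the agent never
        # reports or inspects this value directly.
        return sum(w * x for w, x in zip(self._hidden_weights, option))

    def choose(self, options):
        # From the inside, the agent just "picks what seems best".
        return max(options, key=self._score)


# Revealed preference: an observer watching the choices can rank options in a
# way consistent with some utility function, even though the agent never
# states (or knows) the content of that function.
agent = OpaqueAgent()
options = [(1.0, 0.2, 0.5), (0.3, 0.9, 0.1), (0.6, 0.6, 0.6)]
print(agent.choose(options))
```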
Also, your penultimate paragraph sounds confused, while the paragraph it’s responding to is confusing but coherent. Nesov’s explicitly listing a variety of related but different categories that “utility function” gets misinterpreted into. He doesn’t claim to believe that what he wants is different from what he wants.
I probably blatantly reveal my ignorance by asking this, but do only agents who know what they want have a utility-function? An AGI undergoing recursive self-improvement can’t possibly know what exactly it is going to “want” later on (some (sub)goals may turn out to be impossible while world states previously believed to be impossible might turn out to be possible), yet what it will want is implied by its given utility-function and the “nature of reality” (environmental circumstances).
Nope—in theory, all agents have a utility-function—though it might not necessarily be the neatest way of expressing what they value.
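For what it’s worth, the usual formal backing for the “in theory” here is the von Neumann–Morgenstern representation theorem, which strictly speaking covers agents whose preferences satisfy the vNM axioms (completeness, transitivity, continuity, independence) rather than literally all agents: for such an agent there exists a utility function $u$, unique up to positive affine transformation, with

$$L_1 \succeq L_2 \iff \mathbb{E}_{L_1}[u] \ge \mathbb{E}_{L_2}[u]$$

for any two lotteries $L_1, L_2$. The agent need not represent $u$ anywhere; the theorem only says its coherent choices can be described as if it were maximizing expected $u$.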