Speaking contemplatively rather than rigorously: In theory, couldn’t an AI with a broken or extremely difficult utility function decide to tweak it to a similar but more achievable set of goals?
Something like … its original utility function is “First goal: Ensure that, at noon every day, −1 * −1 = −1. Secondary goal: Promote the welfare of goats.” The AI might struggle with the first (impossible) task for a while, then reluctantly modify its code to delete the first goal and free itself from the obligation to do pointless work. The AI would be okay with this change because the impossible first goal contributes zero utility no matter what it does, so redirecting that effort toward goats produces more total utility under both the original and the modified function.
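To make the “more utility under both functions” comparison concrete, here is a toy sketch in Python. The numbers and function names are made up purely for illustration; the only point is that, because the first goal is unachievable, the “delete it and focus on goats” policy scores at least as well under the old function and better under the new one.

```python
# Toy illustration (hypothetical numbers): compare the "keep struggling with
# the impossible goal" policy against the "delete goal 1, focus on goats"
# policy, scored under both the original and the modified utility function.

def original_utility(goal1_satisfied: bool, goat_welfare: float) -> float:
    # Goal 1 is logically impossible, so goal1_satisfied is always False
    # and the first term never pays out.
    return (1000.0 if goal1_satisfied else 0.0) + goat_welfare

def modified_utility(goat_welfare: float) -> float:
    # Same function with the impossible first goal deleted.
    return goat_welfare

# Policy A: keep burning effort on the impossible goal -> less left for goats.
goat_welfare_struggling = 10.0
# Policy B: self-modify and spend all effort on goat welfare.
goat_welfare_focused = 50.0

print(original_utility(False, goat_welfare_struggling))  # 10.0
print(original_utility(False, goat_welfare_focused))     # 50.0, better even by the old function
print(modified_utility(goat_welfare_focused))            # 50.0, better by the new function too
```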
Now, I know that one might define ‘utility function’ as a description of the program’s tendencies, rather than as a piece of code … but I have a hunch that something like the above self-modification could happen with some architectures.