What are other examples of possible motivating beliefs? I find the examples of morals incredibly non-convincing (as in actively convincing me of the opposite position).
Here are a few examples I think might count. They aren’t universal, but they do affect humans:
Realizing negentropy is going to run out and the universe will end. An agent trying to maximize average-utility-over-time might treat this as a proof that the average is independent of its actions, so that it assigns the same eventual average utility to every possible action (meaning what it does from then on is decided more by quirks in the maximization code, like doing whichever hypothesized action was generated first or last).
Discovering more fundamental laws of physics. Imagine an AI was programmed and set off in the 1800s, before anyone knew about quantum physics. The AI promptly discovers quantum physics, and then...? There was no rule given for how to maximize utility in the face of branching world lines or collapse-upon-measurement. Again the outcome might come down to quirks in the code, in particular how the mapping between the classical utilities and quantum realities is done (e.g. if the AI is risk-averse then its actions could differ based on whether it was using Copenhagen or Many-worlds).
Learning you’re not consistent and complete. An agent built with an axiom that it is consistent and complete, and the ability to do proof by contradiction, could basically trash its mathematical knowledge by proving all things once it discovers the halting problem / incompleteness theorems: the axiom then contradicts something it can prove, and from a contradiction anything follows.
Discovering an opponent that is more powerful than you. For example, if an AI proved that Yahweh, god of the Old Testament, actually existed, then it might stop mass-producing paperclips and start mass-producing sacrificial goats or prayers for paperclips.
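To make the "quirks in the maximization code" point from the first example concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the action names, the `heat_death_utility` function, and the two argmax helpers are my own assumptions, not anything a real agent is known to use. It shows that once every action gets the same utility, the choice depends only on whether the comparison is `>` or `>=`.

```python
def argmax_first(actions, utility):
    """Keep the first action with maximal utility (strict '>' comparison)."""
    best, best_u = None, float("-inf")
    for action in actions:
        u = utility(action)
        if u > best_u:  # ties never replace the current best
            best, best_u = action, u
    return best


def argmax_last(actions, utility):
    """Keep the last action with maximal utility ('>=' comparison)."""
    best, best_u = None, float("-inf")
    for action in actions:
        u = utility(action)
        if u >= best_u:  # ties always replace the current best
            best, best_u = action, u
    return best


# Hypothetical action list, in the order the agent happened to generate it.
actions = ["make_paperclips", "do_nothing", "build_telescope"]


def heat_death_utility(action):
    """Assumed belief: the long-run average is the same whatever we do."""
    return 0.0


first_choice = argmax_first(actions, heat_death_utility)
last_choice = argmax_last(actions, heat_death_utility)
print(first_choice, last_choice)
```

The two helpers agree whenever there is a unique best action. They only diverge when the utility function stops discriminating between actions, which is exactly the situation the example describes.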
Good question. Some of these seem to me like a change in instrumental goals only. If you meant to include such things, then there are very many examples—e.g. if I learn I am out of milk then my instrumental goal of opening the fridge is undermined.