In particular, I think there’s a distinction between agents with objective functions over states of the world vs. over their own subjective experience vs. over their output.
This line of thinking seems very important to me!
The following point might be obvious to Evan, but it’s probably not obvious to everyone: objective functions over the agent’s output should probably not be interpreted as objective functions over the physical representation of the output (e.g. the configuration of atoms in certain RAM memory cells). That would just be a special case of objective functions over world states. Rather, we should probably think of the objective function as being over the output as it is formally defined by the code of the agent (interpreting “the code of the agent” as a mathematical object, like a Turing machine, and using a formal mapping from code to output).
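To make the contrast concrete, here is a toy sketch (my own framing, not from the original comment): one utility function scores the *formal* output of a program, via the code-to-output mapping, while another scores a particular *physical* encoding of that output, i.e. a world state. The function names and the byte-encoding convention are illustrative assumptions.

```python
def agent_code() -> int:
    """A toy 'agent' viewed as a mathematical object: code with a formally defined output."""
    return 2 + 2

def utility_over_formal_output(program) -> float:
    # Applies the formal code->output mapping (here, simply running the function)
    # and scores the abstract value it denotes, independent of any physical encoding.
    return 1.0 if program() == 4 else 0.0

def utility_over_physical_state(ram_bytes: bytes) -> float:
    # Scores a world state: one particular configuration of "RAM cells".
    # This depends on the chosen encoding (here, 8-byte little-endian),
    # not just on the formal output.
    return 1.0 if ram_bytes == (4).to_bytes(8, "little") else 0.0
```

The first utility is invariant under how (or whether) the output is ever physically written down; the second is just an objective over a world state that happens to mention memory cells.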
Perhaps the following analogy can convey this idea: think about a human facing Newcomb’s problem. The person has the following instrumental goal: “be a person that does not take both boxes” (because that makes Omega put $1,000,000 in the first box). Now imagine that this was the person’s terminal goal rather than an instrumental one. Such a person might be analogous to a program that “wants” to be a program whose (formally defined) output maximizes a given utility function.