What if instead of assuming your Xs, you got them out of the program?
To simplify, imagine a universe with only one fact. A function fact() returns 1 if that fact is true, and 0 if it's not.
Now, agent() can prove statements of the form “fact()==1 and agent()==A implies world()==U”. This avoids the problem of plugging in a falsehood: agent() doesn't start by assuming X as a statement of its own from which it can derive things. Instead, it looks at all the possible outputs of fact(), then sees how each combines with its action to produce utility.
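To make that concrete, here is a minimal Python sketch of the idea. Everything in it is made up for illustration (the two actions, the toy body of world(), the variable names); the point is just that agent() enumerates every possible output of fact() together with every available action, and records the utility world() would return in each case.

```
# Toy sketch: agent() tabulates "fact()==f and agent()==A implies world()==U"
# for every possible fact value f and every available action A.

ACTIONS = ["A1", "A2"]  # hypothetical action set

def world(fact_value, action):
    # Stand-in for the real world() program; here utility depends on
    # both the fact and the agent's action.
    if fact_value == 1:
        return 10 if action == "A1" else 0
    return 3 if action == "A1" else 7

# The "proofs", recorded as a lookup table from (fact value, action) to utility.
implications = {(f, a): world(f, a) for f in (0, 1) for a in ACTIONS}
```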
Trivially, if agent() has access to the code of fact(), then it can just figure out what fact() returns, and it would know which actions correspond to which utilities.
Otherwise, agent() could use a prior distribution, or magically infer some probability distribution over whether the fact is true, and then choose its action by a normal expected utility calculation.
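Continuing the same hypothetical sketch (reusing ACTIONS and implications from above, with a made-up prior p_fact_true), the expected-utility version might look like this:

```
p_fact_true = 0.6  # hypothetical prior probability that fact() == 1

def expected_utility(action):
    # Average the tabulated utilities over the two possible fact values.
    return (p_fact_true * implications[(1, action)]
            + (1 - p_fact_true) * implications[(0, action)])

best_action = max(ACTIONS, key=expected_utility)
```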
Note: The above departs from the structure of the world() function given here in that it assumes some way of interacting with world() (or its source code) to find out fact() without actually getting the code of fact(). Perhaps world() is called multiple times, and agent() can observe the utility it accumulates and use that (combined with its “agent()==A and fact()==1 implies world()==U” proofs) to figure out whether the fact is true?
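One speculative way to cash that out, again continuing the sketch above: if agent() can see the utility produced on earlier calls of world(), along with the actions it took on those calls, it can cross off any value of fact() that contradicts its implication table.

```
observed = [("A1", 10), ("A1", 10)]  # hypothetical (action, utility) pairs from past calls

def consistent(fact_value):
    # A fact value survives only if every observation matches the implication table.
    return all(implications[(fact_value, a)] == u for a, u in observed)

possible_facts = [f for f in (0, 1) if consistent(f)]
# If exactly one value remains, agent() has effectively learned what fact() returns.
```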
Interestingly, while this allows agent() to apply “could” to the world (“It could rain tomorrow”), it doesn't allow the agent to apply “could” to things that don't affect world's utility calculation (“People could have nonmaterial souls that can be removed without influencing their behavior in any way”).