This can evidently deal with probabilistic settings: the outcome o (the consequent of the proved implication "action → o") would encode an expected value which the agent is trying to maximize, with the obvious preference ordering. It gets unrealistic for very complicated situations, though, since exact expected values are very often intractable and must be approximated. Since we know the exact probabilistic model, this intractability is itself a sort of logical uncertainty.
How might this be modified to deal with logical uncertainty?
A simple idea that comes to mind is to modify the procedure to look for lower bounds on expected values, rather than trying to prove exact expected values. The procedure of choosing the best & breaking ties remains unmodified. (I’m not really trying to think about the modalized form at the moment.)
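Here is a minimal sketch of that lower-bound procedure in Python, just to pin it down. The helper names (try_prove_bound, candidate_bounds, the default first action) are placeholders I'm making up for illustration; the actual proof search is abstracted behind a stub.

```python
def choose_action(actions, candidate_bounds, try_prove_bound):
    """Pick the action with the best *proven lower bound* on expected utility.

    try_prove_bound(a, q) stands in for the proof search: it is assumed to
    return True iff the theory proves "if the agent takes action a, then
    expected utility >= q".
    """
    best_bound = {}
    for a in actions:
        # Record the largest candidate bound q that is actually provable for a.
        proven = [q for q in candidate_bounds if try_prove_bound(a, q)]
        if proven:
            best_bound[a] = max(proven)
    if not best_bound:
        return actions[0]  # default action when nothing at all is provable
    top = max(best_bound.values())
    # "Choosing the best & breaking ties" is unchanged: take the first action
    # (in a fixed ordering) that achieves the best proven bound.
    return next(a for a in actions if best_bound.get(a) == top)
```

The only change from the exact-expected-value version is that try_prove_bound checks a lower bound rather than an exact value; the argmax and tie-breaking downstream are the same.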
This lower-bound version should work in many more cases, but it still feels potentially unrealistic. It doesn't take advantage of any of the approaches to logical uncertainty that have been discussed.
One really nice thing about the approach is that we do not have to identify the agent within the universe. All that is needed are the logical implications of supposing that the agent takes various actions. As a result, the agent will treat programs provably equivalent to its own as copies of itself, with no fuss.
We can take a prior on all programs as our universe, with no special interface between the agent and the environment program, as there is in AIXI. (We do need to somehow specify utilities on programs, which is a non-obvious procedure.) The agent then searches for proofs that if it takes some particular action, then at least some particular expected utility is achieved. There are no longer finitely many possible outcomes, so the modal tricks for doing this without a halting oracle won't work. Instead, the agent must run its proof search for some bounded time. It then takes the action with the highest proven utility bound.
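And a similarly hedged sketch of the bounded-time variant. Here the proof enumerator and the statement format are stand-ins I'm assuming for illustration: verified proofs are modeled as a stream of (action, lower bound) pairs, and the agent simply consumes that stream for a fixed number of steps before acting.

```python
import itertools

def bounded_proof_search_agent(actions, proof_stream, max_steps, default_action):
    """Bounded-time version: scan proved statements of the form
    "agent takes a  =>  expected utility >= q" for at most max_steps proofs,
    then take the action with the highest proven utility bound.

    proof_stream is assumed to yield (action, bound) pairs, one per verified
    proof; in reality this would be a proof enumerator plus a proof checker,
    which are abstracted away here.
    """
    best_bound = {}
    for action, bound in itertools.islice(proof_stream, max_steps):
        if action in actions and bound > best_bound.get(action, float("-inf")):
            best_bound[action] = bound
    if not best_bound:
        return default_action  # nothing was proven within the time budget
    top = max(best_bound.values())
    # Break ties by the fixed ordering of the action list.
    return next(a for a in actions if best_bound.get(a) == top)
```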