There are serious weaknesses even over computable universes. Since Solomonoff induction doesn’t search for ambient dependencies, relying instead on explicit dependence of reward on a particular action-symbol, it counts the same world multiple times when the agent is instantiated multiple times in that world and receives the same prefix of its observation sequence, and it evaluates incorrect, CDT-style counterfactuals in such worlds. So two identical AIXI agents would defect against each other in the Prisoner’s Dilemma (the CDT-style reasoning problem). In Newcomb’s problem, its decision depends on how many of the explicit dependencies carve the problem as two-boxing versus one-boxing, so at the very least it’s unclear what it will do, and there seems to be no reason for it to reliably one-box. In an anthropic problem where, on a coin toss, one branch creates very many identical AIXI agents for the purpose of making some decision while the other branch contains only one agent (assuming the branches have sufficiently similar K-complexity), the probability (complexity) of the many-agent branch gets counted once per instance, so the consequences of actions in the one-agent world carry almost no weight compared to the consequences in the many-agent world. In Counterfactual Mugging, AIXI would not give the money to Omega, since it updates on its observations. And so on.
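For reference, a standard way of writing AIXI’s action selection (roughly Hutter’s notation: horizon m, universal monotone machine U, candidate environment programs q):

a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} (r_k + \cdots + r_m) \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}.

Each environment program q receives the action sequence a_1 … a_m as an explicit input; nothing in the mixture identifies instances of the agent inside q, which is the explicit (rather than ambient) dependence of reward on an action-symbol that the comment above is pointing at.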
Someone should really write up a UDT-AIXI that doesn’t have these problems; I expect it would be straightforward enough. That way, problems like the ones I listed wouldn’t count against the arguments in which AIXI is used as a definition of impressive capability: make it stronger, don’t laugh at its flaws.
I’d say ambient dependencies are the least of AIXI’s problems. It has >99% probability of wireheading, and in >99% of the remaining outcomes it disassembles itself with its mining claws. timtyler put the nail in the coffin quite a while ago.
Someone should really write up a UDT-AIXI that doesn’t have these problems
Some days ago I tried to do just that, and stopped because I don’t know enough. Now I’m back to studying more math. The most obvious problem is that AIXI is uncomputable, so a UDT-AIXI will probably also be uncomputable. But how does an uncomputable agent get to find itself embedded in a computable universe?
It has >99% probability of wireheading, and in >99% of the remaining outcomes it disassembles itself with its mining claws.
Wireheading is just what reinforcement learning agents are built to do, so it’s not actually a problem. Damaging its own hardware because its notion of anticipation admits quantum suicide is partly the same problem as relying on explicit dependencies. It is still hard to define reward, that is, to decide which worlds do or don’t include your instance (with the given reward-observations), but that has to be solved for UDT-AIXI in any case, and the only way of solving it that I see which doesn’t privilege a particular physical implementation of the agent, but accepts it on any substrate within a world-program, again involves looking for ambient dependencies: check whether a dependence is present, and count a world-program only if it is.
So these problems are also automatically taken care of in UDT-AIXI, to the extent they are problematic.
Wireheading is just what reinforcement learning agents are built to do, so it’s not actually a problem.
This comment led me to the following tangential train of thought: AIXI seems to capture the essence of reinforcement learning, but does not feel pain or pleasure. I do not feel morally compelled to help an AIXI-like agent (as opposed to a human) gain positive reinforcements and avoid negative reinforcements (unless it was some part of a trade).
After writing the above, I found this old comment of yours, which seems closely related. But thinking about an AIXI-like agent that has only “wants” and no “likes”, I feel myself being pulled towards what you called the “naive view”. Do you have any further thoughts on this subject?
The most obvious problem is that AIXI is uncomputable, so a UDT-AIXI will probably also be uncomputable. But how does an uncomputable agent get to find itself embedded in a computable universe?
You are interpreting AIXI too literally. Think of it as advice to aspiring superintelligences, not as a magical uncomputable algorithm. It is normative decision theory: a definition, as a mathematical structure, of the action that should be taken, even though it’s not possible to have an agent that actually takes only the actions that should be taken, that is, one that computes all the necessary properties of this mathematical structure.
A UDT-AIXI would say which action a given reinforcement-learning agent should take. It’s all already in the UDT post: just add the UDT 1.1 provision that one shouldn’t update on observations but should optimize over strategies instead, stipulate a universal prior over programs, and assign utility according to expected reward, where reward is one of the channels of observation and is counted over the world-programs that “contain” the corresponding agent. And then do some AIXI-style math to explore the properties of the resulting structure.
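A minimal sketch of that structure, with notation that is mine rather than anything standard: let π range over strategies (maps from observation-and-reward histories to actions), let p range over world-programs weighted by the universal prior 2^{-\ell(p)}, and write \mathrm{hist}_p(\pi) for the execution history of p under the assumption that the agent implements π. The proposal then amounts to

\pi^{*} := \arg\max_{\pi} \sum_{p \in W} 2^{-\ell(p)} \, R\!\left(\mathrm{hist}_p(\pi)\right),

where R(\cdot) sums the rewards appearing on the agent’s reward channel in that history, and W is the set of world-programs that “contain” the agent, which is the condition the rest of the thread tries to pin down.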
My email from Nov 15 may be relevant here:

What exactly is the optimization problem to which UDT is the solution?

The question is not trivial, because of the way we define UDT. It assumes a prior over possible programs and a utility function over their execution histories. But once you fix these two mathematical structures, there’s nothing left to optimize. Whatever happens, happens. So an answer to the question is bound to involve some new formal tricks. Any ideas as to what they may be?
After that I went on to invent just such a formal trick (W/U/A), but it failed to clear things up.
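One way to see the difficulty (my phrasing): with the prior P and utility U fixed, the value

\sum_{i} P(p_i) \, U\!\left(\mathrm{hist}(p_i)\right)

is simply a number, with no free variable marked as the decision. Whatever formal trick is adopted has to replace \mathrm{hist}(p_i) by something like \mathrm{hist}(p_i \mid \text{the agent’s strategy is } \pi), a logical (ambient) conditional, so that a function of π remains to be maximized.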
But once you fix these two mathematical structures, there’s nothing left to optimize. Whatever happens, happens.
It’s a free will/epistemology (morality/truth) clash problem, perhaps expressible in terms of agent-provability. What you’ll do is determined by the laws of physics, but you can’t infer what you’ll do by consulting the laws of physics, since there are other relevant (moral) considerations that go into deciding what to do. So in the context of discussing decision theory you can’t really say “whatever happens, happens”; it’s not a relevant consideration in arriving at a decision.

But it seems to be a relevant consideration when looking at the situation “from the outside” like your proposed UDT-AIXI does, right?
What do you mean? Whatever happens, happens, if you are not deciding. A normative idea of a correct decision can be thought of from the inside, even if it’s generally uncomputable, and so only glimpses of the answer can be extracted from it.
From the outside, counterfactual consequences don’t appear consistent. If the agent actually chooses action A, the idealized UDT-AIXI thingy will see that choosing action B would have given the agent a billion dollars, and choosing C would have given a trillion. Do you see a way around that?
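One way to make this concrete, under a proof-based reading of “from the outside” (my framing, offered as an assumption): if the evaluator’s theory T already proves \mathrm{agent}() = A, and B is provably distinct from A, then T refutes \mathrm{agent}() = B, so the implication \mathrm{agent}() = B \rightarrow (\mathrm{reward} = r) is provable for every r:

T \vdash \mathrm{agent}() = A \;\Longrightarrow\; T \vdash \big(\mathrm{agent}() = B \rightarrow \mathrm{reward} = 10^{9}\big) \ \text{ and } \ T \vdash \big(\mathrm{agent}() = B \rightarrow \mathrm{reward} = 10^{12}\big).

So the payoffs of the actions not taken are not pinned down by the facts alone.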
UDT-AIXI could ask which moral arguments the agent would discover if it had more time to think. It won’t, of course, examine counterfactuals over a fact that is already known to the context in which the resulting mathematical structure is to be interpreted. You can only use a normative consideration from the inside, so whenever you step outside, you must also shift the decision problem to leave room for thinking about moral considerations.

How do you formalize this? I couldn’t figure it out when I tried this.
Select the worlds whose world-history is ambiently controlled by the agent, that is, where the ambient dependence is non-constant: the conclusion about which world-history a given world-program implements depends on which strategy we assume the agent implements. Then read out the utility from the reward channel under that strategy in that world.
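In the notation of the sketch above (mine, not standard), the selection rule reads: count a world-program p only if

\exists\, \pi_1, \pi_2 : \ \mathrm{hist}_p(\pi_1) \neq \mathrm{hist}_p(\pi_2),

i.e. only if the history that p implements actually varies with the strategy we assume the agent implements; and for a counted p, the value contributed to strategy π is R(\mathrm{hist}_p(\pi)), the reward read off the agent’s reward channel in the history that p implements under π.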
Hmm… This is problematic if the same world contains multiple agent-instances that received different rewards (by following the same strategy but encountering different observations). What is the utility of such a world? Answering that is a necessary part of specifying the decision problem. Perhaps it is a point where the notion of reinforcement learning breaks down.