Let me back up from my other response. It just occurred to me that UDT1.1 (with a proof system instead of “math intuition”) already constitutes a quining solution to AI reflection.
Consider an AI facing the choice of either creating a copy of itself, which will then go out into the world, or doing nothing. Unfortunately, due to Löbian problems it can’t prove that a copy of itself won’t do something worse than nothing. But UDT1.1 can be thought of as optimizing over an input/output mapping that is implemented by all of its copies. For each possible mapping, it proves a utility value starting from the assumption that it implements that map (which implies that its copies and provably equivalent variants also implement that map). So it doesn’t need to prove (from scratch) that its copy won’t do something worse than nothing.
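A toy sketch of that policy-selection step, with a hypothetical `utility_given_policy` standing in for the proof search (everything here, including the observation and action names, is made up for illustration):

```python
# Toy sketch of UDT1.1-style policy selection (all names hypothetical).
# Instead of proving, action by action, that each copy behaves safely, the
# agent optimizes over whole input/output maps under the assumption that
# every copy of it implements the chosen map.

from itertools import product

INPUTS = ["observe_A", "observe_B"]          # toy observation space
ACTIONS = ["create_copy", "do_nothing"]      # toy action space

def utility_given_policy(policy):
    """Stand-in for the proof step: the utility provable under the assumption
    that this agent *and all its copies* implement `policy`."""
    # Toy world: a copy helps only if it acts on observe_B.
    score = 0
    if policy["observe_B"] == "create_copy":
        score += 2
    if policy["observe_A"] == "do_nothing":
        score += 1
    return score

def udt11_choose_policy():
    # Enumerate every input/output map; keep the one with highest provable utility.
    best_policy, best_u = None, float("-inf")
    for outputs in product(ACTIONS, repeat=len(INPUTS)):
        policy = dict(zip(INPUTS, outputs))
        u = utility_given_policy(policy)
        if u > best_u:
            best_policy, best_u = policy, u
    return best_policy

print(udt11_choose_policy())
```

The point of the sketch is the shape of the loop: the copies never appear as separate objects to be reasoned about; they are covered by the assumption attached to each candidate map.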
(Requested reply.)
I think there’d be a wide variety of systems where, so long as the “parent” agent knows the exact strategy that its child will deploy in all relevant situations at “compile time”, the parent will trust the child. The point of the Löb problem is that it arises when we want the parent to trust the child generally, without knowing exactly what the child will do. For the parent to precompute the child’s exact actions implies that the child can’t be smarter than the parent, so it’s not the kind of situation we would encounter when e.g. Agent A wants to build Agent B which has more RAM and faster CPUs than Agent A while still sharing Agent A’s goals. This, of course, is the kind of “agents building agents” scenario that I am most interested in.
That’s not the case.
Consider a resource-constrained variant of the original game:
Each program receives as input the round number n and the next program, encrypted by repeated xoring with the output of a monotonic computable function f(n).
Let T_f(n) be the runtime of the fastest algorithm that computes f. Note that T_f(n) is monotonically increasing.
At round n, the current program has a time limit T(n) = C_0 + C_1 * T_f(n). Quirrell never submits programs that exceed the time limit.
In this variant of the game, you have to submit the first program, which has to obey a time limit T(0).
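A minimal sketch of the encryption layer described above, with a hypothetical placeholder for f (any monotonic computable function would do; the keystream is just f(n)’s output bytes cycled over the program text):

```python
# Sketch of the game variant's encryption layer.  `f` is a hypothetical
# placeholder for the monotonic computable function; Quirrell's actual f
# could be arbitrarily expensive to compute, which is what drives T_f(n).
from itertools import cycle

def f(n):
    # Placeholder monotonic computable function.
    return (n + 1) ** 2

def keystream(n):
    # The output of f(n), as bytes, serves as the repeating xor key.
    return f(n).to_bytes(8, "big")

def xor_with_key(program_bytes, n):
    # XOR is its own inverse, so the same routine encrypts and decrypts.
    return bytes(b ^ k for b, k in zip(program_bytes, cycle(keystream(n))))

program = b"def strategy(n, successor): ..."
encrypted = xor_with_key(program, 3)
assert xor_with_key(encrypted, 3) == program  # decryption recovers the program
```

Because decrypting at round n costs at least one evaluation of f, the time limit T(n) = C_0 + C_1 * T_f(n) gives each program just enough budget to decrypt its successor plus a constant amount of extra work.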
The initial program will not be able to compute the relevant strategy of any of its successors (except at most finitely many of them).
And yet, my quining solution still works (just add a decryption step), and I think Wei Dai’s solution also works.
It seems to me that the variant with time limits has a simple quining solution:
1) Get the time limit as input.
2) Spend almost all available time trying to prove in some formal theory that the next program is equivalent to this one.
3) Double down if a proof is found, otherwise take winnings.
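A rough sketch of those three steps, using syntactic normalization as a cheap stand-in for the formal proof search (a real agent would instead run a theorem prover for behavioral equivalence under the remaining time budget):

```python
# Sketch of the quining solution's round logic.  `normalize` is a toy
# stand-in for proof search in a formal theory: it only recognizes
# trivially-equivalent variants (whitespace/comment changes).
import time

def normalize(source):
    # Strip comments and blank lines so cosmetic variants compare equal.
    lines = [ln.split("#")[0].strip() for ln in source.splitlines()]
    return "\n".join(ln for ln in lines if ln)

def play_round(my_source, next_source, time_limit):
    # Step 1: the time limit arrives as input.
    deadline = time.monotonic() + 0.95 * time_limit  # keep a safety margin
    # Step 2: spend (almost) all remaining time trying to show equivalence.
    proof_found = False
    if time.monotonic() < deadline:
        proof_found = normalize(next_source) == normalize(my_source)
    # Step 3: double down on a proof, otherwise take winnings.
    return "double_down" if proof_found else "take_winnings"
```

In the encrypted variant, step 2 would be preceded by a decryption pass using f(n), which is exactly what the time limit T(n) is sized to allow.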
That’s similar to your idea, right? I’m not sure if it addresses Eliezer’s objection, because I no longer understand his objection...
Yes.
During the April 2013 workshop I rephrased this as the principle “The actions and sensor values of the offspring should not appear outside of quantifiers”. Justification: if we have to reason case-by-case about all possible actions, all possible sensor values, and all possible states of the world, the number of cases we must consider is “child’s sensor state space” × “child’s action state space” × “world state space”. Since these state spaces grow exponentially in the child’s description size, any bound on the number of cases the parent can consider in general implies a logarithmically smaller child. I call this the Vingean Principle.
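A back-of-the-envelope illustration of that counting argument (all numbers made up): if the child has s sensor bits and a action bits against w bits of relevant world state, case-by-case reasoning must cover 2^(s+a+w) cases, so a parent with a budget of N cases can only exhaustively verify children of roughly log2(N) bits.

```python
# Illustrative arithmetic for the Vingean Principle (numbers are made up).
import math

def max_child_bits(case_budget, world_bits):
    # Case-by-case reasoning covers every (sensor, action, world) combination:
    #   cases = 2**sensor_bits * 2**action_bits * 2**world_bits
    # so sensor_bits + action_bits <= log2(case_budget) - world_bits.
    return math.floor(math.log2(case_budget)) - world_bits

# A parent able to consider a trillion cases, against 20 bits of world state,
# can only exhaustively verify ~19 bits of child sensors-plus-actions:
print(max_child_bits(10**12, 20))  # → 19
```

Hence a child of any interesting size forces the parent to reason with quantifiers over the child’s actions and sensor values rather than enumerating them.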