The main relevant aspect of the real problem not captured by quining is the continuity of self-replacement with environmental action. All the quining approaches Marcello and I considered involved treating the decisions to “modify self” and “do something in environment” differently. This means you can’t use actions to construct an environmental copy of yourself out of transistors, because the environmental copy wouldn’t get the privileged-special-case treatment. More generally, it seems to imply an arbitrary Cartesian boundary with different levels of mathematical trust for physical processes inside and outside the boundary, and it implies that you can’t just frame desirable actions in terms of their predicted consequences.
Edit: This doesn’t mean quining approaches aren’t worth exploring. They might get us closer to a solution than contemplating, say, approaches that don’t quine.
Let me back up from my other response. It just occurred to me that UDT1.1 (with a proof system instead of “math intuition”) already constitutes a quining solution to AI reflection.
Consider an AI facing the choice of either creating a copy of itself, which will then go out into the world, or doing nothing. Unfortunately, due to Löbian problems it can’t prove that a copy of itself won’t do something worse than nothing. But UDT1.1 can be thought of as optimizing over an input/output mapping that is implemented by all of its copies. For each possible mapping, it proves a utility value starting from the assumption that it implements that map (which implies that its copies and provably equivalent variants also implement that map). So it doesn’t need to prove (from scratch) that its copy won’t do something worse than nothing.
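Here is a minimal sketch of that “brute force” reading of UDT1.1, assuming a toy world_utility function, a tiny observation/action space, and direct evaluation in place of proof search (all of these are illustrative assumptions, not anything specified in the comment):

```python
# Toy sketch of UDT1.1 as brute-force optimization over input/output maps.
# INPUTS, OUTPUTS and world_utility are hypothetical placeholders.
from itertools import product

INPUTS = ["copy_opportunity", "nothing_happening"]   # possible observations
OUTPUTS = ["create_copy", "do_nothing"]              # possible actions

def world_utility(io_map):
    """Utility of the world under the assumption that *every* instance of
    the agent (the original and all copies) implements io_map. In the real
    proposal this would be a provable utility bound, not a direct lookup."""
    return 10 if io_map["copy_opportunity"] == "create_copy" else 0

def udt11_choose_map():
    best_map, best_u = None, float("-inf")
    # Enumerate every input/output mapping ("brute force").
    for outputs in product(OUTPUTS, repeat=len(INPUTS)):
        io_map = dict(zip(INPUTS, outputs))
        u = world_utility(io_map)   # evaluated assuming "I implement io_map"
        if u > best_u:
            best_map, best_u = io_map, u
    return best_map

print(udt11_choose_map())
```

The structure makes the point visible: utility is attached to a whole input/output map that the agent and all of its copies share, so no copy-by-copy proof that a successor “won’t do something worse than nothing” is ever needed.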
(Requested reply.)
I think there’d be a wide variety of systems where, so long as the “parent” agent knows the exact strategy that its child will deploy in all relevant situations at “compile time”, the parent will trust the child. The point of the Löb problem is that it arises when we want the parent to trust the child generally, without knowing exactly what the child will do. For the parent to precompute the child’s exact actions implies that the child can’t be smarter than the parent, so it’s not the kind of situation we would encounter when e.g. Agent A wants to build Agent B which has more RAM and faster CPUs than Agent A while still sharing Agent A’s goals. This, of course, is the kind of “agents building agents” scenario that I am most interested in.
That’s not the case.
Consider a resource-constrained variant of the original game:
Each program receives as input the round number n and the next program, encrypted by repeatedly XORing it with the output of a monotonic computable function f(n).
Let T_f(n) be the runtime of the fastest algorithm that computes f. Note that T_f(n) is monotonically increasing.
At round n, the current program has a time limit T(n) = C_0 + C_1 * T_f(n). Quirrell never submits programs that exceed the time limit.
In this variant of the game, you have to submit the first program, which has to obey a time limit T(0).
The initial program will not be able to compute the relevant strategy of any of its successors (except at most finitely many of them).
And yet, my quining solution still works (just add a decryption step), and I think Wei Dai’s solution also works.
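To make the setup concrete, here is a rough sketch of the encryption and time-limit mechanics as I read them; the particular f, the constants, and the keystream construction are illustrative assumptions rather than part of the comment’s specification:

```python
# Hypothetical mechanics for the resource-constrained variant.

def f(n):
    # Some monotonic computable function (placeholder choice).
    return 2 ** n

C0, C1 = 1.0, 0.001  # constants in the time limit T(n) = C0 + C1 * T_f(n)

def time_limit(n, T_f):
    # T_f(n): runtime of the fastest algorithm computing f, supplied externally.
    return C0 + C1 * T_f(n)

def keystream(n, length):
    # Bytes derived from f(n), repeated to the needed length
    # (one possible reading of "repeatedly XORing with the output of f(n)").
    val = f(n)
    block = val.to_bytes((val.bit_length() + 7) // 8 or 1, "big")
    return bytes(block[i % len(block)] for i in range(length))

def xor_bytes(data, key):
    return bytes(a ^ b for a, b in zip(data, key))

def encrypt_next_program(source: bytes, n: int) -> bytes:
    return xor_bytes(source, keystream(n, len(source)))

def decrypt_next_program(cipher: bytes, n: int) -> bytes:
    # XOR is its own inverse, so the "decryption step" the comment mentions
    # is just one more pass, at the cost of roughly one computation of f(n),
    # which the budget T(n) = C0 + C1 * T_f(n) is sized to allow.
    return xor_bytes(cipher, keystream(n, len(cipher)))
```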
It seems to me that the variant with time limits has a simple quining solution (a rough sketch in code follows the list):
1) Get the time limit as input.
2) Spend almost all available time trying to prove in some formal theory that the next program is equivalent to this one.
3) Double down if a proof is found, otherwise take winnings.
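A rough sketch of those three steps, with hypothetical stand-ins (a prover object and the move names) for the actual game interface:

```python
import time

def quining_player(time_limit, my_source, next_program_source, prover):
    """Sketch of the three steps above; `prover` and its .step() method are
    hypothetical stand-ins for a proof search in some formal theory."""
    # Step 1: the time limit arrives as input.
    deadline = time.monotonic() + 0.9 * time_limit  # Step 2: spend almost all of it.
    proof_found = False
    while time.monotonic() < deadline:
        # Try to prove that the next program is equivalent to this one
        # (quining: my_source is this very program's own source code).
        if prover.step(goal=("equivalent", my_source, next_program_source)):
            proof_found = True
            break
    # Step 3: double down on a provably-equivalent successor, otherwise stop.
    return "double_down" if proof_found else "take_winnings"
```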
That’s similar to your idea, right? I’m not sure if it addresses Eliezer’s objection, because I no longer understand his objection...
Yes.
During the April 2013 workshop I rephrased this as the principle “The actions and sensor values of the offspring should not appear outside of quantifiers”. Justification: If we have to reason case-by-case about all possible actions, all possible sensor values, and all possible states of the world, then “child’s sensor state space” × “child’s action state space” × “world state space” can be no larger than “the number of cases we can consider”; since those state spaces grow exponentially in the child’s size, this in general implies a logarithmically smaller child. I call this the Vingean Principle.
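Writing the counting argument out (my symbols, a sketch rather than anything from the workshop): let N be the number of cases the parent can consider, s and a the bits of child sensor and action state, and |W| the number of world states. Then

```latex
N \;\ge\; |S|\cdot|A|\cdot|W| \;=\; 2^{s}\,2^{a}\,|W|
\quad\Longrightarrow\quad
s + a \;\le\; \log_2 N \,-\, \log_2 |W| ,
```

so a child that the parent checks case-by-case can only be about logarithmic in the parent’s case budget, which is the sense in which it must be “logarithmically smaller”.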
Right, so we want an AI design that does the right thing “naturally”, without a special case for “modify self”. It seems to me that UDT may already be such a design, if we had a solution to logical uncertainty (what I called “math intuition module”) to plug into it. When I wrote about logical uncertainty being important for AGI, I had in mind the problem of not being able to prove P!=NP and hence not being able to prove whether EU(search for polynomial solution to 3-SAT) < EU(not search for polynomial solution to 3-SAT). But if we had an AI that could handle that kind of decision problem, why wouldn’t it also be able to design and build another AI that uses a proof system that it can’t prove to be sound? Is there a reason to think that “AI reflection” isn’t just a special case of “logical uncertainty”?
(Of course UDT would probably build successors/copies of itself that also use “math intuition modules” rather than “proof systems”, but we can imagine forcing a UDT-AI to choose among a selection of other AIs based on different proof systems, in which case it would choose based on an analysis of the trade-off between power and risk of inconsistency/unsoundness, much as a human would make the choice.)
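For concreteness, a toy version of the P!=NP decision described above, assuming a hypothetical math_intuition function that returns a subjective probability for a statement it cannot settle, and made-up payoffs (this is just the shape of the computation, not a proposed design):

```python
# Hypothetical "math intuition module": returns a subjective probability
# for a mathematical statement it cannot prove or refute.
def math_intuition(statement):
    return {"P != NP": 0.9}.get(statement, 0.5)   # made-up number

# Toy payoffs (also made up): searching for a polynomial 3-SAT algorithm
# pays off hugely if one exists, and wastes resources if it doesn't.
U_FIND, U_WASTE, U_DO_NOTHING = 1000.0, -10.0, 0.0

p_no_poly_3sat = math_intuition("P != NP")
eu_search = (1 - p_no_poly_3sat) * U_FIND + p_no_poly_3sat * U_WASTE
eu_skip = U_DO_NOTHING

action = "search" if eu_search > eu_skip else "don't search"
print(eu_search, eu_skip, action)
```

The question in the comment is then whether the same move, assigning a probability to “this proof system is sound” and weighing expected power against expected risk, also covers the reflection case.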
I can’t visualize how saying that a result is “uncertain” could make the Löbian issue go away—do you have a concrete visualization for this, and if so, can you sketch it even if very briefly?
Possibly relevant: No Turing machine can assign finite probability to the halting sequence.
(Regardless of whether UDT1.1 is a correct quining solution to AI Reflection, we ultimately still need something that is not so “brute force”, so let’s continue this thread.)
I guess I’m saying that there’s not necessarily a Löb’s theorem or equivalent for a system that (does something like) assigns probabilities to mathematical statements instead of proving/disproving them from a fixed set of axioms. Do you think there is such an equivalent to Löb’s theorem, and if so what might it say?
I’m not seeing why this is relevant. Can you explain?
In the Q&A to the open problems talk, he said that if logical uncertainty is the solution to the problem, then he and Marcello didn’t figure out how, and they did look in that specific place.
I think that it would be useful to try to make some progress on how to reason probabilistically about logical statements before trying to use it as the solution to other problems, because my picture of how it could work is so unclear that I feel unable to draw any conclusions about what it can or cannot do with any kind of certainty. For example, I do not know a condition on a probability distribution over mathematical statements that would prevent the spurious proof problem—when I read about playing chicken with the universe, it sounded very familiar, but I can’t currently see a way to apply it in the case of a probability distribution. (Then again, if we do expect to find a solution to the Löbian problems in that area, perhaps we should be working on logical uncertainty now, not on Löbian problems...)
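For reference, a sketch of the proof-based “playing chicken with the universe” technique the comment alludes to, as I understand it; the prover object and its methods are hypothetical stand-ins:

```python
def udt_with_chicken_rule(actions, prover, my_name):
    """Proof-based decision procedure with the diagonal ("chicken") step."""
    # Chicken step: if we can prove we will NOT take some action, take it.
    # A consistent prover therefore never finds such a proof, which is what
    # blocks spurious proofs like "if I take action a, utility is -1000".
    for a in actions:
        if prover.proves(f"{my_name}'s output != {a}"):
            return a
    # Main step: pick the action with the best provable utility bound.
    best_action, best_u = None, float("-inf")
    for a in actions:
        u = prover.best_provable_utility(assuming_action=a)  # hypothetical helper
        if u is not None and u > best_u:
            best_action, best_u = a, u
    return best_action
```

The open question raised above is what the analogue of this diagonal step looks like when the agent has a probability distribution over statements rather than a prover.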
If it weren’t for concerns about AI risk, I would be advocating working on logical uncertainty instead of Löbian problems, because it seems to me that an AI using “math intuition” may not face a Löbian problem to begin with, or if it does, the problem and/or solution may be different enough from that of an AI using a proof system that any work we do now will not be of much help.
(From the perspective of reducing AI risk, maybe it’s good to focus people’s attention on Löbian problems, unless it leads them to start thinking that they ought to work on logical uncertainty due to reasoning like the above...)
Hm; are you saying you think FAI can probably be implemented without solving the logical uncertainty problem? My current visualization is that both the logical uncertainty problem and the safe rewrite problem will need to be solved—among others—and the reason I’ve been thinking about the rewrite problem is that using proof-based techniques, I at least had a grasp of how to think about it. (And my intuition has so far been that logical uncertainty will probably have diagonalization problems in a different guise when we actually have a concrete enough proposal to test this, so it seemed useful to think about the rewrite problem in the better-understood context, when trying to find in-principle solutions to the problems.)
No, I was saying the opposite, and also hinting that working on and publishing results about logical uncertainty may be bad for AI risk because it helps AGI, not just FAI (whereas the AI reflection problem seems to be more specific to FAI). There’s also a discussion about this issue in the decision theory mailing list archives.
Can someone explain how an “always return 0.5” bot is not a counterexample?
He’s talking about a probability distribution over the set of all bitstrings. You can’t assign p=0.5 (or any other constant real number) to each of them: that doesn’t sum to 1.0.
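Spelling out the arithmetic behind that (my reading of the exchange, not something stated in it): give the same probability p to each of the 2^n bitstrings of length n, and normalization forces

```latex
\sum_{x \in \{0,1\}^n} p \;=\; 2^{n} p \;=\; 1
\quad\Longrightarrow\quad p = 2^{-n},
\qquad \lim_{n \to \infty} 2^{-n} = 0 .
```

So a constant per-string probability cannot normalize, and in the limit an “always return 0.5” bot gives any single infinite sequence, the halting sequence included, probability 0 rather than a finite positive probability.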