A cheating approach to the tiling agents problem
(This post resulted from a conversation with Wei Dai.)
Formalizing the tiling agents problem is very delicate. In this post I’ll show a toy problem and a solution to it, which arguably meets all the desiderata, but only by cheating in a new and unusual way.
Here’s a summary of the toy problem: we ask an agent to solve a difficult math question and also design a successor agent. Then the successor solves another math question and designs its own successor, and so on. The questions get harder each time, so they can’t all be solved in advance, and each of them requires believing in Peano arithmetic (PA). This goes on for a fixed number of rounds, and the final reward is the number of correct answers.
Moreover, we will demand that the agent handle both subtasks (solving the math question and designing the successor) using the same logic. Finally, we will demand that the agent be able to reproduce itself at each round, not just design a custom-made successor that solves the math question with PA and reproduces itself by quining.
To nail down the formalization, we will use good old computable programs with proof limits, not oracles or modal formulas. The universe program U will contain the agent program A as a subprogram, and A will know the full source code of U (including A) by diagonalization. At each round, the agent will receive the text of a math question as input, and return a pair consisting of the answer to the math question and the source code of the successor agent.
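To fix the shape of that interaction, here is a minimal sketch of what U might look like. Everything named here (run_agent, evaluate, the question list) is my own illustration, not part of the problem statement, and the sketch elides how A comes to know U's source code by diagonalization.

```python
# Illustrative sketch of the universe program U. The helper run_agent is
# hypothetical: it runs an agent's source code on an input and returns
# the agent's output. Diagonalization (A knowing U's source) is elided.

def U(initial_agent_source, questions):
    utility = 0
    agent_source = initial_agent_source
    for P_i, N_i in questions:  # a fixed number of rounds
        # Each round, the agent receives a math question and returns a
        # pair: (answer to the question, source code of the successor).
        answer, successor_source = run_agent(agent_source, (P_i, N_i))
        utility += evaluate(P_i, N_i, answer)  # procedure described below
        agent_source = successor_source
    return utility
```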
The math questions will be of the form “you’re allowed to use N_i computation steps, here’s the source code of a program P_i, what does it return?” The programs P_i will be chosen so that N_i steps isn’t enough to run them, but is more than enough to find a PA proof that P_i returns some particular value.
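For concreteness, here is a toy P_i of my own invention (not from the problem statement). Actually running it takes far too many steps, but a short PA induction proves what it returns:

```python
# A toy P_i: running it to completion takes about 10**100 steps, but PA
# proves with a short induction that it returns 0, since a parity bit
# flipped an even number of times ends where it started.
def P():
    x = 0
    for _ in range(10 ** 100):  # an even number of iterations
        x = (x + 1) % 2
    return x
```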
To give the agent some assurance of the above, the universe program will use the following transparent procedure to evaluate the agent's answer (a code sketch follows the list):
1. Run P_i to determine the right answer, and compare it with the agent's answer.
2. Verify that there exist PA proofs about P_i that can be found in much fewer than N_i computation steps. Ignore the answers found in the proofs.
3. Increment the agent's utility if (1) succeeded or (2) failed.
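Here is a sketch of that procedure. The helper find_pa_proof is a stand-in for some bounded brute-force proof searcher, and the N_i // 100 budget is just one way to cash out "much fewer than N_i"; nothing here is a real library call.

```python
# Sketch of the evaluation procedure. find_pa_proof(P, step_limit) is a
# hypothetical bounded searcher: it returns a value v such that a PA
# proof of "P returns v" was found within step_limit steps, else None.

def evaluate(P_i, N_i, agent_answer):
    # Step 1: run P_i without a step limit to get the right answer.
    right_answer = P_i()
    step_1_succeeded = (agent_answer == right_answer)

    # Step 2: check that some quick PA proof about P_i exists; the
    # value it proves is deliberately ignored.
    step_2_succeeded = find_pa_proof(P_i, step_limit=N_i // 100) is not None

    # Step 3: score a point if the answer was right, or if the question
    # was malformed (no quick proof exists).
    return 1 if step_1_succeeded or not step_2_succeeded else 0
```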
We will also assume that all math questions are so difficult, and all N_i are so large, that doing any fixed work up front is a rounding error by comparison.
The above sounds like a fair formulation of the tiling agents problem, right? The Vingean principle is satisfied because the initial agent doesn’t have enough time to solve all math questions up front. The tiling principle is satisfied because we want a solution that will reproduce itself at each round. The naturalistic principle is satisfied because we want to uniformly handle both subtasks (solve the math question and design a successor).
Now I will show a solution that seems to meet all desiderata. It will be based on the formal theory PA+Con(PA) (which is the weakest possible, because the problem requires creating successors that trust PA). But you will still say that I’m cheating.
Let the agent A be this program:
“Ignore my input at first. Using a small number of computation steps, and using PA+Con(PA) as the formal theory, search for programs B such that, if all instances of A internally delegate all input-output to B, then U will provably return some particular utility. Then internally delegate to the B that leads to the highest utility found.”
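A rough rendering of A in code, under heavy assumptions: provable_utility stands in for a bounded proof search in PA+Con(PA) over statements of the form "if all instances of A internally delegate all input-output to B, then U returns utility u", and enumerate_candidate_programs for some enumeration of program texts. Both are hypothetical; the point is only the control flow.

```python
# Sketch of the agent A. All helpers here are stand-ins. Note that the
# input plays no role in choosing B; it is only handed to B at the end.

def A(question):
    best_B, best_utility = None, float("-inf")
    for B in enumerate_candidate_programs(SMALL_STEP_LIMIT):
        # Search PA+Con(PA) for a proof that "if all instances of A
        # internally delegate all input-output to B, then U returns u"
        # for some particular utility u; None if no proof is found.
        u = provable_utility(theory="PA+Con(PA)", delegate=B, universe=U)
        if u is not None and u > best_utility:
            best_B, best_utility = B, u
    return best_B(question)  # internally delegate to the winning B
```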
Then A will be able to prove that delegating to the following program B will lead to utility equal to the number of rounds:
“Accept P_i and N_i as input. Using slightly fewer than N_i computation steps, and using PA as the formal theory, find the first proof that P_i returns some particular value. Then return that value and designate A as successor.”
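And the corresponding sketch of B, again with find_pa_proof as a hypothetical bounded proof searcher, OVERHEAD as an unspecified small step budget, and A_SOURCE standing for the source code of A:

```python
# Sketch of the program B that A delegates to. find_pa_proof, OVERHEAD
# and A_SOURCE are stand-ins, as above.

def B(question):
    P_i, N_i = question
    # Spend slightly fewer than N_i steps looking for the first PA
    # proof that P_i returns some particular value.
    value = find_pa_proof(P_i, step_limit=N_i - OVERHEAD)
    # Return that value, and designate A itself as the successor.
    return value, A_SOURCE
```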
The proof is obvious. If A internally delegates to B on the first round, then it will delegate to the same B on all subsequent rounds, because it ignores the input while choosing B. That fact is easily provable in PA. Then B goes on to handle the inputs on different rounds and wins all of them, which is easily provable in PA+Con(PA), due to the way the universe program verifies the answers. Concretely: if the quick-proof check (2) fails on some round, B gets the point for free; and if it succeeds, then PA proves that P_i returns the value B answered, while PA also proves that P_i returns its actual value (by Σ1-completeness), so by Con(PA) the two values coincide and check (1) succeeds. At each round B designates A as successor, which rubber-stamps B again, and the cycle continues.
You could say that A fails the Vingean principle, but that wouldn’t be fair. After all, A indeed can’t solve all questions in advance, because it doesn’t have enough time. It only solves the question “which B to run internally?” in advance, giving the same answer redundantly each time, but that’s just an implementation detail. If you didn’t know how A worked internally, you wouldn’t object to it.
Of course it’s not guaranteed that A will reproduce itself. It knows that reproducing itself will lead to utility equal to the number of rounds, and we happen to know that higher utility isn’t achievable, but the same utility could be achieved in other ways (e.g. A could find a B that designates itself as successor directly, instead of going through A). But that’s not a strong objection, because given the right tie-breaking, A will indeed reproduce itself.
Moreover, A’s approach looks quite general: nothing in it is specific to this problem. It can also deal with all the usual UDT test cases, like Newcomb’s Problem, as well as all imaginable tiling problems that are amenable to quining. So someone optimistic could say that A is a version of UDT that solves the tiling agents problem!
That would be unsatisfactory though. It looks more like A has found a trick that leads to quining in a roundabout way. I feel that the people working on tiling agents wouldn’t accept my solution, but I can’t quite figure out which new principle would rule it out. Perhaps some kind of “no meta-quining principle”? Do humans obey that principle, or do they have intuitions like “solve decision theory in the abstract and delegate to that” which sound suspiciously like my agent? It’s tricky and my thinking is very confused right now.
What do you think?