The universe defined in the post doesn’t have any contradictions. Every oracle studied in theoretical CS is “incomplete” in the sense that it can’t answer all questions about programs that invoke the oracle itself, e.g. the halting oracle cannot answer all questions about programs that can invoke the halting oracle. But it can answer some questions about such programs, which turns out to be enough for the agent to make a decision.
I am not convinced that the oracle can answer any questions about programs that call the oracle. That is simply not one of the things it does.
The halting problem corresponds to a language L which consists of all pairs (P, w) such that program P halts on input w. Then the oracle is defined to return 1 if you give it a pair in L, and 0 otherwise. But to do this, we have to have L defined already, so that definition cannot refer to this oracle. A recursive definition “L is the set of programs with an oracle for L that halt” is just ill-defined.
We’re not asking the oracle directly about the halting behavior of programs that call the oracle. Instead we’re using it as a provability oracle, asking it only questions of the form “if a program exhaustively searches for proofs of such-and-such statement in the formal system S, will it ever terminate?”
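A minimal sketch of what that reduction might look like, in Python; here halts(), is_valid_proof_in_S() and candidate_proof() are hypothetical stand-ins for the halting oracle and an ordinary proof checker/enumerator for S, not anything defined in the post:

```python
# halts(program) -> bool is the hypothetical halting-oracle primitive.
# The proof searcher it is asked about is an ordinary, oracle-free program,
# so this is a question the oracle can legitimately answer.

def provable_in_S(statement):
    """Does the formal system S prove `statement`? Decided via the halting oracle."""
    def proof_searcher():
        n = 0
        while True:
            # Enumerate candidate proofs in S; halt at the first valid one.
            if is_valid_proof_in_S(candidate_proof(n), statement):
                return
            n += 1
    return halts(proof_searcher)
```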
Since the statements we’re trying to prove, in turn, refer to the behavior of programs that call the oracle, we need S to have some assumptions about the behavior of the oracle. These assumptions can be pretty weak, e.g. the proof in the post seems to go through even if S knows only that the oracle is a pure function that returns identical outputs for identical inputs. Or you could make the assumptions stronger, e.g. formalize the notion of “halting oracle” within S. That wouldn’t break anything either, I think.
I see, thanks, that clears it up.
However, now I have an objection to a different step (which I’m sure you will also address), namely this one:
“But S also proves that A()=1 ⇒ U()=1000000, therefore S proves that A()≠1. According to the first step of the algorithm, A will play chicken with the universe and return 1, making S inconsistent.”
Just to clarify, let’s say that the first step of A first checks if S proves that A()≠2, and then checks if S proves that A()≠1 (we could do this in either order, it doesn’t matter, but this order makes the following explanation easier to follow).
S proves that A()≠1. However, S will only be inconsistent if S can also prove that A()=1. To prove that A()=1, S needs to prove that S proves that A()≠1 (does this still make sense?). Furthermore, S needs to prove that S does not prove that A()≠2 (otherwise, A will perversely return 2 before it considers perversely returning 1).
In particular, the statement “S does not prove that A()≠2” implies “S is consistent” and therefore S cannot prove it.
I also have the suspicion that “S proves that A()≠1” is not something S can prove, but my logic-fu is not strong enough to decide one way or the other.
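For reference, here is a rough sketch of the first step being discussed, reusing the hypothetical provable_in_S() from above; the two actions and the way statements are encoded are illustrations, not the post’s exact code:

```python
def A():
    # Step 1: "playing chicken with the universe".
    # If S proves the agent won't return some action, return exactly that action.
    # Hence, if S is sound, it can never actually prove A() != a for any action a.
    for a in (2, 1):  # the order discussed above: A() != 2 is checked first
        if provable_in_S(f"A() != {a}"):
            return a
    # Step 2 (not shown here): for each action a, search for S-proofs of
    # "A() = a implies U() = u" and return the action with the best proven u.
    ...
```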
Let’s step through the algorithm. If S proves that A()≠2, then A will immediately return 2. That means S has proved a false statement about the universe. If S doesn’t prove that A()≠2, but proves that A()≠1, then A will immediately return 1, which also means S has proved a false statement about the universe.
So yeah, I guess calling it a violation of consistency was wrong. It’s actually a violation of soundness: all axioms of S must be true statements about the universe. If S is sound, then the above reasoning shows that S cannot prove A()≠a for any a. (Also soundness implies consistency, which might explain why I made the mistake.) Thanks for pointing that out! Edited the post accordingly.
Alright, that makes sense (if my reading of proofs seems uncharitable at times, it is because I know enough about logic to know when a statement doesn’t make sense, but I don’t know enough to tell what the statement wants to be).
Soundness seems like an interesting property to think about for a formal system. I am reminded of the bizarre systems you can get, e.g., by taking PA and adding the axiom “PA is inconsistent”. This, if I recall correctly, is consistent provided PA itself is consistent (by Gödel’s second incompleteness theorem, a consistent PA cannot prove its own consistency, so adding the negation introduces no contradiction), but (whether or not it’s consistent) it definitely can’t be sound.
I want people to read my math uncharitably and poke holes in it, otherwise I wouldn’t post :-)
If S were sound-if-consistent, this problem would make it unsound, hence inconsistent.
Do we know in advance which questions are going to be answered? Do we know agent A from your OP isn’t going to end up crying in step 2?
I apologize if I’m just asking about standard CS assumptions here. I have a CS bachelor’s degree but I don’t remember ever discussing programs that can call halting oracles. Either my memory is faulty (more likely) or I chose the wrong electives.
It’s easy to write a universe program U that would make A cry. The post only proves that A won’t end up crying for one specific U, and outlines an argument why it won’t cry in some other similar problems.
The class of universes where A doesn’t end up crying is supposed to roughly correspond to the informal class of “fair” decision problems where the agent’s action is the only thing that determines the resulting utility, but I have no strong argument why it’s a good formalization of that class.
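For concreteness, a universe in that class might look something like this (the payoffs are illustrative; the point is that only the returned action matters):

```python
# A "fair" universe: the utility depends only on the value A() returns,
# not on A's source code or on how much work A did to compute it.
def U():
    a = A()
    if a == 1:
        return 1000000
    if a == 2:
        return 1000
    return 0
```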
One example of an “unfair” decision problem would be a universe that rewarded you for having a specific algorithm, rather than for returning a specific value. Such “unfair” problems can be made up to punish any decision theory you can come up with, so they’re probably not a good test case.
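Something like the following, say, where source_of() and BLESSED_SOURCE are hypothetical stand-ins for inspecting the agent’s program text:

```python
# An "unfair" universe: it rewards having one particular program text,
# regardless of what that program actually returns.
def U_unfair():
    return 1000000 if source_of(A) == BLESSED_SOURCE else 0
```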
Thanks for explaining.
Is anyone at all working on classes of “unfair” problems, such as ones that give different utilities based upon the amount of time spent computing? Or ones that take into account any other resource used to make the decision (energy or memory)? This class seems important to me and less arbitrary than “unfair” problems that punish specific algorithms.
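For example, something along these lines, where the payoff and penalty rate are made up for illustration:

```python
import time

# An "unfair" problem of the kind asked about above: utility shrinks in
# proportion to how long the agent deliberates, so extra computation has
# a real cost even when the final action is the same.
def U_time_penalized():
    start = time.time()
    a = A()
    elapsed = time.time() - start
    base = 1000000 if a == 1 else 0
    return base - 1000 * elapsed
```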
Wei Dai has a tentative decision theory that covers some of those cases. I didn’t find it very convincing, but it’s likely that I overlooked something. Any work on such problems would be very welcome, of course.
I’ll have a think. An optimal decision maker for all scenarios seems impossible if your utility is reduced by an amount proportional to the time taken to make the decision (“solving death” has this structure: fewer people die if you solve it earlier). The best general approach I can think of is an infinite table mapping scenarios to the decision computed by something like your UDT + oracle for that scenario. And this can be beaten in each individual scenario by a specialised algorithm for that scenario, which needs no lookup.
And it still involves an infinite quantity, which I don’t like in theories I might want to connect to the real world one day (and it requires an infinite amount of precomputation).
I wonder if there is a quality apart from strict optimality that we need to look for. Making the optimal decision in most problems (what is the correct weighting of scenarios)? Making the right decision eventually?
Anyway I’ll think some more. It is definitely thornier and nastier than “fair” problems.
I recently made some progress on your question. Section 4 seems to be the most relevant.