I’ve now edited the post to give the version which I claim works in the empirically uncertain case, and give more hints for how it still goes through in the fully logically uncertain case.
Sorry to necro this here, but I find this topic extremely interesting and I keep coming back to this page to stare at it and tie my brain in knots. Thanks for your notes on how it works in the logically uncertain case. I found a different objection based on the assumption of logical omniscience:
Regarding this you say:
Perhaps you think that the problem with the above version is that I assumed logical omniscience. It is unrealistic to suppose that agents have beliefs which perfectly respect logic. (Un)Fortunately, the argument doesn’t really depend on this; it only requires that the agent respects proofs which it can see, and eventually sees the Löbian proof referenced.
However, this assumes that the Löbian proof exists. We show that the Löbian proof of A=cross→U=−10 exists by showing that the agent can prove □(A=cross→U=−10)→(A=cross→U=−10), and the agent’s proof seems to assume logical omniscience:
Examining the agent, either crossing had higher expected utility, or P(cross)=0. But we assumed □(A=cross→U=−10), so it must be the latter. So the bridge gets blown up.
If □ here means “provable in PA”, the logic does not follow through if the agent is not logically omniscient: the agent might find crossing to have a higher expected utility regardless, because it may not have seen the proof. If □ here instead means “discoverable by the agent’s proof search” or something to that effect, then the logic here seems to follow through (making the reasonable assumption that if the agent can discover a proof for A=cross->U=-10, then it will set its expected value for crossing to −10). However, that would mean we are talking about provability in a system which can only prove finitely many things, which in particular cannot contain PA and so Löb’s theorem does not apply.
I am still trying to wrap my head around exactly what this means, since your logic seems unassailable in the logically omniscient case. It is counterintuitive to me that the logically omniscient agent would be susceptible to trolling but the more limited one would not. Perhaps there is a clever way for the troll to get around this issue? I dunno. I certainly have no proof that such an agent cannot be trolled in such a way.
Sorry to necro this here, but I find this topic extremely interesting and I keep coming back to this page to stare at it and tie my brain in knots.
(As for this, I think a major goal of LessWrong—and the alignment forum—is to facilitate sustained intellectual progress; a subgoal of that is that discussions can be sustained over long periods of time, rather than flitting about as would be the case if we only had attention for discussing the most recent posts!!)
If □ here instead means “discoverable by the agent’s proof search” or something to that effect, then the logic here seems to follow through (making the reasonable assumption that if the agent can discover a proof for A=cross->U=-10, then it will set its expected value for crossing to −10).
Right, this is what you have to do.
However, that would mean we are talking about provability in a system which can only prove finitely many things, which in particular cannot contain PA and so Löb’s theorem does not apply.
Hmm. So, a bounded theorem prover using PA can still prove Löb about itself. I think everything is more complicated and you need to make some assumptions (because there’s no guarantee a bounded proof search will find the right Löbian proof to apply to itself, in general), but you can make it go through.
I’ve now edited the post to give the version which I claim works in the empirically uncertain case, and give more hints for how it still goes through in the fully logically uncertain case.
Sorry to necro this here, but I find this topic extremely interesting and I keep coming back to this page to stare at it and tie my brain in knots. Thanks for your notes on how it works in the logically uncertain case. I found a different objection based on the assumption of logical omniscience:
Regarding this you say:
However, this assumes that the Löbian proof exists. We show that the Löbian proof of A=cross→U=−10 exists by showing that the agent can prove □(A=cross→U=−10)→(A=cross→U=−10), and the agent’s proof seems to assume logical omniscience:
If □ here means “provable in PA”, the logic does not follow through if the agent is not logically omniscient: the agent might find crossing to have a higher expected utility regardless, because it may not have seen the proof. If □ here instead means “discoverable by the agent’s proof search” or something to that effect, then the logic here seems to follow through (making the reasonable assumption that if the agent can discover a proof for A=cross->U=-10, then it will set its expected value for crossing to −10). However, that would mean we are talking about provability in a system which can only prove finitely many things, which in particular cannot contain PA and so Löb’s theorem does not apply.
I am still trying to wrap my head around exactly what this means, since your logic seems unassailable in the logically omniscient case. It is counterintuitive to me that the logically omniscient agent would be susceptible to trolling but the more limited one would not. Perhaps there is a clever way for the troll to get around this issue? I dunno. I certainly have no proof that such an agent cannot be trolled in such a way.
(As for this, I think a major goal of LessWrong—and the alignment forum—is to facilitate sustained intellectual progress; a subgoal of that is that discussions can be sustained over long periods of time, rather than flitting about as would be the case if we only had attention for discussing the most recent posts!!)
Right, this is what you have to do.
Hmm. So, a bounded theorem prover using PA can still prove Löb about itself. I think everything is more complicated and you need to make some assumptions (because there’s no guarantee a bounded proof search will find the right Löbian proof to apply to itself, in general), but you can make it go through.
I believe the technical details you’re looking for will be in Critch’s paper on bounded Löb.
Thanks for the link, I will check it out!