My (possibly very incorrect) takeaway from the post, as someone with very little background in mathematical logic, was that “If I can prove x has higher utility than y, then I will do x” (statement 1) is a bad heuristic for an EDT agent that can reason about its own source code, because outputting x will be a fixed point of this decision process* even when this does not return higher utility. Specifically, an EDT agent will choose action x iff the utility of choosing x is higher than that of choosing y (assuming the utilities are different). Thus, assuming statement 1 is equivalent to assuming “if I can prove x has higher utility than y, x has higher utility than y” (statement 2) for the EDT agent. Because assuming statement 2 leads to absurd conclusions (like the agent taking the 5 dollar bill), assuming it is a bad heuristic.
This use of Lob’s theorem seems to do exactly what you want: show that we can’t prove a statement is unprovable. If we prove a statement of the form “if a is provable then a is true” , then the contrapositive “if a is not true then it is not provable” follows. However, I thought the point of the post is that we can’t actually prove a statement of this form, namely the statement “if x does not have higher utility than y, then I cannot prove that x has higher utility than y” (statement 3). Statement 3 is necessary for the heuristic in statement 1 to be useful, but the post shows that it is in fact false.
The point of the post isn’t to prove something false, it’s to show that we can’t prove a statement is unprovable.
*I’m not sure if I’m these terms correctly and precisely :/
My (possibly very incorrect) takeaway from the post, as someone with very little background in mathematical logic, was that “If I can prove x has higher utility than y, then I will do x” (statement 1) is a bad heuristic for an EDT agent that can reason about its own source code, because outputting x will be a fixed point of this decision process* even when this does not return higher utility. Specifically, an EDT agent will choose action x iff the utility of choosing x is higher than that of choosing y (assuming the utilities are different). Thus, assuming statement 1 is equivalent to assuming “if I can prove x has higher utility than y, x has higher utility than y” (statement 2) for the EDT agent. Because assuming statement 2 leads to absurd conclusions (like the agent taking the 5 dollar bill), assuming it is a bad heuristic.
This use of Lob’s theorem seems to do exactly what you want: show that we can’t prove a statement is unprovable. If we prove a statement of the form “if a is provable then a is true” , then the contrapositive “if a is not true then it is not provable” follows. However, I thought the point of the post is that we can’t actually prove a statement of this form, namely the statement “if x does not have higher utility than y, then I cannot prove that x has higher utility than y” (statement 3). Statement 3 is necessary for the heuristic in statement 1 to be useful, but the post shows that it is in fact false.
The point of the post isn’t to prove something false, it’s to show that we can’t prove a statement is unprovable.
*I’m not sure if I’m these terms correctly and precisely :/