> But that can’t happen if A’s proof system is sound, therefore A will find only “intended” proofs rather than “spurious” ones in the first place.
If A’s proof system is sound, that directly implies that A won’t generate “spurious” proofs. I don’t see the point of deriving this from the fact that “spurious” proofs lead to contradictions.
It seems like step 2 is just validating that proofs are correct, and when it finds a problem, it doesn’t seem to do anything good about it. What is the point?
> If A’s proof system is sound, that directly implies that A won’t generate “spurious” proofs.
Nope. In the example problem from the post, A’s proof system will happily prove statements like “A()==1 implies U()==100”. After all, A() finishes in finite time and returns 2, so any proof system worth its salt will eventually prove that A()==1 is false by simulating A, and a false statement implies anything. The challenge is to not let the agent stumble upon spurious proofs before it returns a value.
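A toy sketch of the vacuous-implication point (the agent, world program, and payoffs below are made-up stand-ins, not the code from the post):

```python
# Toy sketch: once A() != 1 is established (e.g. by simulating A to completion),
# every statement of the form "A()==1 implies U()==u" is vacuously true, and a
# sound proof system can prove it. All payoffs here are invented for illustration.

def A():
    # Stand-in agent that happens to return 2.
    return 2

def U():
    # Stand-in world program; these payoffs are assumptions, not the post's.
    return 5 if A() == 1 else 10

def implies(p, q):
    # Material implication: a false antecedent makes the implication true.
    return (not p) or q

a, u = A(), U()
print(implies(a == 1, u == 100))  # True -- the "spurious" implication
print(implies(a == 1, u == 5))    # also True here, since A() != 1 makes the antecedent false
```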
To make this post more obvious to people like me and JGWeissman, it would be a good idea to include this in the original post! I came out of this post thinking “Why did I just read that?” and couldn’t figure it out until this response.
OK, thanks! Included this reasoning in the post.

Ahh, good. I did re-read stuff and I was like, did I skip some major piece of it? Am I going insane lately?

Now it makes a lot more sense.

Sorry, the omission was due to illusion of transparency on my part. I’d spent so much time thinking about avoiding spurious counterfactuals that I couldn’t imagine other people not seeing that as the most important problem :-)
> any proof system worth its salt will eventually prove that A()==1 is false by simulating A

Not if A includes the proof search! No sound proof system will be able to say anything interesting about programs that use the system itself.

If the program doesn’t terminate, the proof system may be unable to prove that. But if the program terminates in finitely many steps, any good enough proof system allows for a proof that just simulates all the steps one by one. Of course the program itself won’t reach that particular proof because the proof is too long.
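A rough sketch of the “proof by simulation” idea, with a made-up step machine rather than anything from the post: the certificate for the final result just lists every intermediate state, so it is at least as long as the run itself.

```python
# Sketch of "proof by simulation": certify a halting program's output by
# recording every intermediate state. The "proof" has one entry per step,
# which is why a proof-search that is bounded in length (like the one inside
# the agent) never reaches it. The step machine below is a made-up stand-in.

def run_with_trace(state, step, halted, max_steps=1_000_000):
    trace = [state]
    while not halted(state):
        if len(trace) > max_steps:
            raise RuntimeError("did not halt within the step bound")
        state = step(state)
        trace.append(state)
    return state, trace

# Example: a counter that halts at 2 (echoing "A() returns 2").
final, trace = run_with_trace(
    state=0,
    step=lambda s: s + 1,
    halted=lambda s: s >= 2,
)
print(final)       # 2
print(len(trace))  # one trace entry per computation step, plus the start
```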
> It seems like step 2 is just validating that proofs are correct, and when it finds a problem, it doesn’t seem to do anything good about it.
The activity of step 2 results in proofs of statements like [A()==1 ⇒ U()==100] being long (in the agent’s proof system, compared to the length of proofs of the “natural” moral arguments like [A()==1 ⇒ U()==5]), something that wouldn’t be the case had step 2 been absent. So it does perform tangible work.
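One way to picture this (a gloss on the comment, not the post’s actual algorithm): if the agent considers proofs from shortest to longest and acts on the first moral argument it finds, then pushing spurious arguments out to much longer proofs ensures the natural argument is found first. A toy with invented proof lengths:

```python
# Toy sketch of why forcing spurious proofs to be long does real work.
# The statements and proof lengths are invented for illustration only.

shortest_proof_length = {
    "A()==1 => U()==5":   40,    # "natural" moral argument: short proof
    "A()==1 => U()==100": 4000,  # spurious argument: only a much longer proof
}

def first_moral_argument_found(lengths):
    # An agent that searches proofs in order of length hits the statement
    # with the shortest proof first and acts on it.
    return min(lengths, key=lengths.get)

print(first_moral_argument_found(shortest_proof_length))
# -> "A()==1 => U()==5"
```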