Suppose your world model says you take the 5 dollar bill with 100% probability. Now conditioning on taking the 10 dollar bill gives you complete garbage, since you are conditioning on a probability 0 event.
You can make the agent take the 10 dollar bill in that case, in effect blackmailing the world model: “if you want to stay sound, don’t prove that action A has probability 0”. I’d love to know if that covers all cases. In other words, if there’s an N symbol proof of a “garbage” conditional, must there be an M<N symbol proof of the form “action A has probability 0”?
The agents in this post aren’t proof-based. Proof-based issues have some issues with weird counterfactuals.
Perhaps the only worlds where you take some specific action are ones where PA is inconsistent. (COEDT also has issues since the nearby oracles are not reflective, but it’s a different set of issues)
In general queries to reflective-oracle-like world models that have forms like “is the probability of this exactly 0?” are problematic, since they are vulnerable to liar’s paradoxes. What if you take action A iff the probability of taking action A is exactly 0? So to the extent that this works for proof-based agents, it’s because they’re not complete in some sense.
I am not sure that if there is a proof of a garbage conditional then there will be a similarly-short proof that the conditional is garbage. Say “proving a garbage conditional P(A|B)” means proving P(A^B)=cP(B) for some c such that in fact P(B)=0. I could imagine a case where it is easy to prove the event A equivalent to the event B, but hard to prove P(B)=0. Then you could prove P(A^B)=P(B) easily but not P(B)=0. (This case might not be problematic from a decision theory perspective, which is interesting, but it’s still an invalid conditional)
Yeah, my comment was more about proof-based agents rather than oracle-based, sorry about going off topic. In a proof-based setting, conditionals with P(B)=0 aren’t necessarily garbage, some of them are intended and we want the agent to find them. Your example might be one of those. The hard part is defining which is which.
You can make the agent take the 10 dollar bill in that case, in effect blackmailing the world model: “if you want to stay sound, don’t prove that action A has probability 0”. I’d love to know if that covers all cases. In other words, if there’s an N symbol proof of a “garbage” conditional, must there be an M<N symbol proof of the form “action A has probability 0”?
The agents in this post aren’t proof-based. Proof-based issues have some issues with weird counterfactuals. Perhaps the only worlds where you take some specific action are ones where PA is inconsistent. (COEDT also has issues since the nearby oracles are not reflective, but it’s a different set of issues)
In general queries to reflective-oracle-like world models that have forms like “is the probability of this exactly 0?” are problematic, since they are vulnerable to liar’s paradoxes. What if you take action A iff the probability of taking action A is exactly 0? So to the extent that this works for proof-based agents, it’s because they’re not complete in some sense.
I am not sure that if there is a proof of a garbage conditional then there will be a similarly-short proof that the conditional is garbage. Say “proving a garbage conditional P(A|B)” means proving P(A^B)=cP(B) for some c such that in fact P(B)=0. I could imagine a case where it is easy to prove the event A equivalent to the event B, but hard to prove P(B)=0. Then you could prove P(A^B)=P(B) easily but not P(B)=0. (This case might not be problematic from a decision theory perspective, which is interesting, but it’s still an invalid conditional)
Yeah, my comment was more about proof-based agents rather than oracle-based, sorry about going off topic. In a proof-based setting, conditionals with P(B)=0 aren’t necessarily garbage, some of them are intended and we want the agent to find them. Your example might be one of those. The hard part is defining which is which.