“Solving” Pascal’s Mugging involves giving an explicit reasoning system and showing that it makes the right decision.
It’s not enough to just say “your confidence has to go down more than their claimed reward goes up”. That part is obvious. The hard part is coming up with actual explicit rules that do that, particularly ones that don’t fall apart in other situations (e.g. the decision system “always do nothing” can’t be Pascal-mugged, but it has serious problems).
Another thing not addressed here is that the mugger may be a hypothetical. For example, if the AI generates hypotheses in which its actions affect 3^^^^3 people, then all of its decisions will be dominated by those hypotheses, because their payoffs outweigh their tiny priors by absurd margins. How do you detect these bad hypotheses? How do you penalize them without excluding them? Should you exclude them?
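To make that domination concrete (a toy sketch with invented numbers, not a proposal for how the AI should reason):

    # Toy expected-utility calculation; all numbers are invented for illustration.
    # A single tiny-prior, huge-payoff hypothesis swamps every other consideration.

    def expected_utility(hypotheses):
        # hypotheses: list of (label, prior, utility of the action under that hypothesis)
        return sum(prior * utility for _, prior, utility in hypotheses)

    ordinary = ("ordinary world", 0.999999, 100.0)   # what the action is actually worth
    mugger   = ("3^^^^3 hypothesis", 1e-50, 1e100)   # absurdly small prior, absurdly large payoff

    print(expected_utility([ordinary]))          # ~1e2
    print(expected_utility([ordinary, mugger]))  # ~1e50: the mugger hypothesis decides everything

No matter what the ordinary hypotheses say, the huge-payoff one controls the sign and magnitude of the sum.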
Please give a more concrete situation with actual numbers and algorithms.
I think you’ll find the argument is clear without any formalization if you recognize that it is NOT the usual claim that confidence goes down. Rather, it’s that the confidence falls below that of its contrary.
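Spelled out with invented numbers (just an illustration of the claim, not a formalization):

    from fractions import Fraction

    U = 10**1000                      # stand-in for the mugger's promised payoff (3^^^^3 lives, etc.)
    cost = 5                          # utility lost by handing over the wallet

    p_reward  = Fraction(1, 10**60)   # credence that paying yields +U
    p_penalty = Fraction(1, 10**60)   # credence that paying yields -U (the contrary), at least as large

    ev_pay    = p_reward * U - p_penalty * U - cost   # <= -cost whenever p_penalty >= p_reward
    ev_refuse = 0

    print(ev_pay <= ev_refuse)   # True: once the confidence is below its contrary, U cancels out

The size of U never matters; only the ordering of the two credences does.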
In philH’s terms, you’re engaging in pattern matching rather than taking the argument on its own terms.
How have I not addressed the argument on its own terms? I agree with basically everything you said, except calling it a solution. You’ll run into non-trivial problems when you try to turn it into an algorithm.
The case of an actual, physical mugger is meant to be an example of the more general problem of programs with tiny priors predicting super-huge rewards. A strategy based on “probability of the mugger lying” has to be translated to the general case somehow. You have to prevent the AI from mugging itself.