I think you’ll find the argument is clear without any formalization if you recognize that it is NOT the usual claim that confidence goes down. Rather, it’s that the confidence in the claim falls below the confidence in its contrary.
In philH’s terms, you’re engaging in pattern matching rather than taking the argument on its own terms.
How have I not addressed the argument on its own terms? I agree with basically everything you said, except calling it a solution. You’ll run into non-trivial problems when you try to turn it into an algorithm.
For example, the case of there being an actual physical mugger is meant to be an example of the more general problem of programs with tiny priors predicting super-huge rewards. A strategy based on “probability of the mugger lying” has to be translated to the general case somehow. You have to prevent the AI from mugging itself.
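To make the general problem concrete, here is a toy sketch (my own illustration, not anything from the thread) of how a naive expected-utility calculation gets dominated by a single hypothesis with a tiny prior and an astronomically large predicted reward. The hypotheses and numbers are made up for illustration; exact rationals are used so the huge values aren't lost to floating-point rounding.

```python
from fractions import Fraction

# Hypothetical toy example: (prior probability, utility if true).
hypotheses = [
    (Fraction(999, 1000), 10),        # mundane world, modest payoff
    (Fraction(1, 10**100), 10**110),  # tiny-prior program, super-huge reward
]

def expected_utility(hyps):
    # Naive expected utility: sum of prior * predicted utility.
    return sum(p * u for p, u in hyps)

eu = expected_utility(hypotheses)
# The tiny-prior hypothesis contributes 10**110 / 10**100 = 10**10,
# swamping the mundane term (~10): the agent "mugs itself" with no
# physical mugger present, which is why a fix phrased in terms of
# "probability of the mugger lying" doesn't directly carry over.
```

Any proposed solution has to say, in general, why terms like the second one get discounted, rather than relying on features specific to a face-to-face mugger.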