I like your example, because “Carol’s answers are correct” seems like something very simple, and also impossible for a (bounded) Bayesian to represent. It’s a variation of calculator or notepad problems—that is, the problem of trying to represent a reasoner who has (and needs) computational/informational resources which are outside of their mind. (Calculator/notepad problems aren’t something I’ve written about anywhere iirc, just something that’s sometimes on my mind when thinking about logical uncertainty.)
I do want to note that weakening honesty seems like a pretty radical departure from the standard Bayesian treatment of filtered evidence, in any case (for better or worse!). Although the standard treatment distinguishes between observing X and X itself, it normally assumes that observing X implies X. So while our thinking on this does seem to differ, we are agreeing that there are significant points against the standard view.
From outside, the solution you propose looks like “doing the best you can to represent the honesty hypothesis in a computationally tractable way”—but from inside, the agent doesn’t think of it that way. It simply can’t conceive of perfect honesty. This kind of thing feels both philosophically unsatisfying and potentially concerning for alignment. It would be more satisfying if the agent could explicitly suspect perfect honesty, but also use tractable approximations to reason about it. (Of course, one cannot always get everything one wants.)
We could modify the scenario to also include questions about Carol’s honesty—perhaps when the pseudo-Bayesian gets a question wrong, it is asked to place a conditional bet about what Carol would say if Carol eventually gets around to speaking on that question. Or other variations along similar lines.
Here’s another perspective. Suppose that now Bob and Carol have symmetrical roles: each one asks a question, allows Alice to answer, and then reveals the right answer. Alice gets a reward when ey answer correctly. We can now see that perfect honesty actually is tractable. It corresponds to an incomplete hypothesis. If Alice learns this hypothesis, ey answer correctly any question ey already heard before (no matter who asks now and who asked before). We can also consider a different incomplete hypothesis that allows real-time simulation of Carol. If Alice learns this hypothesis, ey answer correctly any question asked by Carol. However, the conjunction of both hypotheses is already intractable. There’s no impediment for Alice to learn both hypotheses: ey can both memorize previous answers and answer all questions by Carol. But, this doesn’t automatically imply learning the conjunction.
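To make the symmetric protocol concrete, here is a minimal toy sketch (my own illustration, not part of the formal setup): the names, the made-up ground-truth table, and the trivial stand-in for "simulating Carol" are all hypothetical. It only shows which questions each of the two strategies covers on its own; it does not capture the learning-theoretic claim that the conjunction of the two incomplete hypotheses is intractable.

```python
# Toy sketch of the symmetric Bob/Carol question-answering protocol.
# Assumptions: a finite set of questions with a fixed ground truth, and
# a cheap "simulator" for Carol that simply reports the true answer.
import random

TRUTH = {q: f"answer-{q}" for q in range(20)}  # hypothetical ground truth


def carol_simulator(question):
    """Stand-in for real-time simulation of Carol (assumed affordable)."""
    return TRUTH[question]


class Alice:
    def __init__(self):
        self.memo = {}  # answers that have already been revealed

    def answer(self, asker, question):
        # Strategy 1 (the "honesty"/memorization hypothesis): reuse any
        # previously revealed answer, no matter who asked it originally.
        if question in self.memo:
            return self.memo[question]
        # Strategy 2 (real-time simulation of Carol): only used for
        # Carol's questions, where the simulation is assumed tractable.
        if asker == "Carol":
            return carol_simulator(question)
        # A fresh question from Bob is covered by neither strategy alone;
        # the conjunction of the two hypotheses would be needed here.
        return "don't know"

    def observe(self, question, revealed_answer):
        self.memo[question] = revealed_answer


# One run of the protocol: ask, answer, reveal, reward.
alice = Alice()
for step in range(10):
    asker = random.choice(["Bob", "Carol"])
    q = random.randrange(20)
    guess = alice.answer(asker, q)
    correct = TRUTH[q]
    reward = int(guess == correct)
    alice.observe(q, correct)
    print(step, asker, q, guess, reward)
```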
It’s absurd (in a good way) how much you are getting out of incomplete hypotheses. :)