From my perspective, the trouble here comes from the honesty condition. This condition hides an unbounded quantifier: “if the speaker will ever say something, then it is true”. So it’s no surprise we run into computational complexity and even computability issues.
Consider the following setting. The agent Alice repeatedly interacts with two other entities: Bob and Carol. When Alice interacts with Bob, Bob asks Alice a yes/no question; Alice answers it and receives +1 or −1 reward depending on whether the answer is correct. When Alice interacts with Carol, Carol tells Alice some question and the answer to that question.
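(A minimal sketch of this interaction loop, purely for concreteness. The interfaces for Bob and Carol (`ask`, `teach`) and for Alice (`answer`, `observe`) are hypothetical names of my own, and letting Bob's object supply the correct answer is a simplification; none of this comes from the setting above.)

```python
import random

def run_episode(alice, bob, carol, num_steps):
    total_reward = 0
    for t in range(num_steps):
        if random.random() < 0.5:
            # Bob's turn: he asks a yes/no question and Alice is scored on her answer.
            question, correct_answer = bob.ask(t)
            guess = alice.answer(question)
            reward = 1 if guess == correct_answer else -1
            total_reward += reward
            alice.observe(("bob", t, question, reward))
        else:
            # Carol's turn: she states a question together with its answer.
            question, answer = carol.teach(t)
            alice.observe(("carol", t, question, answer))
    return total_reward
```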
Suppose that Alice starts with some low-information prior and learns about both Bob and Carol over time. The honesty condition becomes “if Carol will ever say (X,Y) and Bob asks the question X, then the correct answer is Y”. But this condition might be computationally intractable, so it is not in the prior and cannot be learned. However, weaker versions of this condition might be tractable, for example “if Carol says (X,Y) at some time step between 0 and t+1000, and Bob asks X at time t, then the correct answer is Y”. Since simulating Bob is still intractable, this condition cannot be expressed as a vanilla Bayesian hypothesis. However, it can be expressed as an incomplete hypothesis. We can also have an incomplete hypothesis that is the conjunction of this weak honesty condition with a full simulation of Carol. Once Alice has learned this incomplete hypothesis, ey answer correctly at least those questions which Carol has already taught em or will teach em within 1000 time steps.
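(To make the quantifier structure concrete, here is a sketch of the bounded, 1000-step honesty condition as a predicate over a finite interaction history. The event encoding and the function name are mine, for illustration only.)

```python
# History is a list indexed by time step. An event is either
#   ("carol", x, y):     Carol teaches that the answer to question x is y, or
#   ("bob", x, y_true):  Bob asks x, and y_true is the correct answer.
def weak_honesty_holds(history, horizon=1000):
    """Check: if Carol says (x, y) at some step in [0, t + horizon]
    and Bob asks x at step t, then y is the correct answer."""
    for t, event in enumerate(history):
        if event[0] != "bob":
            continue
        _, x, y_true = event
        for s, other in enumerate(history):
            if other[0] == "carol" and other[1] == x and s <= t + horizon:
                if other[2] != y_true:
                    return False  # Carol made a wrong claim inside the window
    return True

# The unbounded honesty condition replaces "some step in [0, t + horizon]"
# with "some step ever", which cannot be checked on any finite prefix.
```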
I like your example, because “Carol’s answers are correct” seems like something very simple, and also impossible for a (bounded) Bayesian to represent. It’s a variation of calculator or notepad problems—that is, the problem of trying to represent a reasoner who has (and needs) computational/informational resources which are outside of their mind. (Calculator/notepad problems aren’t something I’ve written about anywhere iirc, just something that’s sometimes on my mind when thinking about logical uncertainty.)
I do want to note that weakening honesty seems like a pretty radical departure from the standard Bayesian treatment of filtered evidence, in any case (for better or worse!). Even when we distinguish between observing X and X itself, it is normally assumed that observing X implies X. So while our thinking on this does seem to differ, we are agreeing that there are significant points against the standard view.
From outside, the solution you propose looks like “doing the best you can to represent the honesty hypothesis in a computationally tractable way”—but from inside, the agent doesn’t think of it that way. It simply can’t conceive of perfect honesty. This kind of thing feels both philosophically unsatisfying and potentially concerning for alignment. It would be more satisfying if the agent could explicitly suspect perfect honesty, but also use tractable approximations to reason about it. (Of course, one cannot always get everything one wants.)
We could modify the scenario to also include questions about Carol’s honesty—perhaps when the pseudo-Bayesian gets a question wrong, it is asked to place a conditional bet about what Carol would say if Carol eventually gets around to speaking on that question. Or other variations along similar lines.
Here’s another perspective. Suppose that now Bob and Carol have symmetrical roles: each one asks a question, allows Alice to answer, and then reveals the right answer. Alice gets a reward when ey answer correctly. We can now see that perfect honesty actually is tractable: it corresponds to an incomplete hypothesis. If Alice learns this hypothesis, ey answer correctly any question ey have already heard before (no matter who asks now and who asked before). We can also consider a different incomplete hypothesis that allows real-time simulation of Carol. If Alice learns this hypothesis, ey answer correctly any question asked by Carol. However, the conjunction of both hypotheses is already intractable. There’s no impediment to Alice learning both hypotheses: ey can both memorize previous answers and answer all questions from Carol. But this doesn’t automatically imply learning the conjunction.
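(A sketch of what learning the two hypotheses separately buys Alice in this symmetric setting, as I read it. `simulate_carol` is my own placeholder for whatever tractable model of Carol Alice has acquired; it is not defined above.)

```python
# Alice after learning the two incomplete hypotheses separately:
#   1. perfect honesty: any answer already revealed (by anyone) is correct;
#   2. real-time simulation of Carol: Carol's own questions can be predicted.
class Alice:
    def __init__(self, simulate_carol):
        self.memory = {}                  # question -> revealed answer
        self.simulate_carol = simulate_carol

    def answer(self, asker, question):
        if question in self.memory:
            # Hypothesis 1: previously revealed answers are correct,
            # regardless of who revealed them or who is asking now.
            return self.memory[question]
        if asker == "carol":
            # Hypothesis 2: predict the answer Carol herself will reveal.
            return self.simulate_carol(question)
        return None  # neither hypothesis constrains the answer here

    def observe(self, question, revealed_answer):
        self.memory[question] = revealed_answer
```

This exploits each hypothesis on its own; as noted above, that does not amount to learning or exploiting their conjunction.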
It’s absurd (in a good way) how much you are getting out of incomplete hypotheses. :)