The reply to interstice makes me think about logical uncertainty: if the predictor “reasons” about what to expect (internally engages in a sequence of computations which accounts for more structure as it thinks longer), then it is especially difficult to be approximately Bayesian (for all the classic reasons that logical uncertainty makes this hard). So the argument that the described behaviour isn’t rational doesn’t really apply, because you have to deal with cases like the one you mention, where you spot an inconsistency in your probability distribution but you aren’t sure how to deal with it.
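To make that last point concrete, here is a tiny, entirely made-up illustration (in Python) of spotting an inconsistency without knowing how to resolve it. The specific credence values and the particular coherence constraint checked are my own assumptions, not anything from the post:

```python
# Hypothetical credences held by a bounded reasoner (numbers are made up).
credences = {"A": 0.6, "B": 0.5, "A and B": 0.4, "A or B": 0.65}

# Coherence requires P(A or B) = P(A) + P(B) - P(A and B).
implied_or = credences["A"] + credences["B"] - credences["A and B"]

if abs(credences["A or B"] - implied_or) > 1e-9:
    # The reasoner can see that *something* is off, but the violation alone
    # doesn't say which of the four credences to revise, or by how much.
    print(f"Inconsistent: stated P(A or B) = {credences['A or B']}, "
          f"but the other credences imply {implied_or:.2f}")
```

Noticing the violation is cheap; deciding which credence to move, and by how much, is exactly the part that approximate Bayesianism doesn’t settle for you.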
This “reasoning” argument is related to the intuition you mention about search: you imagine the system searching over sensible futures when deciding what to predict next. It doesn’t make sense for a system to do that if it is only learning conditional probabilities of the next token given history; there is no information to be gained by looking ahead. However, there are several reasons why it could look ahead if it’s doing something more complicated. It could be actively searching for good explanations of its history, and looking ahead to plausible futures might somehow aid that process. Or maybe it learns the more general blank-filling task rather than only the forward-prediction version where you fill in the future given the past; then it could benefit from consulting its own models that go in the other direction as a consistency check.
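To gesture at what that consistency check could look like, here is a minimal sketch in Python. Everything in it is assumed for illustration: the count-based BigramModel, the toy corpus, and the consistency_gap function are stand-ins, not a claim about how an actual predictor works. The point is just that a pure forward next-token learner only ever needs P(next | past), whereas a blank-filling learner also has a backward estimate of the same token available to compare against.

```python
from collections import defaultdict


class BigramModel:
    """Toy count-based bigram model (purely illustrative).

    direction='fwd' estimates P(token | previous token);
    direction='bwd' estimates P(token | following token).
    """

    def __init__(self, direction="fwd"):
        self.direction = direction
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, tokens):
        for prev, nxt in zip(tokens, tokens[1:]):
            if self.direction == "fwd":
                self.counts[prev][nxt] += 1
            else:
                self.counts[nxt][prev] += 1

    def prob(self, context, token):
        total = sum(self.counts[context].values())
        return self.counts[context][token] / total if total else 0.0


def consistency_gap(fwd, bwd, prev_tok, candidate, next_tok):
    """How much the forward view of `candidate` (given the past) disagrees
    with the backward view (given the future). A pure next-token learner has
    no incentive to notice this gap; a blank-filling learner could use it."""
    p_fwd = fwd.prob(prev_tok, candidate)   # estimate from the past
    p_bwd = bwd.prob(next_tok, candidate)   # estimate from the future
    return abs(p_fwd - p_bwd)


corpus = "the cat sat on the mat because the cat was tired".split()
fwd, bwd = BigramModel("fwd"), BigramModel("bwd")
fwd.train(corpus)
bwd.train(corpus)

# Fill the blank in "the ___ sat" two ways and compare the estimates.
for candidate in ["cat", "mat"]:
    gap = consistency_gap(fwd, bwd, "the", candidate, "sat")
    print(f"forward vs backward disagreement for '{candidate}': {gap:.3f}")
```

Nothing here requires the backward model; it only becomes useful if the training objective (blank-filling rather than strict forward prediction) gives the system a reason to maintain and consult it.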
Still, I’m not convinced that strategic behaviour gets incentivised. As you say in the post, we have to think through specific learning algorithms and what behaviour they encourage.