Dishonest Update Reporting
Related to: Asymmetric Justice, Privacy, Blackmail
Previously (Paul Christiano): Epistemic Incentives and Sluggish Updating
The starting context here is the problem of what Paul calls sluggish updating. Bob is asked to predict the probability of a recession this summer. He said 75% in January, and now believes 50% in February. What to do? Paul sees Bob as thinking roughly this:
If I stick to my guns with 75%, then I still have a 50-50 chance of looking smarter than Alice when a recession occurs. If I waffle and say 50%, then I won’t get any credit even if my initial prediction was good. Of course if I stick with 75% now and only go down to 50% later then I’ll get dinged for making a bad prediction right now—but that’s little worse than what people will think of me immediately if I waffle.
Paul concludes that this is likely:
Bob’s optimal strategy depends on exactly how people are evaluating him. If they care exclusively about evaluating his performance in January then he should always stick with his original guess of 75%. If they care exclusively about evaluating his performance in February then he should go straight to 50%. In the more realistic case where they care about both, his optimal strategy is somewhere in between. He might update to 70% this week.
This results in a pattern of “sluggish” updating in a predictable direction: once I see Bob adjust his probability from 75% down to 70%, I expect that his “real” estimate is lower still. In expectation, his probability is going to keep going down in subsequent months. (Though it’s not a sure thing—the whole point of Bob’s behavior is to hold out hope that his original estimate will turn out to be reasonable and he can save face.)
This isn’t ‘sluggish’ updating of the type we talk about when we discuss Aumann’s Agreement Theorem and its claim that rational parties can’t agree to disagree. It’s dishonest update reporting. As Paul says, explicitly.
I think this kind of sluggish updating is quite common—if I see Bob assign 70% probability to something and Alice assign 50% probability, I expect their probabilities to gradually inch towards one another rather than making a big jump. (If Alice and Bob were epistemically rational and honest, their probabilities would immediately take big enough jumps that we wouldn’t be able to predict in advance who will end up with the higher number. Needless to say, this is not what happens!)
Unfortunately, I think that sluggish updating isn’t even the worst case for humans. It’s quite common for Bob to double down with his 75%, only changing his mind at the last defensible moment. This is less easily noticed, but is even more epistemically costly.
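To make the quoted point concrete, here is a toy simulation, entirely my own sketch and not anything from Paul’s post: an honest Bayesian’s reported probability is a martingale, so you cannot predict the direction of the next move from the last one, while a sticky reporter who only moves his public number part of the way toward his real belief will predictably keep drifting in the same direction. The signal strengths and the stickiness parameter are made-up illustrative values.

```python
import random

def honest_updates(prior, steps):
    """Honest Bayesian reports about a binary event: draw the truth from the
    prior, then update each day on a noisy signal. The resulting posterior is
    a martingale, so the last move says nothing about the next one."""
    truth = random.random() < prior
    p = prior
    path = [p]
    for _ in range(steps):
        up = random.random() < (0.7 if truth else 0.3)  # noisy daily signal
        like_true, like_false = (0.7, 0.3) if up else (0.3, 0.7)
        p = p * like_true / (p * like_true + (1 - p) * like_false)
        path.append(p)
    return path

def sluggish_reports(beliefs, stickiness=0.8):
    """Bob-style reporting: each public number only moves part of the way
    toward the real belief, so the next report predictably continues in the
    same direction as the last move."""
    reports = [beliefs[0]]
    for p in beliefs[1:]:
        reports.append(stickiness * reports[-1] + (1 - stickiness) * p)
    return reports

random.seed(0)
beliefs = honest_updates(0.75, 6)
print("real beliefs:    ", [round(p, 2) for p in beliefs])
print("sluggish reports:", [round(p, 2) for p in sluggish_reports(beliefs)])
```

Run it a few times without the fixed seed and the honest path jumps in both directions, while the sluggish reports trail behind it in a smooth, predictable drift. That drift is exactly the pattern an observer can exploit.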
When Paul speaks of Bob’s ‘optimal strategy’ he does not include a cost to lying, or a cost to others getting inaccurate information.
This is a world where all one cares about is how one is evaluated, and lying and deceiving others is free as long as you’re not caught. You’ll get exactly what you incentivize.
What that definitely won’t get you is accurate probability estimates, and a lot more besides.
The only way to get accurate probability estimates from Bob-who-is-happy-to-strategically-lie is to use a mathematical formula to reward Bob based on his log likelihood score, to have Bob bet in a prediction market, or to use another similarly robust method. And then use that as the entirety of how one evaluates Bob. If human judgment is allowed in the process, the value of that judgment will overwhelm any desire on Bob’s part to be precise or to properly update.
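For concreteness, here is a minimal sketch of what a mechanical reward looks like, with illustrative numbers that are my own and not from Paul’s post: under the logarithmic scoring rule, Bob is paid the log of the probability he assigned to whatever actually happened, and because the rule is strictly proper, reporting his true belief maximizes his expected payment.

```python
import math

def log_score(reported_p, event_happened):
    """Logarithmic scoring rule: reward is the log of the probability assigned
    to the outcome that actually occurred. Strictly proper, so truthful
    reporting maximizes expected reward."""
    return math.log(reported_p if event_happened else 1.0 - reported_p)

# Suppose Bob's true belief in February is 50%. His expected score for each
# candidate report, evaluated under that belief:
true_p = 0.50
for report in (0.50, 0.70, 0.75):
    expected = true_p * log_score(report, True) + (1 - true_p) * log_score(report, False)
    print(f"report {report:.2f}: expected log score {expected:.3f}")
# Reporting 0.50 scores best in expectation; sticking with 0.75 costs him.
```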
Since Bob is almost certainly in a human context where humans are evaluating him based on human judgments, that means all is mostly lost.
As Paul notes, consistency is crucial in how one is evaluated. Even bigger is avoiding mistakes.
Given the asymmetric justice of punishing mistakes and inconsistencies that can be proven and identified, the strategic actor must seek cognitive privacy. The more others know about the path of your beliefs, the easier it will be for them to spot an inconsistency or a mistake. It’s hard enough to give a reasonable answer once, but updating in a way that can never be shown to have made a mistake or been inconsistent? Impossible.
Mistakes and inconsistencies are the bad things one must avoid getting docked points for.
Thus, Bob’s full strategy, in addition to choosing probabilities that sound best and give the best cost/benefit payoffs in human intuitive evaluations of performance, is to avoid making any clear statements of any kind. When he must do so, he will do his best to be able to deny having done so. Bob will seek to destroy the historical record of his predictions and statements, and their path. And also prevent the creation of any common knowledge, at all. Any knowledge of the past situation, or the present outcome, could be shown to not be consistent with what Bob said, or what we believe Bob said, or what we think Bob implied. And so on.
Bob’s optimal strategy is full anti-epistemology. He is opposed to knowledge.
In that context, Paul’s suggested solutions seem highly unlikely to work.
His first suggestion is to exclude information – to judge Bob only by the aggregation of all of Bob’s predictions, and ignore any changes. Not only does this throw away vital information, it also isn’t realistic. Even if it were realistic for some people, others would still punish Bob for updating.
Paul’s second suggestion is to make predictions about others’ belief changes, which he himself notes ‘literally wouldn’t work’ and is ‘a recipe for epistemic catastrophe.’ The whole thing is convoluted and unnatural at best.
Paul’s third and final suggestion is social disapproval of sluggish updating. As he notes, this twists social incentives potentially in good ways but likely in ways that make things worse:
Having noticed that sluggish updating is a thing, it’s tempting to respond by just penalizing people when they seem to update sluggishly. I think that’s a problematic response:
I think the rational reaction to norms against sluggish updating may often be no updating at all, which is much worse.
In general combating non-epistemic incentives with other non-epistemic incentives seems like digging yourself into a hole, and can only work if you balance everything perfectly. It feels much safer to just try to remove the non-epistemic incentives that were causing the problem in the first place.
Sluggish updating isn’t easy to detect in any given case. For example, suppose that Bob expects an event to happen, and if it does he expects to get a positive sign on any given day with 1% probability. Then if the event doesn’t happen his probability will decay exponentially towards zero, falling in half every ~70 days. This will look like sluggish updating.
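For what it’s worth, the arithmetic in that example checks out; this back-of-envelope check is mine, not Paul’s. Each sign-free day multiplies the odds that the event is real by 0.99, giving a half-life of about ln(2)/0.01, roughly 69 days, for the odds, and approximately for the probability once it is small.

```python
import math

daily_factor = 0.99  # odds multiplier for each day without a positive sign
half_life = math.log(2) / -math.log(daily_factor)
print(f"odds halve roughly every {half_life:.0f} days")  # ~69

# Trace the posterior from a 75% start. The halving is exact for the odds and
# only approximate for the probability until it gets small.
start_odds = 0.75 / 0.25
for day in (0, 30, 60, 90, 120, 180):
    odds = start_odds * daily_factor ** day
    print(f"day {day:3d}: p = {odds / (1 + odds):.2f}")
```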
Bob already isn’t excited about updating. He’d prefer to not update at all. He’s upset about having had to give that 75% answer, because now if there’s new information (including others’ opinions) he can’t keep saying ‘probably’ and has to give a new number, again giving others information to use as ammunition against him.
The reason he updated visibly, at all, was that not updating would have been inconsistent or otherwise punished. Punish updates for being too small on top of already looking bad for changing at all, and the chance you get the incentives right here is almost zero. Bob will game the system, one way or another. And now, you won’t know how Bob is doing it. Before, you could know that Bob moving from 75% to 70% meant his real number was something lower, perhaps 50%. Predictable bad calibration is much easier to fix. Twist things into knots and there’s no way to tell.
Meanwhile, Bob is going to reliably get evaluated as smarter and more capable than Alice, who for reasons of principle is going around reporting her probability estimates accurately. Those observing might even punish Alice further, as someone who does not know how the game is played, and would be a poor ally.
The best we can do, under such circumstances, if we want insight from Bob, is to do our best to make Bob believe we will reward him for updating correctly and reporting that update honestly, then consider Bob’s incentives, biases and instincts, and attempt as best we can to back out what Bob actually believes.
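If we can estimate how sticky Bob’s reporting is, that backing-out can even be mechanical. A minimal sketch, reusing the made-up stickiness model from the earlier snippet:

```python
def implied_belief(prev_report, new_report, stickiness=0.8):
    """Invert the toy sticky-reporting model:
    new_report = stickiness * prev_report + (1 - stickiness) * belief,
    so belief = (new_report - stickiness * prev_report) / (1 - stickiness)."""
    return (new_report - stickiness * prev_report) / (1.0 - stickiness)

# If Bob is about 80% sticky, his move from 75% to 70% implies a real belief
# near 50%, the pattern described above.
print(round(implied_belief(0.75, 0.70), 2))  # ~0.5
```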
As Paul notes, we can try to combat non-epistemic incentives with equal and opposite other non-epistemic incentives, but going deep on that generally only makes things more complex and rewards more attention to our procedures and how to trick us, giving Bob an even bigger advantage over Alice.
A last-ditch effort would be to give Bob sufficient skin in the game. If Bob directly benefits enough from us having accurate models, Bob might report more accurately. But outside of very small groups, there isn’t enough skin in the game to go around. And that still assumes Bob thinks the way for the group to succeed is to be honest and create accurate maps. Whereas most people like Bob do not think that is how winners behave. Certainly not with vague things that don’t have direct physical consequences, like probability estimates.
What can be done about this?
Unless we care enough, very little. We lost early. We lost on the meta level. We didn’t Play in Hard Mode.
We accepted that Bob was optimizing for how Bob was evaluated, rather than Bob optimizing for accuracy. But we didn’t evaluate Bob on that basis. We didn’t place the virtues of honesty and truth-seeking above the virtue of looking good sufficiently to make Bob’s ‘look good’ procedure evolve into ‘be honest and seek truth.’ We didn’t work to instill epistemic virtues in Bob, or select for Bobs with or seeking those virtues.
We didn’t reform the local culture.
And we didn’t fire Bob the moment we noticed.
Game over.
I once worked for a financial firm that made this priority clear. On the very first day. You need to always be ready to explain and work to improve your reasoning. If we catch you lying, about anything at all, ever, including a probability estimate, that’s it. You’re fired. Period.
It didn’t solve all our problems. More subtle distortionary dynamics remained, and some evolved as reactions to the local virtues, as they always do. For these and other reasons, which I will not be getting into here or in the comments, it ended up not being a good place for me. Those topics are for another day.
But they sure as hell didn’t have to worry about the likes of Bob.