Interesting work, thanks for sharing!
I haven’t had a chance to read the full paper, but I didn’t find the summary account of why this behavior might be rational particularly compelling.
At a first pass, I think I’d want to judge the behavior of some person (or cognitive system) as “irrational” when the following three constraints are met:
1) The subject, in some sense, has the basic capability to perform the task competently, and
2) They do better (by their own values) if they exercise the capability in this task, and
3) In the task, they fail to exercise this capability.
Even if participants are operating with the strategy “maximize expected answer value”, I’d be willing to judge the participants’ responses as “irrational” if the participants were cognitively competent, understood the concept ‘90% confidence interval’, and were incentivized to be calibrated on the task (say, if participants received increased monetary rewards as a function of their calibration).
Pointing out that informativity is important in everyday discourse doesn’t do much to persuade me that the behavior of participants in the study is “rational”, because (to the extent I find the concept of “rationality” useful) I’d use the moniker to label the ability of a system to exercise its capabilities in a way that is conducive to its ends.
I think you make a decent case for claiming that the empirical results outlined don’t straightforwardly imply irrationality, but I’m also not convinced that your theoretical story provides strong grounds for describing participant behaviors as “rational”.
Thanks for the thoughtful reply! Cross-posting the reply I wrote on Substack as well:
I like the objection, and am generally very sympathetic to the “rationality ≈ doing the best you can, given your values/beliefs/constraints” idea, so I see where you’re coming from. I think there are two places I’d push back on in this particular case.
1) To my knowledge, most of these studies don’t use incentive-compatible mechanisms for eliciting intervals. This is something authors of the studies sometimes worry about; Don Moore et al. talk about it as a concern in the summary piece I linked to. I think this MAY link to general theoretical difficulties with getting incentive-compatible scoring rules for interval-valued estimates (this is a known problem for imprecise probabilities, e.g. https://www.cmu.edu/dietrich/philosophy/docs/seidenfeld/Forecasting%20with%20Imprecise%20Probabilities.pdf . I’m not totally sure, but I think it might also apply in this case). The challenge they run into for eliciting particular intervals is that if they reward accuracy, that’ll just incentivize people to widen their intervals. If they reward narrower intervals, great, but how much to incentivize? (Too much, and they’ll narrow their intervals more than they would otherwise.) We could try to reward people for being calibrated OVERALL, so that they get rewarded the closer they are to having 90% of their intervals contain the true value. But the best strategy in response to that is (if you’re giving 10 total intervals) to give 9 trivial intervals (“between 0 and ∞”) that’ll definitely contain the true value, and 1 ridiculous one (“the population of the UK is between 1–2 million”) that definitely won’t.
Maybe there’s another way to incentivize interval-estimation correctly (in which case we should definitely run studies with that method!), but as far as I know this hasn’t been done. So at least in most of the studies that are finding “overprecision”, it’s really not clear that it’s in the participants’ interest to give properly calibrated intervals.
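To make the gaming worry in point 1 concrete, here is a minimal Python sketch. The payout function `calibration_reward`, the made-up true values, and the two answer sets are all illustrative assumptions of mine, not a scheme taken from any of the studies under discussion.

```python
def hit_rate(intervals, truths):
    """Fraction of intervals that contain their question's true value."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(intervals)


def calibration_reward(intervals, truths, target=0.9):
    """Hypothetical payout: highest when the overall hit rate equals the target."""
    return 1.0 - abs(hit_rate(intervals, truths) - target)


# Ten made-up positive quantities standing in for the true answers.
truths = [12.0, 340.0, 5.5, 78.0, 1900.0, 42.0, 0.3, 610.0, 27.0, 88.0]

# Sincere but overprecise: narrow intervals, only half of which contain the truth.
sincere = [(0.8 * t, 1.2 * t) for t in truths[:5]] + [(2 * t, 3 * t) for t in truths[5:]]

# Gamed: nine trivial intervals plus one that cannot contain a positive quantity.
gamed = [(0.0, float("inf"))] * 9 + [(-2.0, -1.0)]

print("sincere:", hit_rate(sincere, truths), calibration_reward(sincere, truths))
print("gamed:  ", hit_rate(gamed, truths), calibration_reward(gamed, truths))
```

Under this assumed payout, the gamed answers hit the 90% target exactly while conveying no information at all, which is why rewarding overall calibration alone doesn’t give participants a reason to report their real intervals.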
2) Suppose we fix that, and still find that people are overprecise. While I agree that that would be evidence that people are locally being irrational (they’re not best-responding to their situation), there’s still a sense in which the explanation could be *rationalizing*, in the sense of making-sense of why they make this mistake. This is sort of a generic point that it’s hard (and often suboptimal to try!) to fine-tune your behavior to every specific circumstance. If you have a pre-loaded (and largely unconscious) strategy for giving intervals that trades off accuracy and informativity, then it may not be worth the cognitive cost to try to change that to this circumstance because of the (small!) incentives the experimenters are giving you.
An analogy I sometimes use is the Stroop task: you’re told to name the color of the word, not read it, as fast as possible. Of course, when “red” appears in black letters, there’s clearly a mistake made when you say ‘red’, but at the same time we can’t infer from this any broader story about irrationality, since it’s overall good for you to be disposed to automatically and quickly read words when you see them.
Of course, then we get into hard questions about whether it’s suboptimal that in everyday life people automatically do this accuracy/informativity thing, rather than consciously separate out the task of (1) forming a confidence interval/probability, and then (2) forming a guess on that basis. And I agree it’s a challenge for any account on these lines to explain how it could make sense for a cognitive system to smoosh these things together, rather than keeping them consciously separable. We’re actually working on a project along these lines for when cognitively limited agents might be more advantaged by guessing in this way, but I agree it’s a good challenge!
What do you think?
The first point is extremely interesting. I’m just spitballing without having read the literature here, but here’s one quick thought that came to mind. I’m curious to hear what you think.
1) First, instruct participants to construct a very large number of 90% confidence intervals based on the two-point method.
2) Then, instruct participants to draw the shape of the probability distribution behind each 90% confidence interval.
3) Inform participants that you will take a random sample from these intervals, and tell them they’ll be rewarded based on both: (i) the calibration of their 90% confidence intervals, and (ii) the calibration of the x% confidence intervals implied by their original distribution, where x is unknown to the participants and will be chosen by the experimenter after inspecting the distributions.
4) Allow participants to revise their intervals, if they so desire.
So, if participants offered the 90% confidence interval [0, 10^15] on some question, one could back out (say) a 50% or 5% confidence interval from the shape of their initial distribution. Experimenters could then ask participants whether they’re willing to commit to certain implied x% confidence intervals before proceeding.
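As a rough illustration of “backing out” an implied interval, here is a minimal Python sketch. It assumes, purely for illustration, that the participant’s distribution is normal and centered on the midpoint of their 90% interval; in the proposal above the shape would come from what the participant actually draws. The function name `implied_interval` is hypothetical.

```python
from statistics import NormalDist


def implied_interval(lo90, hi90, x):
    """Central interval at level x implied by a 90% interval.

    ASSUMPTION: the underlying distribution is normal and symmetric
    around the midpoint of the stated 90% interval.
    """
    std_normal = NormalDist()
    z90 = std_normal.inv_cdf(0.95)          # z-score bounding the central 90%
    mu = (lo90 + hi90) / 2
    sigma = (hi90 - lo90) / (2 * z90)
    zx = std_normal.inv_cdf(0.5 + x / 2)    # z-score bounding the central x
    return mu - zx * sigma, mu + zx * sigma


# The trivially wide 90% interval from the example, and what it would commit to.
for level in (0.5, 0.05):
    low, high = implied_interval(0.0, 1e15, level)
    print(f"{level:.0%} interval: [{low:.3e}, {high:.3e}]")
```

The point of the sketch is only that a trivially wide 90% interval still commits the participant to much narrower intervals at lower confidence levels, and those can be checked for calibration too.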
There might be some clever hack to game this setup, and it’s also a bit too clunky+complicated. But I think there’s probably a version of this which is understandable, and for which attempts to game the system are tricky enough that I doubt strategic behavior would be incentivized in practice.
On the second point, I sort of agree. If people were still overprecise, another way of putting your point might be to say that we have evidence about the irrationality of people’s actions, relative to a given environment. But these experiments might not provide evidence suggesting that participants are irrational characters. I know Kenny Easwaran likes (or at least liked) this distinction in the context of Newcomb’s Problem.
That said, I guess my overall thought is that any plausible account of the “rational character” would involve a disposition for agents to fine-tune their cognitive strategies under some circumstances. I can imagine being more convinced by your view if you offered an account of when switching cognitive strategies is desirable, so that we know the circumstances under which it would make sense to call people irrational, even if existing experiments don’t cut it.
I think the issue is that creating an incentive system where people are rewarded for being good at an artificial game that has very little connection to their real-world circumstances isn’t going to tell us anything very interesting about how rational people are in the real world, under their real constraints.
I have a friend who for a while was very enthused about calibration training, and at one point he even got a group of us from the local meetup + Phil Hazeldon to do a group exercise using a program he wrote to score our calibration on numeric questions drawn from Wikipedia. The thing is that while I learned from this to be way less confident about my guesses (which improves rationality), it is actually, for the reasons specified, useless to create 90% confidence intervals when making important real-world decisions.
Should I try training for a new career? The true 90% confidence interval on any difficult-to-pursue idea that I am seriously considering almost certainly includes both ‘you won’t succeed, and the time you spend will be a complete waste’ and ‘you’ll do really well, and it will seem like an awesome decision in retrospect’.
Elon Musk estimated a 10% chance of success for both Tesla and SpaceX. Those might have been good estimates.
Peter Thiel talks about how one of the reasons that the PayPal mafia is so successful is that they all learned that success is possible but really hard. If you pursue a very difficult idea and know you have a 10% chance of success, you really have to give it your all and know that if you slack you won’t succeed.
Crossposting from Substack:
Super interesting!
I like the strategy, though (from my experience) I do think it might be a big ask for at least online experimental subjects to track what’s going on. But there are also ways in which that’s a virtue—if you just tell them that there are no (good) ways to game the system, they’ll probably mostly trust you and not bother to try to figure it out. So something like that might indeed work! I don’t know exactly what calibration folks have tried in this domain, so will have to dig into it more. But it definitely seems like there should be SOME sensible way (along these lines, or otherwise) of incentivizing giving their true 90% intervals—and a theory like the one we sketched would predict that that should make a difference (or: if it doesn’t, it’s definitely a failure of at least local rationality).
On the second point, I think we’re agreed! I’d definitely like to work out more of a theory for when we should expect rational people to switch from guessing to other forms of estimates. We definitely don’t have that yet, so it’s a good challenge. I’ll take that as motivation for developing that more!