Agreed that people have lots of goals that don’t fit in this model. It’s definitely a simplified model. But I’d argue that ONE of (most) people’s goals is to solve problems; and I do think, broadly speaking, it is an important function (evolutionarily and currently) for conversation. So I still think this model gets at an interesting dynamic.
I think it depends on what we mean by assuming the truth is in the center of the spectrum. In the model at the end, we assume the truth is at the extreme left of the initial distribution—i.e. µ = 40, while everyone’s estimates are higher than 40. Even then, we end up with a spread where those who end up in the middle (ish—not exactly the middle) are both more accurate and less biased.
What we do need is that wherever the truth is, people will end up being on either side of it. Obviously in some cases that won’t hold. But in many cases it will—it’s basically inevitable if people’s estimates are subject to noise and people’s priors aren’t in the completely wrong region of logical space.
Hm, I’m not following your definitions of P and Q. Note that there’s no easy closed-form expression (that I know of) for the likelihoods of various sequences under these chains; I had to calculate them using dynamic programming on the Markov chains.
The relevant effect driving it is that the degree of shiftiness (how far the chain deviates from a 50%-heads rate) builds up over a streak. So although in any given case where Switchy and Sticky deviate they diverge by the same amount (say there’s a streak of 2, and Switchy has a 30% chance of continuing while Sticky has a 70% chance), Switchy makes it less likely that you’ll run into long streaks where they diverge, while Sticky makes it extremely likely. Neither Switchy nor Sticky gives a constant rate of switching; it depends on the streak length. (Compare a hypergeometric distribution.)
Take a look at §4 of the paper and the “Limited data (full sequence): asymmetric closeness and convergence” section of the Mathematica Notebook linked from the paper to see how I calculated their KL divergences. Let me know what you think!
See the discussion in §6 of the paper. There are too many variations to run, but it at least shows that the result doesn’t depend on knowing the long-run frequency is 50%; if we’re uncertain about both the long-run hit rate and about the degree of shiftiness (or whether it’s shifty at all), the results still hold.
Does that help?
Mathematica notebook is here! Link in the full paper.
How did you define Switchy and Sticky? The shiftiness needs to build up over ≥ 2 steps of a streak—one-step matrices won’t exhibit the effect. So it won’t appear if they are eg
Switchy = (0.4, 0.6; 0.6, 0.4)
Sticky = (0.6, 0.4; 0.4, 0.6)
But it WILL appear if they build up to (say) 60%-shiftiness over two steps. Eg:
Switchy = (0.4, 0, 0.6, 0; 0.45, 0, 0.55, 0; 0, 0.55, 0, 0.45; 0, 0.6, 0, 0.4)
Sticky = (0.6, 0, 0.4, 0; 0.55, 0, 0.45, 0; 0, 0.45, 0, 0.55; 0, 0.4, 0, 0.6)
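In case it’s useful, here’s a minimal Python sketch of the dynamic-programming calculation I mentioned above, using these 4-state matrices (the state labels, the uniform initial distribution, and the forward-algorithm packaging are my choices for illustration, not necessarily exactly what’s in the notebook):

```python
import numpy as np

# States track the current streak (my labeling): 0 = "2+ heads", 1 = "1 head",
# 2 = "1 tail", 3 = "2+ tails". A state's latest flip is heads iff it's 0 or 1.
SWITCHY = np.array([
    [0.40, 0.00, 0.60, 0.00],  # after 2+ heads: 40% continue, 60% switch
    [0.45, 0.00, 0.55, 0.00],  # after 1 head:   45% continue, 55% switch
    [0.00, 0.55, 0.00, 0.45],  # after 1 tail:   55% switch,   45% continue
    [0.00, 0.60, 0.00, 0.40],  # after 2+ tails: 60% switch,   40% continue
])
STICKY = np.array([
    [0.60, 0.00, 0.40, 0.00],
    [0.55, 0.00, 0.45, 0.00],
    [0.00, 0.45, 0.00, 0.55],
    [0.00, 0.40, 0.00, 0.60],
])

def consistent(flip):
    """Indicator over states whose latest flip matches the observed flip."""
    heads = np.array([1.0, 1.0, 0.0, 0.0])
    return heads if flip == "H" else 1.0 - heads

def sequence_likelihood(flips, T, init=None):
    """P(flip sequence) by dynamic programming (the forward algorithm):
    propagate a joint probability over hidden streak-states, zeroing out
    states inconsistent with each observed flip."""
    if init is None:
        init = np.full(4, 0.25)  # assumed uniform start over streak-states
    alpha = init * consistent(flips[0])
    for flip in flips[1:]:
        alpha = (alpha @ T) * consistent(flip)
    return alpha.sum()

for seq in ["HHHH", "HTHT"]:
    print(seq,
          f"Switchy: {sequence_likelihood(seq, SWITCHY):.4f}",
          f"Sticky: {sequence_likelihood(seq, STICKY):.4f}")
```

Run it and you’ll see the asymmetry described above: long streaks like HHHH are much likelier under Sticky than under Switchy, while alternating sequences go the other way.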
Would it have helped if I added the attached paragraphs (in the paper, page 3, cut for brevity)?
Frame the conclusion as a disjunction: “either we construe ‘gambler’s fallacy’ narrowly (as by definition irrational) or broadly (as used in the blog post, for expecting switches). If the former, we have little evidence that real people commit the gambler’s fallacy. If the latter, then the gambler’s fallacy is not a fallacy.”
I see the point, though I don’t see why we should be too worried about the semantics here. As someone mentioned below, I think the “gambler’s fallacy” is a folk term for a pattern of beliefs, and the claim is that Bayesians (with reasonable priors) exhibit the same pattern of beliefs. Some relevant discussion in the full paper (p. 3), which I (perhaps misguidedly) cut for the sake of brevity:
Good question. It’s hard to tell exactly, but there’s lots of evidence that the rise in “affective polarization” (dislike of the other side) is linked to “partisan sorting” (or “ideological sorting”)—the fact that people within political parties increasingly agree on more and more things, and also socially interact with each other more. Lilliana Mason has some good work on this (and Ezra Klein drew heavily on her work for his book on the topic).
This paper raises some doubts about the link between the two, though. It’s hard to know!
I think it depends a bit on what we mean by “rational”. But it’s standard to define it as “doing the best you CAN to get to the truth (or, in the case of practical rationality, to get what you want)”. We want to put the “can” proviso in there so that we don’t say people are irrational for failing to be omniscient. But once we put it in, things like resource constraints look a lot like constraints on what you CAN do, and therefore make less-than-ideal performance rational.
That’s controversial, of course, but I do think there’s a case to be made that (at least some) “resource-rational” theories ARE ones on which people are being rational.
Interesting! A middle-ground hypothesis is that people are just as (un)reasonable as they’ve always been, but the internet has given people greater exposure to those who disagree with them.
Nice point! I think I’d say where the critique bites is in the assumption that you’re trying to maximize the expectation of q_i. We could care about the variance as well, but once we start listing the things we care about—chance of publishing many papers, chance of going into academia, etc—then it looks like we can rephrase it as a more-complicated expectation-maximizing problem. Let U be the utility function capturing the balance of these other desired traits; it seems like the selectors might just try to maximize E(U_i).
Of course, that’s abstract enough that it’s a bit hard to say what it’ll look like. But whenever it’s an expectation-maximizing game, the same dynamics will apply: those with more uncertain signals will stay closer to your prior estimates. So I think the same dynamics might emerge. But I’m not totally sure (and it’ll no doubt depend on how exactly we incorporate the other parameters), so your point is well-taken! Will think about this. Thanks!
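For what it’s worth, here’s a toy version of that shrinkage dynamic under standard normal-normal updating (the prior mean/variance and the candidate numbers are made up purely for illustration):

```python
# Normal-normal Bayesian updating: the posterior mean of a candidate's
# quality is a precision-weighted average of the prior mean and the signal.
# All numbers below are illustrative assumptions.
MU_PRIOR, VAR_PRIOR = 50.0, 100.0  # prior over candidate quality

def posterior_mean(signal, var_signal):
    weight = VAR_PRIOR / (VAR_PRIOR + var_signal)  # weight on the signal
    return weight * signal + (1 - weight) * MU_PRIOR

# A candidate whose signal says "70", read with increasing illegibility:
for var_signal in [10.0, 100.0, 1000.0]:
    print(f"signal variance {var_signal:6.0f} -> estimate "
          f"{posterior_mean(70.0, var_signal):.1f}")
# 68.2, 60.0, 51.8: the noisier the signal, the closer the estimate
# stays to the prior mean of 50.
```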
Very nice point! We had definitely thought about the fact that when slots are large and candidates are few, that would give people from less prestigious/legible backgrounds an advantage. (We were speculating idly whether we could come up with uncontroversial examples...)
But I don’t think we’d thought about the point that people might intentionally manipulate how legible their application is. That’s a very nice point! I’m wondering a bit how to model it. Obviously if the Bayesian selectors know that they’re doing this and exactly how, they’ll try to price it in (“this is illegible” is evidence that it’s from a less-qualified candidate). But I can’t really see how those dynamics play out yet. Will have to think more about it. Thanks!
Nope, it’s the same thing! Had meant to link to that post but forgot to when cross-posting quickly. Thanks for pointing that out—will add a link.
I agree you could imagine someone who didn’t know the factions’ positions. But of course any real-world person who’s about to become politically opinionated DOES know the factions’ positions.
More generally, the proof is valid in the sense that if P1 and P2 are true (and the person’s degrees of belief are representable by a probability function), then Martingale fails. So you’d have to somehow say how adding that factor would lead one of P1 or P2 to be false. (I think if you were to press on this you should say P1 fails, since not knowing what the positions are still lets you know that people’s opinions (whatever they are) are correlated.)
Nice point! Thanks. Hadn’t thought about that properly, so let’s see. Three relevant thoughts:
1) For any probabilistic but non-omniscient agent, you can design tests on which it’s poorly calibrated. (Let its probability function be P, and let W = {q: P(q) > 0.5 & ¬q} be the set of things it’s more than 50% confident in but that are false. If your test is {{q,¬q}: q ∈ W}, then the agent will have probability above 50% in all its answers, but its hit rate will be 0%—see the sketch after point 3 below.) So it doesn’t really make sense to say that a system is calibrated or not FULL STOP, but rather that it is (or is not) on a given set of questions.
What they showed in that document is that for the target test, calibration gets worse after RLHF, but that doesn’t imply that calibration is worse on other questions. So I think we should have some caution in generalizing.
2) If I’m reading it right, it looks like on the exact same test, RLHF significantly improved GPT4’s accuracy (Figure 7, just above). So that complicates the “merely introducing human biases” interpretation.
3) Presumably GPT4 after RLHF is a more useful system than GPT4 without it, otherwise they would have released a different version. That’s consistent with the picture that lots of fallacies (like the conjunction fallacy) arise out of useful and efficient ways of communicating (I’m thinking of Gricean/pragmatic explanations of the CF).
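Here’s the quick sketch of the construction in point 1 (all the propositions, probabilities, and truth-values are invented for the example):

```python
# A toy agent: proposition -> (agent's probability, actual truth-value).
# All values invented for illustration.
agent = {
    "q1": (0.90, True),
    "q2": (0.80, False),
    "q3": (0.70, True),
    "q4": (0.60, False),
    "q5": (0.55, False),
}

def report(name, items):
    confs = [p for p, _ in items.values()]
    hits = [t for _, t in items.values()]
    print(f"{name}: mean confidence {sum(confs)/len(confs):.2f}, "
          f"hit rate {sum(hits)/len(hits):.2f}")

report("all questions", agent)  # looks reasonably calibrated

# The adversarial test W = {q : P(q) > 0.5 and q is false}: on it, every
# answer is given with >50% confidence, but the hit rate is 0% by construction.
W = {q: (p, t) for q, (p, t) in agent.items() if p > 0.5 and not t}
report("test W", W)
```

That’s why calibration only makes sense relative to a question set.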
How does that argument go? The same is true of a person doing (say) the cognitive reflection task.
“A bat and a ball together cost $1.10; the bat costs $1 more than the ball; how much does the ball cost?”
Standard answer: “$0.10”. But also standardly, if you say “That’s not correct”, the person will quickly realize their mistake.
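(For the record, the correct answer: with x the ball’s price, x + (x + $1.00) = $1.10, so 2x = $0.10 and x = $0.05.)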
Hm, I’m not sure I follow how this is an objection to the quoted text. Agreed, it’ll use bits of the context to modify its predictions. But when the context is minimal (as it was in all of my prompts, and in many other examples where it’s smart), it clearly has a default, and the question is what we can learn from that default.
Clearly that default behaves as if it is much smarter and clearer than the median internet user. Ask it to draw a tikz diagram, and it’ll perform better than 99% of humans. Ask it about the Linda problem, and it’ll perform the conjunction fallacy. I was arguing that that is mildly surprising, if you think that the conjunction fallacy is something that 80% of humans get “wrong” (and, remember, 20% get “right”).
Where does the fact that it can be primed to speak differently disrupt that reasoning?
Thanks for the thoughtful reply! Two points.
1) First, I don’t think anything you’ve said is a critique of the “cautious conclusion”, which is that the appearance of the conjunction fallacy (etc.) is not good evidence that the underlying process isn’t a probabilistic one. That’s still interesting, I’d say, since most JDM psychologists circa 1990 would’ve confidently told you that the conjunction fallacy + gambler’s fallacy + belief inertia show that the brain doesn’t work probabilistically. A vocal plurality of cognitive scientists now think they were wrong, so this is still an argument for the latter, “resource-rational” folks.
Am I missing something, or do you agree that your points don’t speak against the “cautious conclusion”?
2) Second, I of course agree that “it’s just a text-predictor” is one interpretation of ChatGPT. But of course it’s not the only interpretation, nor the most exciting one that lots of people are talking about. Obviously it was optimized for next-word prediction; what’s exciting about it is that it SEEMS like by doing so, it managed to display a bunch of emergent behavior.
For example, if you had asked people 10 years ago whether a neural net optimized for next-word prediction would ace the LSAT, I bet most people would’ve said “no” (since most people don’t). If you had asked people whether it would perform the conjunction fallacy, I’d guess most people would say “yes” (since most people do).
Now tell that past-person that it DOES ace the LSAT. They’ll find this surprising. Ask them how confident they are that it performs the conjunction fallacy. I’m guessing they’ll be unsure. After all, one natural theory of why it aces the LSAT is that it gets smart and somehow picks up on the examples of correct answers in its training set, ignoring/swamping the incorrect ones. But, of course, it ALSO has plenty of examples of the “correct” answer to the conjunction fallacy in its dataset. So if indeed “bank teller” is the correct answer to the Linda problem in the same sense that “Answer B” is the correct answer to LSAT question 34, then why is it picking up on the latter but not the former?
I obviously agree that none of this is definitive. But I do think that insofar as your theory of GPT4 is that it exhibits emergent intelligence, you owe us some explanation for why it seems to treat correct LSAT answers differently from “correct” Linda-problem answers.
Yeah, that looks right! Nice. Thanks!
I get where you’re coming from, but where do you get off the boat? The result is a theorem of probability: if (1) you update by conditioning on e, and (2) you had positive covariance between your own opinion and the truth, then you commit hindsight bias. So to say this is irrational, we need to say either that (1) you don’t update by conditioning, or that (2) you don’t have positive covariance between your opinion and the truth. Which do you deny, and why?
The standard route is to deny (2) by implicitly assuming that you know exactly what your prior probability was, at both the prior and future time. But that’s a radical idealization.
Perhaps more directly to your point: the shift only results in over-estimation if your INITIAL estimate is accurate. Remember we’re eliciting (i) E(P(e)) and (ii) E(P(e) | e)—not (iii) P(e) and (ii) E(P(e) | e). If (i) always equaled (iii) (if you always accurately estimated what you really thought at the initial time), then yes, hindsight bias would decrease the accuracy of your estimates. But in contexts where you’re unsure what you think, you WON’T always accurately estimate your prior.
In fact, that’s a theorem. If P has higher-order uncertainty, then there must be some event q such that P(q) ≠ E(P(q)). See this old paper by Samet (https://www.tau.ac.il/~samet/papers/quantified.pdf), and this more recent one with a more elementary proof (https://philarchive.org/rec/DORHU).
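If it helps to see it concretely, here’s a quick Monte Carlo illustration (the Beta(2,2) distribution is just an arbitrary way of giving the agent higher-order uncertainty about its own credence):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Higher-order uncertainty: your credence P(e) is itself uncertain to you.
# Model it as a random draw (Beta(2,2) is an arbitrary illustrative choice);
# e then occurs with that probability, which makes Cov(P(e), e) positive.
p = rng.beta(2, 2, N)
e = rng.random(N) < p

print(f"E(P(e))     = {p.mean():.3f}")     # ~0.50: initial estimate
print(f"E(P(e) | e) = {p[e].mean():.3f}")  # ~0.60: estimate after learning e

# The gap matches the covariance formula behind the result above:
# E(P(e) | e) - E(P(e)) = Cov(P(e), 1_e) / P(e)  (~0.05 / 0.50 = 0.10 here).
```

Both estimates are correct given what you know at each time; the upward shift is just conditioning plus the positive covariance, not a mistake.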