I hope readers will forgive a ‘top level’ reply from me, its length, and that I plan to ‘tap out’ after making it (save for betting). As pleasant as this discussion is, other demands pull me elsewhere. I offer a summary of my thoughts below—a mix of dredging up points I made better 3-4 replies deep than I managed in the OP, and of replies to various folks at CFAR. I’d also like to bet (I regret to decline Eli’s offer for reasons that will become apparent, but I hope to make some agreeable counter-offers).
I persist in three main worries: 1) that double crux (and ‘cruxes’ simpliciter) are confused concepts; 2) that it doesn’t offer anything above ‘strong consideration’, and insofar as it is not redundant, framing things in ‘cruxes’ harms epistemic practice; 3) that the evidence CFAR tends to fall back upon to nonetheless justify the practice of double crux is so undermined that it is not only inadequate public evidence, but inadequate private evidence for CFAR itself.
The colloid, not crystal, of double crux
A common theme in replies (and subsequent discussions) between folks at CFAR and me is a gap in understanding. I suspect ‘from their end’ (with perhaps the exception of Eli) the impression is that I don’t quite ‘get it’ (or, as Duncan graciously offers, maybe it’s just the sort of thing that’s hard to ‘get’ from the written-up forms): I produce sort-of-but-not-quite-there simulacra of double crux, object to them, but fail to appreciate the real core of double crux to which these objections don’t apply. From my end, I keep trying to uncover what double crux is, yet can’t find any ‘hard edges’: it seems amorphous, retreating back into other concepts when I push on what I think is distinct about it, yet flopping out again when I turn to something else. So I wonder whether there’s anything there at all.
Of course this seeming ‘from my end’ doesn’t distinguish between the two cases. Perhaps I am right that double crux is no more than some colloid of conflated and confused concepts; but perhaps instead there is a crystallized sense of what double crux is ‘out there’ that I haven’t grasped. Yet what does distinguish these cases in my favour is that CFAR personnel disagree with one another about double crux.
For a typical belief on which one might use double crux (or just ‘single cruxing’), should one expect to find one crux, or multiple cruxes?
Duncan writes (among other things on this point):
The claim that I derive from “there’s surprisingly often one crux” is something like the following: that, for most people, most of the time, there is not in fact a careful, conscious, reasoned weighing and synthesis of a variety of pieces of evidence. [My emphasis]
By contrast, Dan asserts in his explanation:
A typical belief has many cruxes. For example, if Ron is in favor of a proposal to increase the top marginal tax rate in the UK by 5 percentage points, his cruxes might include “There is too much inequality in the UK”, “Increasing the top marginal rate by a few percentage points would not have much negative effect on the economy”, and “Spending by the UK government, at the margin, produces value”. [my emphasis]
This doesn’t seem like a minor disagreement, as it flows through to important practical considerations. If there’s often one crux (but seldom more), once I find it I should likely stop looking; if there’s often many cruxes, I should keep looking after I find the first.
What would this matter, beyond some ‘gotcha’ or cheap point-scoring? This: I used to work in public health, and one key area is the evaluation of complex interventions. Key to this in turn is trying to understand both that the intervention works and how it works. The former without the latter introduces a troublesome black box: maybe your elaborate, high-overhead model for the intervention works through some much simpler causal path (c.f. that many schools of therapy with mutually incompatible models are in clinical equipoise, but appear also in equipoise with ‘someone sympathetic listening to you’); maybe you mistake the key ingredient as intrinsic to the intervention when it is instead contingent on the setting, so the intervention stops working when the setting changes (c.f. the external validity concerns that plague global health interventions).
In CFAR’s case there doesn’t seem to be a shared understanding of the epistemic landscape (or, at least, where cruxes lie within it) between ‘practitioners’. It also looks to me like there’s not a shared understanding on the ‘how it works’ question—different accounts point in different directions: Vanvier seems to talk more about getting out of ‘trying to win the argument’ mode and into ‘getting to the truth’ mode, Duncan emphasizes more the potential rationalisations one may have for a belief, and Eli suggests it may help locate cases where we differ in framing or in fundamental reasons common to several more proximal reasons (i.e. the ‘earth versus moon’ hypothetical). Of course, it could do all of these, but I don’t think CFAR has a way to tell. Finding the mediators would also help buttress claims of causal impact.
Cruxes contra considerations
I take it as ‘bad news’ for an idea, whatever its role, if one can show a) it is a proposed elaboration of another idea, and b) this elaboration makes the idea worse. I offer an in-theory reason to think ‘cruxes’ are inapt elaborations of ‘considerations’, a couple of considerations as to why ‘double crux’ might degrade epistemic practice, and a bet that, in fact, people who are ‘double cruxing’ (or just ‘finding cruxes’) often are not in fact using cruxes.
Call a ‘consideration’ something like this:
A consideration for some belief B is another belief X such that believing X leads one to assign a higher credence to B.
This is (unsurprisingly) broad, including stuff like ‘reasons’, ‘data’ and the usual fodder for bayesian updating we know and love. Although definitions of a ‘crux’ slightly vary, it seems to be something like this:
A crux for some belief B is another belief C such that if one did not believe C, one would not believe B.
Or:
A crux for some belief B is another belief C such that if one did not believe C, one would change one’s mind about B.
‘Changing one’s mind’ about B is not ultra-exact, but nothing subsequent turns on this point (one could just encode B in the first formulation as ‘I do not change my mind about some other belief A’, and so on).
The crux rule
I said in a reply to Dan that, given this idea of a crux, a belief should be held no more strongly than its (weakest) crux (call this the ‘crux rule’). He expressed uncertainty about whether this was true. I hope this derivation is persuasive:
¬C → ¬B (i.e. if I don’t believe C, I don’t believe B—or, if you prefer, if I don’t believe the crux, I should not ‘not change my mind about’ B)
So:
B → C (i.e. if I believe B, I must therefore believe C).
If B → C, then P(C) ≥ P(B): there is no possibility in which C is false yet B is true, but there is a possibility in which C is true and B is false (compare modus tollens with affirming the consequent).
So if C is a crux for B, one has inconsistent credences if one offers a higher credence for B than for C. An example: suppose I take “Increasing tax would cause a recession” as a crux for “Increasing taxes is bad”—if I thought increasing taxes would not cause a recession, I would not think increasing taxes is bad. Suppose my credence for raising taxes being bad is 0.9, and my credence for raising taxes causing a recession is 0.6. I’m inconsistent: if I assign a 40% chance raising taxes would not cause a recession, I should think there’s at least a 40% chance raising taxes would not be bad, not 10%.
(In a multi-crux case with cruxes C1–Cn for B, the above argument applies to each of C1–Cn, so B’s credence must not be higher than any of theirs, and thus must be equal to or lower than the lowest. Although this is only a bound, one may anticipate B’s credence to be substantially lower still, as the probability of the conjunction of (mostly) independent cruxes approximates P(C1)*P(C2) etc.)
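To gather the above into one place, here is a minimal formal restatement of the crux rule and the multi-crux bound (the only numbers used are the hypothetical tax figures from the example above):

```latex
% Single crux: C is a crux for B.
\neg C \rightarrow \neg B
\;\Longleftrightarrow\;
B \rightarrow C
\;\Longrightarrow\;
P(B) \le P(C).

% Worked with the tax example: P(C) = 0.6, so consistency requires P(B) \le 0.6;
% a credence of P(B) = 0.9 gives P(\neg B) = 0.1 < 0.4 = P(\neg C), violating the bound.

% Multiple cruxes C_1, \dots, C_n: B implies each of them, hence their conjunction, so
P(B) \;\le\; P(C_1 \wedge \cdots \wedge C_n) \;\le\; \min_i P(C_i),

% and if the cruxes are (mostly) independent,
P(C_1 \wedge \cdots \wedge C_n) \;\approx\; \prod_{i=1}^{n} P(C_i).
```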
Note this does not apply to considerations, as there’s no neat conditional parsing of ‘consideration’ in the way there is for ‘crux’. This also agrees with common sense: imagine some consideration X one is uncertain of which nonetheless favours B over ¬B: one can consistently be less confident of X than of B.
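A hypothetical worked example of that common-sense point (all numbers illustrative): a consideration X can raise one’s credence in B while itself being less probable than B, with no inconsistency.

```latex
% Suppose:
P(B \mid X) = 0.95, \quad P(B \mid \neg X) = 0.6, \quad P(X) = 0.5.

% Then, by the law of total probability,
P(B) = P(B \mid X)\,P(X) + P(B \mid \neg X)\,P(\neg X)
     = 0.95 \times 0.5 + 0.6 \times 0.5 = 0.775.

% X is a consideration for B (believing it raises one's credence: 0.95 > 0.775),
% yet P(X) = 0.5 < 0.775 = P(B). Were X instead a crux, this would violate the crux rule.
```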
Why belabour this logic and probability? Because it offers a test of intervention fidelity: whether people who are ‘cruxing’ are really using cruxes. Gather a set of people one takes to be epistemically virtuous and who ‘know how to crux’, and have them find cruxes for some of their beliefs. Then ask them to offer their credences for both the belief and the crux(es) for that belief. If they’re always finding cruxes, there will be no cases where they offer a higher credence for the belief than for its associated crux(es).
I aver the actual proportion of violations of this ‘crux rule’ will be at least 25%. What (epistemically virtuous) people are really doing when finding ‘cruxes’ is identifying strong considerations which they think gave them large updates toward B over ¬B. Despite this, they will often find their credence in the belief is higher than their credence in the supposed crux. I might think the argument from evil is the best consideration for atheism, but I may also hold a large number of considerations pointing in favour of atheism, such that together they make me more confident of atheism than of the soundness of the argument from evil. Readers (CFAR alums or not) can ‘try this at home’. For a few beliefs, ‘find your cruxes’. Now offer credences for these—how often do you need to adjust these credences to obey the ‘crux rule’? Do you feel closer to reflective equilibrium when you do so?
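For anyone who wants to run that ‘try this at home’ tally mechanically (or apply it to a group of participants), here is a minimal sketch; the function name, data format, and example numbers are all hypothetical, and the small tolerance parameter is my own addition to forgive rounding in elicited credences.

```python
def crux_rule_violations(elicitations, tolerance=0.0):
    """Count cases where a belief is held more strongly than its weakest stated crux.

    `elicitations` is a list of (belief_credence, crux_credences) pairs, e.g. gathered
    by asking participants for credences after a cruxing exercise. A case violates the
    'crux rule' when the belief's credence exceeds the lowest crux credence by more
    than `tolerance`.
    """
    violations = 0
    checked = 0
    for belief_credence, crux_credences in elicitations:
        if not crux_credences:
            continue  # no crux offered for this belief; nothing to check
        checked += 1
        if belief_credence > min(crux_credences) + tolerance:
            violations += 1
    return violations, checked


if __name__ == "__main__":
    # Hypothetical elicited credences: (belief, [cruxes for that belief]).
    sample = [
        (0.9, [0.6]),        # violates: 0.9 > 0.6 (the tax example above)
        (0.7, [0.8, 0.75]),  # consistent: 0.7 <= 0.75
        (0.55, [0.6, 0.9]),  # consistent: 0.55 <= 0.6
    ]
    bad, total = crux_rule_violations(sample)
    print(f"{bad}/{total} cases violate the crux rule")  # -> 1/3
```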
Even if CFAR can’t bus in some superforecasters or superstar philosophers to try this on, they can presumably do this with their participants. I offer the following bet (and am happy to haggle over the precise numbers):
(5-1 odds [i.e. favouring you].) From any n cases of beliefs and associated cruxes for CFAR alums/participants/any other epistemically virtuous group who you deem ‘know cruxing’, greater than n/4 cases will violate the crux rule.
But so what? Aren’t CFAR folks already willing to accept that ‘crux’ often (in Vanvier’s words) ‘degrades gracefully’ into something like what I call a ‘strong consideration’? Rather than castle-and-keep, isn’t this more like constructing a shoddier castle somewhat nearby and knocking its walls down? To-may-to/to-mar-to?
Yet we already have words for ‘things which push us towards a belief’. I used ‘consideration’, but we could also use ‘reasons’, or ‘evidence’, or whatever. ‘Strong consideration’ has 16 more characters than ‘crux’, but it has the benefits of its meaning being common knowledge, of being naturally consonant with Bayesianism, and of accurately capturing how epistemically virtuous people think and how they should be thinking. To introduce another term which is not common knowledge and forms either a degenerate or a redundant version of this common-knowledge term looks, respectfully, like bloated jargon by my lights.
If you think there’s a crux, don’t double crux, think again
Things may be worse than ‘we’ve already got a better concept’. It looks plausible to me that teaching cruxes (or double crux) teaches bad epistemic practice. A contention I made in the OP is that crux incidence is anti-correlated with epistemic virtue: epistemically virtuous people usually find, in topics of controversy, that the support for their beliefs is distributed over a number of considerations without a clear ‘crux’, rather than finding they would change their mind on some matter based on a single, not-that-resilient consideration. Folks at CFAR seem to (mostly) agree, e.g. Duncan’s remarks:
I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first image [many-one, small edge weights—T], and when they have a belief web like the third image [not-so-many-one, one much bigger edge—T] they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is.
This suggests one’s reaction on finding one has a crux should be alarm: “My web of beliefs doesn’t look like what I’d expect to see from a person with good epistemics”, and one’s attitude towards ‘this should be the crux for my belief’ should be scepticism: “It’s not usually the case that some controversial matter depends upon a single issue like this”. The best next step in such a situation seems to be something like: “I’m surprised there is a crux here. I should check with experts/the field/peers as to whether they agree with me that this is the crux of the matter. If they don’t, I should investigate the other considerations suggested to bear upon this matter/the reasons they may offer to assign lower weight to what I take to be the crux”.
The meta-cognitive point is that it is important not only to get the right credences on the considerations, but also to weigh these considerations rightly to form a good ‘all things considered’ credence on the topic. Webs of belief that greatly overweigh a particular consideration track truth poorly even if they are accurate on what they (mis)take to be the key issue. In my experience among elite cognisers, there’s seldom disagreement that a consideration bears upon a given issue. Disagreement seldom occurs about the direction of that consideration either: parties tend to agree a given consideration favours one view or another. Most of the action occurs at the aggregation: “I agree with you this piece of evidence favours your view, but I weigh it less than this other piece of evidence that favours mine.”
Cruxing/double crux seems to give entirely the wrong recommendations. It pushes one to try to find single considerations that would change one’s mind, despite this usually being pathological; it focuses subsequent thinking on those considerations identified as cruxes, instead of the more important issue of whether one is weighing these considerations too heavily; and it celebrates when you and your interlocutor agree on the crux of your disagreement, instead of cautioning that such cases often indicate you’ve both gotten things wrong.
The plural of plausibly biased anecdote is effectively no evidence
Ultimately, the crux (forgive me) is whether double crux actually works. Suppose ‘meditation’ is to ‘relaxing’ as I allege ‘crux/double crux’ is to ‘consideration’. Pretend all the stuff you hear about ‘meditation’ is mumbo-jumbo that obscures the fact that the only good ‘meditation’ does is prompt people to relax. This would be regrettable, but meditation would still be a good thing even if it is only parasitic on the good of relaxing. One might wonder if you could do something better than ‘meditation’ by focusing on the actually valuable relaxing bit, but maybe this is one of those cases where the stuff around ‘meditation’ is a better route to getting people to relax than targeting ‘relaxing’ directly. C.f. Duncan:
I think there’s a third path here, which is something like “double crux may be an instrumentally useful tool in causing these admirable epistemic norms to take root, or to move from nominally-good to actually-practiced.”
The evidence base for double crux (and I guess CFAR generally) seems to be something like this:
Lots of intelligent and reasonable people report cruxing/double crux was helpful for them. (I can somewhat allay Duncan’s worry that the cases he observes might be explained by social pressure he generates—people have reported the same in conversations in which he is an ocean away).
Folks at CFAR observe many cases where double crux works, and although it might work particularly well between folks at CFAR (see Eli’s comment), they still observe it to be handy with non-CFAR staff.
Duncan notes a sham-control test (i.e. double crux versus ‘discussing the benefits of epistemic virtues’).
Dan provides some participant data: about half ‘find a double crux’, and it looks like finding a disagreement, a double crux, or both was associated with a more valuable conversation.
Despite general equanimity, Duncan noted distress at the ‘lack of epistemic hygiene’ around looking at double crux, principally (as I read him) the excessive scepticism of some outside CFAR. With apologies to him (and the writer of Matthew 7), I think the concern is more plausible in reverse: whatever motes blemish outsider eyes do not stop them seeing the beams blocking CFAR’s insight. It is not only that outsiders aren’t being overly sceptical in doubting this evidence; CFAR is being overly credulous in taking it as seriously as it does. Consider this:
In cases where those who are evaluating the program are those involved in delivering the intervention, and they stand to benefit the better the results look, there’s a high risk of bias (c.f. blinding, conflict of interest).
In cases where individuals enjoy some intervention (and often spent quite a lot of money to participate), there’s a high risk of bias in their self-report (c.f. choice-supportive bias, the halo effect, among others).
Neither good faith nor knowledge of a potential bias risk, by itself, helps one much to avoid that bias.
Prefer hard metrics with tight feedback loops when trying to perform well at something.
Try and perform some reference class forecasting to avoid getting tricked by erroneous insider views (but I repeat myself).
What measure of credulity should a rationalist mete out to an outsider group with a CFAR-like corpus of evidence? I suggest it would be meagre indeed. One can recite almost without end interventions with promising evidence fatally undercut by minor oversights or biases (e.g. inadequate allocation concealment in an RCT). In the class of interventions where the available evidence has multiple, large, obvious bias risks, the central and modal member is an intervention with no impact.
We should mete out this meagre measure of credulity to ourselves: we should not on the one hand remain unmoved by the asseveration of a chiropractor that they ‘really see it works’, yet take evidence of similar quality and quantity to vindicate rationality training. CFAR’s case is unpersuasive public evidence. I go further: it’s unpersuasive private evidence too. In the same way we take the chiropractor to be irrational if they don’t almost entirely discount their first-person experience of chiropractic successes once we inform them of the various cognitive biases that undercut the evidentiary value of that experience, we should expect a CFAR instructor or alum, given what they already know about rationality, to almost entirely discount these sources of testimonial evidence when judging whether double crux works.
Yet this doesn’t happen. Folks at CFAR tend to lead with this anecdata when arguing that double crux works. This also mirrors ‘in person’ conversations I have, where otherwise epistemically laudable people cite their personal experience as what convinces them of the veracity of a particular CFAR technique. What has a better chance of putting one in touch with reality about whether double crux (or CFAR generally) works is the usual scientific suspects: focusing on ‘hard outcomes’, attempting formal trials, randomisation, making results public, and so forth. That this generally hasn’t happened across the time of CFAR’s operation I take to be a red flag.
For this reason I respectfully decline Eli’s suggestion to make bets on whether CFAR will ‘stick with’ double crux (or something close to it) in the future. I don’t believe CFAR’s perception of what is working will track the truth, and so whether or not it remains ‘behind double crux’ is uninformative for the question of whether double crux works. I am, however, willing to offer bets on whether CFAR will gain ‘objective’ evidence of efficacy, betting in favour of the null hypothesis for these:
(More an error bounty than a bet—first person to claim gets £100.) CFAR’s upcoming “EA impact metrics report” will contain no ‘objective measures’ (defined somewhat loosely—an objective measure is something like “My income went up/BMI went down/an independent third-party assessor rated the conversation as better”, not things along the lines of “Participants rate the workshop as highly valuable/instructor rates conversations as more rational”, etc.).
(3-to-1): CFAR will not generate in the next 24 months any peer-reviewed literature in psychology or related fields (stipulated along the lines of “either published in the ‘original reports’ section of a journal with impact factor >1, or presented at an academic conference”).
(4 to 1): Conditional on a CFAR study getting past peer review, it will not show significantly positive effects on any objective, pre-specified outcome measure.
I’m also happy to offer bets on objective measures of any internal evaluations re. double crux or CFAR activity more broadly.
I agree with the gist of the critique of double crux as presented here, and have had similar worries. I don’t endorse everything in this comment, but think taking it seriously will positively contribute to developing an art of productive disagreement.
I think the bet at the end feels a bit fake to me, since I think it is currently reasonable to assume that publishing a study in a prestigious psychology journal is associated with something around 300 person-hours of completely useless bureaucratic labor, and I don’t think it is currently worth it for CFAR to go through that effort (nor, I think, is it for almost anyone else). However, if we relax the constraints to only reaching the data quality necessary to publish in a journal (verified by Carl Shulman or Paul Christiano or Holden Karnofsky, or whoever we can find who we would both trust to assess this), I am happy to take you up on your 4-to-1 bet (as long as we are measuring the effect of the current set of CFAR instructors teaching, not some external party trying to teach the same techniques, which I expect to fail).
I sadly currently don’t have the time to write a larger response about the parts of your comment I disagree with, but think this is an important enough topic that I might end up writing a synthesis on things in this general vicinity, drawing from both yours and other people’s writing. For now, I will leave a quick bullet list of things I think this response/argument is getting wrong:
While your critique is pointing out true faults in the protocol of double crux, I think it has not yet really engaged with some of the core benefits I think it brings. You somewhat responded to this by saying that you think other people don’t agree on what double crux is about, which is indeed evidence of the lack of a coherent benefit; however, I claim that if you dig deeper into those people’s opinions, you will find that the core benefits they claim might sound superficially very different, but are actually at the core quite similar and highly related. I personally expect that we two would have a more productive disagreement than you and Duncan, and so I am happy to chat in person, here on LW, or via any chat service of your preference if you want to dig deeper into this. Though obviously feel completely free to decline this.
I particularly think that the alternative communication protocols you proposed are significantly worse and, in as much as they are codified, do not actually result in more productive disagreement.
I have a sense that part of your argument still boils down to “CFAR’s arguments are not affiliated enough with institutions that are allowed to make a claim about something like this (whereas academic philosophers and psychology journals are).” This is a very tentative impression, and I do not want to give you the sense that you have to defend yourself for this. I have high priors on people’s epistemics being tightly entangled with their sense of status, and usually require fairly extraordinary evidence before I am convinced that this is not the case for any specific individual. However, since this kind of psychologizing almost never results in a productive conversation, this is not a valid argument in a public debate. And other people should be very hesitant to see my position as additional evidence of anything. But I want to be transparent in my epistemic state, and state the true reasons for my assessment as much as possible.
While I agree that the flaws you point out are indeed holding back the effectiveness of double crux, I disagree that they have any significant negative effects on your long-term epistemics. I don’t think CFAR is training people to adopt worse belief structures after double cruxing, partially because I think the default incentives on people’s belief structures as a result of normal social conversation are already very bad and doing worse by accident is unlikely, and because the times I’ve seen double crux in practice, I did not see mental motions that would correspond to the loss of the web-like structure of their beliefs, and more noticed a small effect in the opposite direction (i.e. people’s social stance towards something was very monolithic and non-web-like, but as soon as they started double cruxing their stated beliefs were much more web-like).
Overall, I am very happy about this comment, and would give it a karma reward had I not spent time writing this comment and had instead implemented the moderator-karma-reward functionality. While I would not have phrased the issues I have with double crux in the same language, the issues it points out overlap to a reasonable degree with the ones that I have, and so I think it also represents a good chunk of my worries. Thank you for writing it.
Thanks for your reply. Given my own time constraints I’ll decline your kind offer to discuss this further (I would be interested in reading some future synthesis). As consolation, I’d happily take you up on the modified bet. Something like:
Within the next 24 months CFAR will not produce results of sufficient quality for academic publication (as judged by someone like Christiano or Karnofsky) that demonstrate benefit on a pre-specified objective outcome measure
I guess ‘demonstrate benefit’ could be stipulated as ‘p<0.05 on some appropriate statistical test’ (the pre-specification should get rid of the p-hacking worries). ‘Objective’ may remain a bit fuzzy: the rider is meant to rule out self-report stuff like “Participants really enjoyed the session/thought it helped them”. I’d be happy to take things like “Participants got richer than controls”, “CFAR alums did better on these previously used metrics of decision making”, or whatever else.
Happy to discuss further to arrive at agreeable stipulations—or, if you prefer, we can just leave them to the judge’s discretion.
Ah, the 4-to-1 bet was a conditional one:

(4 to 1): Conditional on a CFAR study getting past peer review, it will not show significantly positive effects on any objective, pre-specified outcome measure.
I don’t know CFAR’s current plans well enough to judge whether they will synthesize the relevant evidence. I am only betting that if they do, the result will be positive. I am still on the fence of taking a 4-1 bet on this, but the vast majority of my uncertainty here comes from what CFAR is planning to do, not what the result would be. I would probably take a 5-1 bet on the statement as you proposed it.
Sorry for misreading your original remark. Happy to offer the bet in conditional form, i.e.:
Conditional on CFAR producing results of sufficient quality for academic publication (as judged by someone like Christiano or Karnofsky), these will fail to demonstrate benefit on a pre-specified objective outcome measure
This comment combines into one bucket several different major threads that probably each deserve their own bucket (e.g. the last part seems like strong bets about CFAR’s competence that are unrelated to “is double crux good”). Personally I don’t like that, though it doesn’t seem objectively objectionable.
I agree with this, and also prefer to keep discussion about the competence of specific institutions or individuals to a minimum on the frontpage (this is what I want to have a community tag for).