I like most of this; it seems like the sort of post that’s going to lead to significant improvements in people’s overall ability to do collaborative truth-seeking, because it makes concrete and specific recommendations that overall seem useful and sane.
However,
principally because ‘double cruxes’ are rare in topics where reasonable people differ
disagreements like Xenia’s and Yevgeny’s, which can be eventually traced to a single underlying consideration, are the exception rather than the rule
and similar make me wish that posts like this would start by getting curious about CFAR’s “n” of several hundred, rather than implicitly treating it as irrelevant. We’ve been teaching double crux at workshops for a couple of years now, and haven’t stopped the way we’ve stopped with other classes and concepts that weren’t pulling their weight.
My sense is that the combined number of all of the double-crux-doubters and double-crux-strugglers still does not approach, in magnitude, the number of people who have found double crux moderately-to-very useful and workable and helpful (and, in specific, the number of people who have been surprised to discover that double cruxes do in fact exist and are findable an order of magnitude more often than one would have naively guessed).
It does not distress me to consider that double crux might be imperfect, or even sufficiently broken that it should be thrown out.
It does distress me when claims about its imperfectness or brokenness fail to address the existing large corpus of data about its usefulness. Posts like these seem to me to be biased toward the perspective of people who have tried to pick it up online and piece it together, and to discount the several hundred people who’ve been taught it at workshops (well over a thousand if you count crash courses at e.g. EA conferences). At the very least, claim that CFAR is lying or that its participants are socially pressured into reporting a falsehood—don’t just ignore the data outright.
(Edit: I consider these to be actually reasonable things to posit; I would want them to be posited politely and falsifiably if possible, but I think it’s perfectly okay for people to have, as hypotheses, that CFAR’s deceiving itself or others, or that people’s self-report data is unduly swayed by conformity pressures. The hypotheses themselves seem inoffensive to me, as long as they’re investigated soberly instead of tribally.)
=( Sorry for the strength of language, here (I acknowledge and own that I am a bit triggered), but it feels really important to me that we have better epistemic hygiene surrounding this question. The contra double crux argument almost always comes most strongly from people who haven’t actually talked to anyone skilled in double crux or haven’t actually practiced it in an environment where facilitators and instructors could help them iterate in the moment toward an actually useful set of mental motions.
(And no, those aren’t the only people arguing contra double crux. Plenty of actual CFAR grads are represented by posts like the above; plenty of actual CFAR grads struggled and asked for help and still walked away confused, without a useful tool. But as far as I can tell, they are in the minority, not the majority.)
It seems critical to me that we be the type of community that can distinguish between “none of CFAR’s published material is sufficient to teach this technique on its own” and “this technique isn’t sufficiently good to keep iterating on.” The former is almost trivially true at this point, but that’s weak evidence at best for the latter, especially if you take seriously (repeated for emphasis) the fact that CFAR has a higher n on this question than literally anybody else, as far as I know.
Scattered other thoughts:
As far as I can tell, none of the most serious doubters in the previous thread actually stepped up to try out double crux with any of the people who claimed to be skilled and capable in it. I know there was an additional “post disagreements and we’ll practice” thread, but that seemed to me to contain low density of doubters giving it a serious try with proficient partners.
Also, the numbered description in the “what mostly happens” section is, according to me, something like an 85% overlap with double crux. It is a set of motions that the double crux class and tips point people toward and endorse. So that leaves me in confuséd disagreement with the claim that it is “perhaps possible” for double crux to be twisted around to include them—from my point of view, it already does. Which again points to a problem with the explanation and dissemination of the technique, but not with the core technique itself, as transmitted to real humans over the course of a couple of hours of instruction with facilitators on hand*. This causes me to wonder whether the real crux is the jargon problem that you alluded to near the end, with people being upset because they keep crystallizing specific and inaccurate impressions of what double crux is, doing things that seem better, and getting angry at double crux for not being those better things (when in at least many of the cases, it is, and the problem is one of semantics).
* Speaking of “a couple of hours of instruction with facilitators on hand,” I’m also a little sad about what I read as an implicit claim that “well, double crux doesn’t work as well as being the sort of philosopher or debater who’s spent thousands upon thousands of hours practicing and looking at the nature of truth and argument and has been educated within a solid culture with a long tradition.” It seems to me that double crux can take a 1-4 hour investment of time and attention and jumpstart people to being something like a quarter as good as those lifelong philosophers.
And I think that’s amazing, and super promising, and that it’s disingenuous to point at literal world-class professionals and say “what they do is better.” It strikes me as true, but irrelevant? It’s like if I offered an afternoon martial arts seminar that had 50% of its participants walking away with blue-belt level skill (and the other 50% admittedly lost or confused or unimpressed) and someone criticized it for not being as useful or correct as what black belts with ten years of experience do. It’s still an afternoon seminar to get excited about, and for outsiders looking in to wonder “what’s the secret sauce?”
(I realize now that I’m making a claim that may not have been published before, about double crux being intended to be a scrappy small-scale bootstrapping technique, and not necessarily the final step in one’s productive-disagreement evolution. That’s a novel claim, and one that other CFAR staff might disagree with me on, so I retract the part of my grrrr that was based on you not-taking-into-account-this-thing-you-couldn’t-possibly-have-known-about).
Here’s my sense of the bar that a putative double crux replacement needs to meet, because I claim double crux is already reliably meeting it:
Be explainable within 30-60 minutes in a deliberately scaffolded educational context
Be meaningfully practicable within another 30-60 minutes
Be sticky enough that people in fact desire to reference it and spread it and use it in the future
Provide a significant boost in productivity-in-debate and/or a corresponding significant reduction in wasted time/antagonism à la giving people 5-25% of the skill that you see in those philosophers
Do all of the above for greater than 33% of people who try it
Here’s my sense of what is unfairly and inaccurately being expected of double crux:
Be grokkable by people based entirely on text and hearsay/be immune to problems that arise from [games of telephone] or [imperfect capture and reconstruction through writing].
Be “attractive” in the sense that people fumbling around on their own will necessarily find themselves moving closer toward the right motions rather than sideways or away, without help or guidance.
Be “complete” in the sense that it contains all of what we know about how to improve disagreement and/or debate.
Provide an efficacy boost of greater than 50% for greater than 75% of the people who try it.
Here are my “cruxes” on this question:
I would drastically reduce my support for double crux as a technique if it turned out that what was needed was something that could be asynchronously transmitted (e.g. because in-person instruction insufficiently scales).
I would drastically reduce my support for double crux if it turned out that 50+% of the people who reported valuable experiences in my presence later on discovered that the knowledge didn’t stick or that the feeling had been ephemeral (e.g. possibly social-conformity based rather than real).
I would drastically reduce my support for double crux if I attempted to convey my True Scotsman version to four high-quality skeptics/doubters (such as Thrasymachus) and all four afterward told me that all of the nuance I thought I was adding was non-useful, or had already been a part of their model and was non-novel.
I would drastically reduce my support for double crux if a concrete alternative that was chunkable, grokkable, teachable, and had thing-nature (as opposed to being a set of principles that are hard to operationalize) was proposed and was promising after its first iterations with 30+ people.
tl;dr I claim that most of the good things you point at are, in fact, reasonably captured by double crux and have been all along; I claim that most of the bad things you point at are either due to insufficient pedagogy (our fault) or are likely to be universal and bad for any attempt to create a technique in this space. I recognize that I’m in danger of sliding into No True Scotsman territory and I nevertheless stick to my claim based on having done this with hundreds of people, which is something most others have not had the opportunity to do.
It does distress me when people who argue that it’s imperfect or broken do not even bother to address a very large corpus of data about its usefulness, including our claim that, in our experiences, the overlapping common crux is actually there.
As a datapoint, this is the first time that I remember hearing that there would exist “a very large corpus of data about its usefulness”. The impression I got from the original DC post was that this was popular among CFAR’s instructors, but that you’d been having difficulties effectively teaching this to others.
I think that if such a corpus of evidence exists, then the main reason why people are ignoring it is because the existence of this corpus hasn’t been adequately communicated, making the implicit accusation of “your argument isn’t taking into account all the data that it should” unfair.
That’s sensible. I would have thought it was implied by “CFAR’s taught this at every workshop and event that it’s run for the past two years,” but I now realize that’s very typical-mind-fallacy of me.
Where’s that fact from? It wasn’t in the original DC post, which only said that the technique “is one of CFAR’s newer concepts”.
That’s what I mean by typical mind fallacy. I live in the universe where it’s obvious that double crux is being taught and tinkered with constantly, because I work at CFAR, and so I just stupidly forgot that others don’t have access to that same knowledge. i.e. I was elaborating on my agreement with you, above.
Also, by “newer concept” we just mean relative to CFAR’s existence since 2012. It’s younger than e.g. inner sim, or TAPs, but it’s been a part of the curriculum since before my hiring in October 2015.
Also also in consolidating comments I have discovered that I lack the ability to delete empty comments.
I’ve just been going around and deleting the empty comments. Right now we don’t allow users to delete comments, since that would also delete all children comments by other authors (or at least make them inaccessible). Probably makes sense for people to be able to delete their own comments if they don’t have any children.
My thanks for your reply. I apologise if my wording in the OP was inflammatory or unnecessarily ‘triggering’ (another commenter noted an ‘undertone of aggression’, which I am sorry for, although I promise it wasn’t intended—you are quoted repeatedly because you wrote the canonical exposition of what I target in the OP, not out of some misguided desire to pick a fight with you on the internet). I hope I capture the relevant issues below, but apologies in advance if I neglect or mistake any along the way.
CFAR’s several hundred and the challenge of insider evidence
I was not aware of the several hundred successes CFAR reports of double crux being used ‘in the wild’. I’m not entirely sure whether the successes are a) those who find double crux helpful or b) particular instances of double crux resolving disagreement, but I think you would endorse plenty of examples of both. My pretty sceptical take on double crux had ‘priced in’ the expectation that CFAR instructors and at least some/many alums thought it was pretty nifty.
You correctly anticipate the sort of worries I would have about this sort of evidence. Self-reported approbation from self-selected participants is far from robust. Branches of complementary medicine can probably tout thousands to millions of ‘positive results’ and happy customers, yet we know they are in principle intellectually bankrupt, and in practice perform no better than placebo in properly conducted trials. (I regret to add that replies along the lines of “if you had received the proper education in the technique you’d—probably—see it works well”, or “I’m a practitioner with much more experience than any doubter in terms of using this, and it works in my experience”, also have analogies here.)
I don’t think one need presume mendacity on the part of CFAR, nor gullibility on the part of workshop attendees, to nonetheless believe this testimonial evidence isn’t strongly truth-tracking: one may anticipate similarly positive reports in worlds where (perhaps) double crux doesn’t really work but other stuff CFAR practices does work, where participants enjoy mingling with similarly rationally-minded people, where they may have had to invest four-figure sums to get on the workshop, and so on and so forth. (I recall CFAR’s previous evaluation had stupendous scores on self-reported measures, but more modest performance on objective metrics.)
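To put that worry in rough Bayesian terms (a toy sketch with invented numbers, not a claim about the actual base rates): if glowing self-reports would show up in most worlds regardless of whether the technique works, they carry only a small likelihood ratio and so barely move one’s prior.

```python
# Toy sketch, invented numbers: how much should positive self-reports move us
# if we would expect them in most worlds anyway?
prior_works = 0.5                 # prior that double crux "really works"
p_reports_if_works = 0.9          # chance of glowing self-reports if it works
p_reports_if_not = 0.7            # chance of glowing self-reports even if it doesn't

posterior = (p_reports_if_works * prior_works) / (
    p_reports_if_works * prior_works + p_reports_if_not * (1 - prior_works)
)
print(round(posterior, 2))        # ~0.56: a likelihood ratio of ~1.3 barely moves the prior
```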
Of course, unlike complementary medicine, double crux does not have such powerful disconfirmation as ‘violates known physics’ or ‘always fails RCTs’. Around the time double crux was proposed I challenged it on theoretical grounds (i.e. that double cruxes should be very rare); this post was prompted by some of the dissonance on previous threads, but also by the lack of public examples of double crux working. Your experience of the success of double crux in workshops is essentially private evidence (at least for now): in the same way it is hard to persuade me of its validity, it is next to impossible for me to rebut it. I nonetheless hope other lines of inquiry are fruitful.
How sparse is a typical web of belief?
One is the theoretical point. I read in your reply disagreement with the point that ‘double cruxes’ should be rare (”… the number of people who have been surprised to discover that double cruxes do in fact exist and are findable an order of magnitude more often than one would have naively guessed”). Although you don’t include it as a crux in your reply, it looks pretty crucial to me. If cruxes are as rare as I claim, double cruxing shouldn’t work, and so the participant reports are more likely to have been innocently mistaken.
In essence, I take the issue to be around what the web of belief surrounding a typical subject of disagreement looks like. It seems double crux is predicated on this being pretty sparse (at least in terms of important considerations): although lots of beliefs might have some trivial impact on your credence in B, B is mainly set by small-n cruxes (C), which are generally sufficient to change one’s mind if one’s attitude towards them changes.
By contrast, I propose the relevant web tends to be much denser (or, alternatively, the ‘power’ of the population of reasons that may alter one’s credence in B is fairly evenly distributed). Credence in B arises from a large number of considerations that weigh upon it, each of middling magnitude. Thus even if I am persuaded one consideration is mistaken, my credence in B does not change dramatically. It follows that ‘cruxes’ are rare, and it is rarer still for two people to discover that their belief on some recondite topic B is principally determined by some other issue C, and that it is the same C for both of them.
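(To make the sparse/dense contrast concrete before going further, here is a toy model of my own, with invented weights, not anything drawn from the double crux write-up: treat credence in B as a logistic function of weighted considerations, and compare how far it moves when one consideration is refuted.)

```python
import math

def credence(weights, evidence):
    """evidence[i] is +1 if consideration i currently favours B, -1 if it counts against."""
    log_odds = sum(w * e for w, e in zip(weights, evidence))
    return 1 / (1 + math.exp(-log_odds))

sparse_web = [3.0, 0.1, 0.1, 0.1]   # one dominant consideration (a crux) plus trivia
dense_web = [0.8, 0.8, 0.8, 0.8]    # many considerations of middling weight

for name, weights in [("sparse", sparse_web), ("dense", dense_web)]:
    before = credence(weights, [+1, +1, +1, +1])
    after = credence(weights, [-1, +1, +1, +1])   # the first consideration is refuted
    print(f"{name}: {before:.2f} -> {after:.2f}")
# sparse: 0.96 -> 0.06  (refuting the crux flips the belief)
# dense:  0.96 -> 0.83  (credence barely moves; there is no single crux to find)
```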
This is hard to make very crisp, as (among other things) the ‘space of all topics on which reasonable people disagree’ is hard to pin down. Beyond appeals to my own experience and introspection (“do you really find your belief in, let’s say, some political view like gay marriage or abortion depends on a single consideration to such a degree that, if it was refuted, you would change your view?”), I’d want to marshal a couple of other considerations.
When one looks at a topic in philosophy, or science, or many other fields of enquiry, one usually sees a very one-to-many relationship of the topic to germane considerations. A large number of independent lines of evidence support the theory of evolution; a large number of arguments regarding god’s existence in philosophy receive scrutiny (and in return they spawn a one-to-many relationship of argument to objections, objection to counter-objections). I suggest this offers analogical evidence in support of my thesis.
Constantin’s report of double cruxing (which has been used a couple of times as an exemplar in other threads) seems to follow the pattern I expect. I struggle to identify a double crux in the discussion Constantin summarizes: most of the discussion seems to involve whether Salvatier’s intellectual project is making much progress, with then a host of subsidiary considerations (e.g. how much to weigh ‘formal accomplishments’, the relative value of more speculative efforts on far future considerations, etc.), but it is unclear to me that, if Constantin were persuaded Salvatier’s project was making good progress, this would change her mind about the value of the rationalist intellectual community (after all, one good project may not be adequate ‘output’), or vice versa (even if Salvatier recognises his own project was not making good progress, the rationality community might still be a fertile ground to cultivate his next attempt, etc.).
What comprises double-crux?
I took the numbered list of my counter-proposal to have 25% overlap with double crux (i.e. realising your credences vary considerably), not 85%. Allow me to be explicit about how I see points 2-4 in my list standing in contradistinction to the ‘double crux algorithm’:
There’s no assumption of an underlying single ‘crux of the matter’ between participants, or for either individually.
There’s no necessity for a given consideration (even the strongest identified) to be individually sufficient to change one’s mind about B.
There’s also no necessity for the strongest considerations proposed by X and Y to have common elements.
There’s explicit consideration of credence resilience. Foundational issues may be ‘double cruxes’ in that (e.g.) my views on most applied ethics questions would change dramatically if I were persuaded of the virtue ethics my interlocutor holds, but one often makes more progress discussing a less resilient non-foundational claim even if the ‘payoff’ in terms of the subsequent credence change in the belief of interest is lower.
This may partly be explained by a broader versus narrower conception of double crux. I take the core idea of double crux to be ‘find some C which your disagreement over B relies upon, then discuss C’ (this did, in my defense, comprise the whole of the ‘how to play’ section in the initial write-up). I take you to hold a broader view, where double crux incorporates other related epistemic practices, and it has value in toto.
My objection is expressly this. Double crux is not essential for these incorporated practices. So one can compare discussion with the set of these other practices to this set with the addition of double crux. I aver the set sans double crux will lead to better discussions.
Pedagogy versus performance
I took double crux to be mainly proposed as a leading strategy to resolve disagreement. Hence the comparison to elite philosophers was to suggest it wasn’t a leading strategy, by pointing to something better. I see from this comment (and the one you split off into its own thread) that you see it as playing more of a pedagogical role—even if elite performers do something different, it does valuable work in improving skills. Although I included a paragraph about its possible pedagogical value (admittedly one you may have missed, as I started it with a self-indulgent swipe at the rationalist community), I would have focused more on this area had I realised it was CFAR’s main contention.
I regret not to surprise you with doubts about the pedagogical value as well. These mostly arise from the above concerns: if double cruxes are as rare as I propose, it is unclear how searching for them is that helpful an exercise. A related worry (related to the top of the program) is that this seems to entail increasing reliance on private evidence regarding whether the technique works: in principle, objections to the ‘face value’ of the technique apply less (as it is there to improve skills rather than being a proposal for what the ‘finished article’ should look like); adverse reports from non-CFAR alums don’t really matter (you didn’t teach them, so it is no surprise they don’t get it right). What one is left with is the collective impressions of instructors, and the reports of the students.
I guess I have higher hopes for transparency and communicability of ‘good techniques’. I understand CFAR is currently working on further efforts to evaluate itself. I hope to be refuted by the forthcoming data.
I want to bring up sequence thinking and cluster thinking, which I think are useful in understanding the disagreement here. As I understand it, Duncan argues that sequence thinking is more common than cluster thinking, and you’re arguing the converse.
I think most beliefs can be put in either a cluster-thinking or a sequence-thinking framework. However, I think that (while both are important and useful) cluster thinking is generally more useful for coming up with final conclusions. For that reason, I’m suspicious of double crux, because I’m worried that it will cause people to frame their beliefs in a sequence-thinking way and feel like they should change their beliefs if some important part of their sequence was proven wrong, even though (I think) using cluster thinking will generally get you more accurate answers.
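(For concreteness, a very rough caricature of the two styles as I understand the terms, with invented numbers; this is an illustration of the contrast, not a definition of either.)

```python
# Sequence thinking (caricature): one chained model; the conclusion leans on every link.
chain = [0.9, 0.85, 0.8, 0.9]        # confidence in each step of the argument
sequence_view = 1.0
for step in chain:
    sequence_view *= step             # refuting any single step collapses the estimate

# Cluster thinking (caricature): several semi-independent perspectives, combined by weight.
perspectives = [(0.7, 0.4), (0.5, 0.3), (0.8, 0.3)]   # (estimate, weight); weights sum to 1
cluster_view = sum(est * w for est, w in perspectives)

print(round(sequence_view, 2), round(cluster_view, 2))   # ~0.55 vs 0.67
# In the cluster view no single perspective is a crux, so losing one shifts the
# answer only modestly; in the sequence view a single broken link can zero it out.
```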
As I understand it, Duncan argues that sequence thinking is more common than cluster thinking, and you’re arguing the converse.
This looks remarkably like an attempt to identify a crux in the discussion. Assuming that you’re correct about double-cruxing being problematic due to encouraging sequence-like thinking: isn’t the quoted sentence precisely the kind of simplification that propagates such thinking? Conversely, if it’s not a simplification, doesn’t that provide (weak) evidence in favor of double-cruxing being a useful tool in addressing disagreements?
I think that sequence thinking is important and valuable (and probably undersupplied in the world in general, even while cluster thinking is undersupplied in the rationalist community in specific). However, I think both Thrasymachus and Duncan are doing cluster thinking here—like, if Duncan were convinced that cluster thinking is actually generally a better way of coming to final decisions, I expect he’d go “that’s weird, why is CFAR getting such good results from teaching double crux anyway?” not “obviously I was wrong about how good double crux is.” Identifying a single important point of disagreement isn’t a claim that it’s the only important point of disagreement.
I like this point a lot, and your model of me is accurate, at least insofar as I’m capable of simming this without actually experiencing it. For instance, I have similar thoughts about some of my cutting/oversimplifying black-or-white heuristics, which seem less good than the shades-of-gray epistemics of people around me, and yet often produce more solid results. I don’t conclude from this that those heuristics are better, but rather that I should be confused about my model of what’s going on.
That makes a ton of sense for theoretically justified reasons I don’t know how to explain yet. Anyone want to collab with me on a sequence? I’m a bit blocked on 1. exactly what my goal is and 2. what I should be practicing in order to be able to write a sequence (given that I’m averse to writing post-style content right now).
Naturally, and I wasn’t claiming it was. That being said, I think that when you single out a specific point of disagreement (without mentioning any others), there is an implication that the mentioned point is, if not the only point of disagreement, then at the very least the most salient point of disagreement. Moreover, I’d argue that if Duncan’s only recourse after being swayed regarding sequence versus cluster thinking is “huh, then I’m not sure why we’re getting such good results”, then there is a sense in which sequence versus cluster thinking is the only point of disagreement, i.e. once that point is settled, Duncan has no more arguments.
(Of course, I’m speaking purely in the hypothetical here; I’m not trying to make any claims about Duncan’s actual epistemic state. This should be fairly obvious given the context of our discussion, but I just thought I’d throw that disclaimer in there.)
First off, a symmetric apology for any inflammatory or triggering nature in my own response, and an unqualified acceptance of your own, and reiterated thanks for writing the post in the first place, and thanks for engaging further. I did not at any point feel personally attacked or slighted; to the degree that I was and am defensive, it was over a fear that real value would be thrown out or socially disfavored for insufficient reason.
(I note the symmetrical concern on your part: that real input value will be thrown out or lost by being poured into a socially-favored-for-insufficient-reason framework, when other frameworks would do better. You are clearly motivated by the Good.)
You’re absolutely right that the relative lack of double cruxes ought to be on my list of cruxes. It is one in fact; I simply didn’t think to write it down. I highly value double crux as a technique if double cruxes are actually findable in 40-70% of disagreements; I significantly-but-not-highly value double crux if double cruxes are actually findable in 25-40% of disagreements; I lean toward ceasing to investigate double crux if they’re only findable in 10-25%, and I am confused if they’re rarer than 10%.
By contrast, I propose the relevant web tends to be much denser (or, alternatively, the ‘power’ of the population of reasons that may alter one’s credence in B is fairly evenly distributed). Credence in B arises from a large number of considerations that weigh upon it, each of middling magnitude. Thus even if I am persuaded one consideration is mistaken, my credence in B does not change dramatically. It follows that ‘cruxes’ are rare, and it is rarer still for two people to discover that their belief on some recondite topic B is principally determined by some other issue C, and that it is the same C for both of them.
I agree that this is a relevant place to investigate, and at the risk of proving you right at the start, I add it to my list of things which would cause me to shift my belief somewhat.
The claim that I derive from “there’s surprisingly often one crux” is something like the following: that, for most people, most of the time, there is not in fact a careful, conscious, reasoned weighing and synthesis of a variety of pieces of evidence. That, fompmott, the switch from “I don’t believe this” to “I now believe this” is sudden rather than gradual, and, post-switch, involves a lot of recasting of prior evidence and conclusions, and a lot of further confirmation-biased integration of new evidence. That, fompmott, there are a lot of accumulated post-hoc justifications whose functional irrelevance may not even be consciously acknowledged, or even safe to acknowledge, but whose accumulation is strongly incentivized given a culture wherein a list of twenty reasons is accorded more than 20x the weight of a list of one reason, even if nineteen of those twenty reasons are demonstrated to be fake (e.g. someone accused of sexual assault, acquitted due to their ironclad alibi that they were elsewhere, and yet the accusation still lingers because of all the sticky circumstantial bits that are utterly irrelevant).
In short, the idealized claim of double crux is that people’s belief webs look like this:
And on reflection and in my experience, the missing case that tilts toward “double crux is surprisingly useful” is that a lot of belief webs look like this:
… where they are not, in fact, simplistic and absolutely straightforward, but there often is a crux which far outweighs all of the other accumulated evidence.
I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first image, and when they have a belief web like the third image they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is. Ergo, LessWrongers would find the “Surprise! You had an unjustified belief!” thing happening less often and less unexpectedly.
If I’m reading you right, this takes care of your first bullet point above entirely and brings us closer to a mutual understanding on your second bullet point. Your third bullet point remains entirely unaddressed in double crux except by the fact that we often have common cultural pressures causing us to have aligned-or-opposite opinions on many matters, and thus in practice there’s often overlap. Your fourth bullet point seems both true and a meaningful hole or flaw in double crux in its idealized, Platonic form, but also is an objection that in practice is rather gracefully integrated by advice to “keep ideals in mind, but do what seems sane and useful in the moment.”
To the extent that those sections of your arguments which miss were based on my bad explanation, that’s entirely on me, and I apologize for the confusion and the correspondingly wasted time (on stuff that proved to be non-crucial!). I should further clarify that the double crux writeup was conceived in the first place as “well, we have a thing that works pretty well when transmitted in person, but people keep wanting it not transmitted in person, partly because workshops are hard to get to even though we give the average EA or rationalist who can’t afford it pretty significant discounts, so let’s publish something even though it’s Not Likely To Be Good, and let’s do our best to signal within the document that it’s incomplete and that they should be counting it as ‘better than nothing’ rather than judging it as ‘this is the technique, and if I’m smart and good and can’t do it from reading, then that’s strong evidence that the technique doesn’t work for me.’” I obviously did not do enough of that signaling, since we’re here.
Re: the claim “Double crux is not essential for these incorporated practices.” I agree wholeheartedly on the surface—certainly people were doing good debate and collaborative truthseeking for millennia before the double crux technique was dreamed up.
I would be interested in seeing a side-by-side test of double crux versus direct instruction in a set of epistemic debate principles, or double crux versus some other technique that purports to install the same virtues. We’ve done some informal testing of this within CFAR—in one workshop, Eli Tyre and Lauren Lee taught half the group double crux as it had always previously been taught, while I discussed with the other half all of the ways that truthseeking conversations go awry, and all of the general desiderata for a positive, forward-moving experience. As it turned out, the formal double crux group did noticeably better when later trying to actually resolve intellectual disagreement, but the strongest takeaway we got from it was that the latter group didn’t have an imperative to operationalize their disagreement into concrete observations or specific predictions, which seems like a non-central confound to the original question.
As for “I guess I have higher hopes for transparency and communicability of ‘good techniques’,” all I can do is fall back yet again on the fact that, every time skepticism of double crux has reared its head, multiple CFAR instructors and mentors and comparably skilled alumni have expressed willingness to engage with skeptics, and produce publicly accessible records and so forth. Perhaps, since CFAR’s the one claiming it’s a solid technique, 100% of the burden of creating such referenceable content falls on us, but one would hope that the relationship between enthusiasts and doubters is not completely antagonistic, and that we could find some Robin Hansons to our Yudkowskys, who are willing to step up and put their skepticism on the line as we are with our confidence.
As of yet, not a single person has sent me a request of the form “Okay, Duncan, I want to double crux with you about X such that we can write it down or video it for others to reference,” nor has anyone sent me a request of the form “Okay, Duncan, I suspect I can either prove double crux unworth it or prove [replacement Y] a more promising target. Let’s do this in public?”
I really really do want all of us to have the best tool. My enthusiasm for double crux has nothing to do with an implication that it’s perfect, and everything to do with a lack of visibly better options. If that’s just because I haven’t noticed something obvious, I’d genuinely appreciate having the obvious pointed out, in this case.
Thank you for your gracious reply. I read a couple of overarching themes in your reply, in which I would like to frame my own: the first is the ‘performance issue’ (i.e. ‘How good is double crux at resolving disagreement/getting closer to the truth?’); the second is the ‘pedagogical issue’ (i.e. ‘How good is double crux at the second-order task of getting people better at resolving disagreement/getting closer to the truth?’). I now better understand that you take the main support for double crux to draw upon the latter issue, but I’d also like to press on some topics about the former on which I believe we disagree.
How well does double crux perform?
Your first two diagrams precisely capture the distinction I have in mind (I regret not having thought to draw my own earlier). If I read the surrounding text right (I’m afraid I don’t know what ‘fompmott’ means, and Google didn’t help me), you suggest that even if better cognisers find their considerations form a denser web like the second diagram, double-crux-amenable ‘sparser’ webs are still common in practice, perhaps due to various non-rational considerations. You also add:
I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first [I think second? - T] image, and when they have a belief web like the third image they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is. Ergo, LessWrongers would find the “Surprise! You had an unjustified belief!” thing happening less often and less unexpectedly.
This note mirrors a further thought I had (c.f. Ozymandias’s helpful remark in a child comment about sequence versus cluster thinking). Yet I fear this poses a further worry for the ‘performance issue’ of double crux, as it implies that the existence of cruxes (or double cruxes) may be indicative of pathological epistemic practices. A crux implies something like the following:
You hold some belief B you find important (at least, important enough you think it is worth your time to discuss).
Your credence in B depends closely on some consideration C.
Your credence in C is non-resilient (at least sufficiently non-resilient you would not be surprised to change your mind on it after some not-unduly-long discussion with a reasonable interlocutor).*
* What about cases where one has a resilient credence in C? Then the subsequent worries do not apply. However, I suspect these cases often correspond to “we tried to double crux and we found we couldn’t make progress on resolving our disagreement about theories of truth/normative ethics/some other foundational issue”.
It roughly follows from this that you should have low resilience in your credence in B. As you note, this is vulnerable, and knowing one has non-resilient credences in important Bs is to be avoided.
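As a small worked example of this propagation (invented numbers, purely illustrative): if credence in B hinges closely on a crux C, then whatever swing a short discussion can produce in C is passed on almost directly to B.

```python
# Invented numbers, purely illustrative: non-resilient credence in a crux C
# propagates into non-resilient credence in B.
def p_b(p_c, p_b_given_c=0.9, p_b_given_not_c=0.2):
    return p_c * p_b_given_c + (1 - p_c) * p_b_given_not_c

print(round(p_b(0.7), 2))   # 0.69 before discussion
print(round(p_b(0.3), 2))   # 0.41 after a discussion shifts credence in C
# Credence in B swings by ~0.3 precisely because C was a crux for it.
```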
As a tool of diagnosis, double crux might be handy (i.e. “This seems to be a crux for me, yet cruxes aren’t common among elite cognisers—I should probably go check whether they agree this is the crux of this particular matter, and if not maybe see what else they think bears upon B besides C”). Yet (at least per the original exposition) it seems to be more a tool for subsequent ‘treatment’. Doing so could make things worse, not better.
If X and Y find they differ on some crux, but also understand that superior cognisers tend not to have this crux and instead distribute support across a variety of considerations, it seems a better idea for them to explore other candidate considerations rather than trying to resolve their disagreement re. C. If they instead do the double-cruxy thing and try to converge on C, they may be led up the epistemic garden path. They may agree with one another on C (thus B), and thus increase the resilience of their credence in C (thus B), yet they also confirm a mistaken web of belief around B which wrongly accords too much weight to C. If (as I suggest) at least half the battle in having good ‘all things considered’ attitudes to recondite matters comprises getting the right weights for the relevant considerations, double crux may celebrate them converging further away from the truth. (I take this idea to be expressed in kernel in Ozymandias’s worry about double crux displacing more-expectedly-accurate cluster thinking with less-expectedly-accurate sequence thinking.)
How good is double crux at ‘levelling people up at rationality’?
The substantial independence of the ‘performance issue’ from the ‘pedagogical issue’
In the same way that practising scales may not be the best music but makes one better at playing music, double crux may not be the best discussion technique but may make one better at discussions. This seems fairly independent of its ‘object level performance’ (although I guess if the worry above is on the right track, we would be very surprised if a technique that on the object level leads beliefs to track truth more poorly nonetheless has a salutary second-order effect).
Thus comparisons to the practices of elite philosophers (even if they differ) are inapposite—especially since, as I understand from one of them, the sort of superior pattern I observe occurs only at the far right tail even among philosophers (i.e. ‘world-class’ as you write, rather than ‘good’, as I write in the OP). It would obviously be a great boon if one could become some fraction more like someone like Askell or Shulman without either their profound ability or the time they have invested in these practices.
On demurring the ‘double crux challenge’
I regret I don’t think it would be hugely valuable to ‘try double crux’ with an instructor in terms of resolving this disagreement. One consideration (on which more later) is that conditional on me not being persuaded by a large group of people who self-report double crux is great, I shouldn’t change my mind (for symmetry reasons) if this number increases by one other person, or if it increases by including me. Another is that the expected yield may not be great, at least in one direction: although I hope I am not ‘hostile’ to double crux, it seems one wouldn’t be surprised if it didn’t work with me, even if it’s generally laudable.
Yet I hope I am not quite as recalcitrant as ‘I would not believe until I felt the stigmata with my own hands’. Apart from a more publicly legible case (infra), I’m a bit surprised at the lack of ‘public successes’ of double cruxing (although this may confuse performance versus pedagogy). In addition to Constantin, Raemon points to their own example with gjm. Maybe I’m only seeing what I want to, but I get a similar impression. They exhibit a variety of laudable epistemic practices, but I don’t see a crux or double crux (what they call ‘cruxes’ seem to be more considerations they take to be important).
The methods of rational self-evaluation
You note a head-to-head comparison between double crux and an approximate sham-control seemed to favour double crux. This looks like interesting data, and it seems a pity it emerges in the depths of a comment thread (ditto the ‘large n of successes’) rather than being written up and presented—it seems unfortunate that the last ‘public evaluation report’ is about 2 years old. I would generally urge trying to produce more ‘public evidence’ rather than the more private “we’ve generally seen this work great (and a large fraction of our alums agree!)”
I recognise that “Provide more evidence to satisfy outside sceptics” should not be high on CFAR’s priority list. Yet I think it is instrumental to other important goals instead. Chiefly: “Does what we are doing actually work?”
You noted in your initial reply undercutting considerations to the ‘we have a large n of successes’ claim, yet you framed these in a way that suggests they would often need to amount to a claim of epistemic malice (i.e. ‘either CFAR is lying or participants are being socially pressured into reporting a falsehood’). I don’t work at a rationality institute or specialise in rationality, but on reflection I find this somewhat astonishing. My impression of cognitive biases was that they were much more insidious, that falling prey to them was the rule rather than the exception, and that sincere good faith was not adequate protection (is this not, in some sense, what CFAR’s casus belli is predicated upon?)
Although covered en passant, let me explicitly (although non-exhaustively) list things which might bias more private evidence of the type CFAR often cites:
CFAR staff (collectively) are often responsible for developing the interventions they hope will improve rationality. One may expect them to be invested in them, and more eager to see that they work than see they don’t (c.f. why we prefer double-blinding over single-blinding).
Other goods CFAR enjoys (i.e. revenue/funding, social capital) seem to go up the better the results of their training. Thus CFAR staff have a variety of incentives pushing them to over-report how good their ‘product’ is (c.f. why conflicts of interest are bad, the general worries about pharma-funded drug trials).
Many CFAR participants have to spend quite a lot of money (i.e. fees and travel) to attend a workshop. They may fear looking silly if it turns out after all this it didn’t do anything, and so are incentivised to assert it was much more helpful than it actually was (c.f. choice supportive bias).
There are other aspects of CFAR workshops that participants may enjoy independent of the hoped-for improvement of their rationality (e.g. hanging around interesting people like them, personable and entertaining instructors, romantic entanglements). These extraneous benefits may nonetheless bias upwards their estimate of how effective CFAR workshops are at improving their rationality (c.f. halo effect).
I am sure there are quite a few more. One need not look that hard to find lots of promising studies supporting a given intervention undermined by any one of these.
The reference class of interventions with “a large corpus of (mainly self-reported) evidence of benefit, but susceptible to these limitations” is dismal. It includes many branches of complementary medicine. It includes social programs (e.g. ‘scared straight’) that we now know to be extremely harmful. It includes a large number of ineffective global poverty interventions. Beyond cautionary tales, I aver these approximate the modal member of the class: when the data is so subjective, and the limitations this severe, one should expect the thing in question doesn’t actually work after all.
I don’t think this expectation changes when we condition on the further rider “And the practitioners really only care about the truth re. whether the intervention works or not.” What I worry about going on under the hood is a stronger (and by my lights poorly substantiated) claim of rationalist exceptionalism: “Sure, although cognitive biases plague entire fields of science and can upend decades of results, and we’re appropriately quick to point out risk of bias in work done by outsiders, we can be confident that as we call ourselves rationalists/we teach rationality/we read the sequences/etc. we are akin to Penelope refusing her army of suitors—essentially incorruptible. So when we do similarly bias-susceptible sorts of things, we should give one another a pass.”
I accept ‘gold standard RCTs’ are infeasible (very pricey, and how well can one really do ‘sham CFAR’?), yet I aver there is quite a large gap between this ideal of evidence and the actuality (i.e. evidence kept in house, which emerges via reference in response to criticism) which could be bridged by doing more write-ups, looking for harder metrics that put one more reliably in touch with reality, and so on. I find it surprisingly incongruent that the sort of common cautions about cognitive biases that seem to be predicates for CFAR’s value proposition (e.g. “Good faith is not enough”, “Knowing about the existence of biases does not make one immune to them”, Feynman’s dictum that ‘you are the easiest person to fool’) are not reflected in its approach to self-evaluation.
If nothing else, opening up more of CFAR’s rationale, evidence, etc. to outside review may allow more benefits of outside critique. Insofar as it is the case you found this exchange valuable, one may anticipate greater benefit from further interaction with higher-quality sceptics.
I think I feel similar to lahwran. You made a lot of good points, but the comment feels aggressive in a way that would make me feel surprised if the discussion following this comment would be good. Not downvoting or upvoting either way because of this.
Sensible; have been going through and making edits to reduce aggressiveness (e.g. removing italics, correcting typical-mind fallacies, etc.) I like having these comments here as a record of what was there before edits occurred.
I would argue that Thrasymachus’ initial post also carried an undertone of aggression (that Duncan may have picked up on, either consciously or subconsciously), but that this was possibly obscured and/or distracted from by its very formal tone.
(Whether you prefer veiled or explicit aggression is a “pick your poison” kind of choice.)
I upvoted you, then changed my mind about doing so because of intense emotional content. From both sides, this feels like a fight. I have also retracted my vote on the main post.
I agree that you have good points, but I don’t feel able to engage with them without it feeling like fighting/like tribal something or other.
Thanks for both your policy and your honesty. The bind I feel like I’m in is that, in this case, the way I’d back away from a fight and move myself toward productive collaboration is to offer to double crux, and it seems like in this case that would be inappropriate/might be received as itself a sort of sneaky status move or an attempt to “win.”
If Thrasymachus or anyone else has specific thoughts on how best to engage, I commit to conforming to those thoughts, as a worthwhile experiment. I am interested in the actual truth of the matter, and most of my defensiveness centers around not wanting to throw away the accumulated value we have so far (as opposed to something something status something something ownership).
I think, based on my reading of Thrasymachus’s post, that they think there’s a reasonable generalization of double crux that has succeeded in the real world; that it’s too hard to get to that generalization from double crux; but that there is a reasonable way for disagreeing people to engage.
I am censoring further things I want to say, to avoid pushing on the resonance of tribalism-fighting.
I am censoring further things I want to say, to avoid pushing on the resonance of tribalism-fighting.
Out of curiosity, do you think that inserting an explicit disclaimer like this helps to reduce feelings of tribal offense? If so, having now written such a disclaimer, do you think it would be worth it to share more of your thoughts on the matter?
(I’ll be honest; my main motivator for asking this is because I’m curious and want to read the stuff you didn’t say. But even taking that into consideration, it seems to me that the questions I asked have merit.)
I like most of this; it seems like the sort of post that’s going to lead to significant improvements in people’s overall ability to do collaborative truth-seeking, because it makes concrete and specific recommendations that overall seem useful and sane.
However,
and similar make me wish that posts like this would start by getting curious about CFAR’s “n” of several hundred, rather than implicitly treating it as irrelevant. We’ve been teaching double crux at workshops for a couple of years now, and haven’t stopped the way we’ve stopped with other classes and concepts that weren’t pulling their weight.
My sense is that the combined number of all of the double-crux-doubters and double-crux-strugglers still does not approach, in magnitude, the number of people who have found double crux moderately-to-very useful and workable and helpful (and, in specific, the number of people who have been surprised to discover that double cruxes do in fact exist and are findable an order of magnitude more often than one would have naively guessed).
It does not distress me to consider that double crux might be imperfect, or even sufficiently broken that it should be thrown out.
It does distress me when claims about its imperfectness or brokenness fail to address the existing large corpus of data about its usefulness. Posts like these seem to me to be biased toward the perspective of people who have tried to pick it up online and piece it together, and to discount the several hundred people who’ve been taught it at workshops (well over a thousand if you count crash courses at e.g. EA conferences). At the very least, claim that CFAR is lying or that its participants are socially pressured into reporting a falsehood—don’t just ignore the data outright.
(Edit: I consider these to be actually reasonable things to posit; I would want them to be posited politely and falsifiably if possible, but I think it’s perfectly okay for people to have, as hypotheses, that CFAR’s deceiving itself or others, or that people’s self-report data is unduly swayed by conformity pressures. The hypotheses themselves seem inoffensive to me, as long as they’re investigated soberly instead of tribally.)
=( Sorry for the strength of language, here (I acknowledge and own that I am a bit triggered), but it feels really important to me that we have better epistemic hygiene surrounding this question. The contra double crux argument almost always comes most strongly from people who haven’t actually talked to anyone skilled in double crux or haven’t actually practiced it in an environment where facilitators and instructors could help them iterate in the moment toward an actually useful set of mental motions.
(And no, those aren’t the only people arguing contra double crux. Plenty of actual CFAR grads are represented by posts like the above; plenty of actual CFAR grads struggled and asked for help and still walked away confused, without a useful tool. But as far as I can tell, they are in a plurality, not a majority.)
It seems critical to me that we be the type of community that can distinguish between “none of CFAR’s published material is sufficient to teach this technique on its own” and “this technique isn’t sufficiently good to keep iterating on.” The former is almost trivially true at this point, but that’s weak evidence at best for the latter, especially if you take seriously (repeated for emphasis) the fact that CFAR has a higher n on this question than literally anybody else, as far as I know.
Scattered other thoughts:
As far as I can tell, none of the most serious doubters in the previous thread actually stepped up to try out double crux with any of the people who claimed to be skilled and capable in it. I know there was an additional “post disagreements and we’ll practice” thread, but that seemed to me to contain low density of doubters giving it a serious try with proficient partners.
Also, the numbered description in the “what mostly happens” is, according to me, something like an 85% overlap for double crux. It is a set of motions that the double crux class and tips point people toward and endorse. So that leaves me in confuséd disagreement with the claim that it is “perhaps possible” for double crux to be twisted around to include them—from my point of view, it already does. Which again points to a problem with the explanation and dissemination of the technique, but not with the core technique itself, as transmitted to real humans over the course of a couple of hours of instruction with facilitators on hand*. This causes me to wonder whether the real crux is the jargon problem that you alluded to near the end, with people being upset because they keep crystallizing specific and inaccurate impressions of what double crux is, doing things that seem better, and getting angry at double crux for not being those better things (when in at least many of the cases, it is, and the problem is one of semantics).
* Speaking of “a couple of hours of instruction with facilitators on hand,” I’m also a little sad about what I read as an implicit claim that “well, double crux doesn’t work as well as being the sort of philosopher or debater who’s spent thousands upon thousands of hours practicing and looking at the nature of truth and argument and has been educated within a solid culture with a long tradition.” It seems to me that double crux can take a 1-4 hour investment of time and attention and jumpstart people to being something like a quarter as good as those lifelong philosophers.
And I think that’s amazing, and super promising, and that it’s disingenuous to point at literal world-class professionals and say “what they do is better.” It strikes me as true, but irrelevant? It’s like if I offered an afternoon martial arts seminar that had 50% of its participants walking away with blue-belt level skill (and the other 50% admittedly lost or confused or unimpressed) and someone criticized it for not being as useful or correct as what black belts with ten years of experience do. It’s still an afternoon seminar to get excited about, and for outsiders looking in to wonder “what’s the secret sauce?”
(I realize now that I’m making a claim that may not have been published before, about double crux being intended to be a scrappy small-scale bootstrapping technique, and not necessarily the final step in one’s productive-disagreement evolution. That’s a novel claim, and one that other CFAR staff might disagree with me on, so I retract the part of my grrrr that was based on you not-taking-into-account-this-thing-you-couldn’t-possibly-have-known-about).
Here’s my sense of the bar that putative double crux replacement needs to meet, because I claim double crux is already reliably meeting it:
Be explainable within 30-60 minutes in a deliberately scaffolded educational context
Be meaningfully practicable within another 30-60 minutes
Be sticky enough that people in fact desire to reference it and spread it and use it in the future
Provide a significant boost in productivity-in-debate and/or a corresponding significant reduction in wasted time/antagonism à la giving people 5-25% of the skill that you see in those philosophers
Do all of the above for greater than 33% of people who try it
Here’s my sense of what is unfairly and inaccurately being expected of double crux:
Be grokkable by people based entirely on text and hearsay/be immune to problems that arise from [games of telephone] or [imperfect capture and reconstruction through writing].
Be “attractive” in the sense that people fumbling around on their own will necessarily find themselves moving closer toward the right motions rather than sideways or away, without help or guidance.
Be “complete” in the sense that it contains all of what we know about how to improve disagreement and/or debate.
Provide an efficacy boost of greater than 50% for greater than 75% of the people who try it.
Here are my “cruxes” on this question:
I would drastically reduce my support for double crux as a technique if it turned out that what was needed was something that could be asynchronously transmitted (e.g. because in-person instruction insufficiently scales).
I would drastically reduce my support for double crux if it turned out that 50+% of the people who reported valuable experiences in my presence later on discovered that the knowledge didn’t stick or that the feeling had been ephemeral (e.g. possibly social-conformity based rather than real).
I would drastically reduce my support for double crux if I attempted to convey my True Scotsman version to four high-quality skeptics/doubters (such as Thrasymachus) and all four afterward told me that all of the nuance I thought I was adding was non-useful, or had already been a part of their model and was non-novel.
I would drastically reduce my support for double crux if a concrete alternative that was chunkable, grokkable, teachable, and had thing-nature (as opposed to being a set of principles that are hard to operationalize) was proposed and was promising after its first iterations with 30+ people.
tl;dr I claim that most of the good things you point at are, in fact, reasonably captured by double crux and have been all along; I claim that most of the bad things you point at are either due to insufficient pedagogy (our fault) or are likely to be universal and bad for any attempt to create a technique in this space. I recognize that I’m in danger of sliding into No True Scotsman territory and I nevertheless stick to my claim based on having done this with hundreds of people, which is something most others have not had the opportunity to do.
As a datapoint, this is the first time that I remember hearing that there would exist “a very large corpus of data about its usefulness”. The impression I got from the original DC post was that this was popular among CFAR’s instructors, but that you’d been having difficulties effectively teaching this to others.
I think that if such a corpus of evidence exists, then the main reason people are ignoring it is that the existence of this corpus hasn’t been adequately communicated, which makes the implicit accusation of “your argument isn’t taking into account all the data that it should” unfair.
That’s sensible. I would have thought it was implied by “CFAR’s taught this at every workshop and event that it’s run for the past two years,” but I now realize that’s very typical-mind-fallacy of me.
Where’s that fact from? It wasn’t in the original DC post, which only said that the technique “is one of CFAR’s newer concepts”.
That’s what I mean by typical mind fallacy. I live in the universe where it’s obvious that double crux is being taught and tinkered with constantly, because I work at CFAR, and so I just stupidly forgot that others don’t have access to that same knowledge. i.e. I was elaborating on my agreement with you, above.
Also, by “newer concept” we just mean relative to CFAR’s existence since 2012. It’s younger than e.g. inner sim, or TAPs, but it’s been a part of the curriculum since before my hiring in October 2015.
Also also in consolidating comments I have discovered that I lack the ability to delete empty comments.
Ah, gotcha.
I upvoted this entire chain of comments for the clear and prosocial communication displayed throughout.
I’ve just been going around and deleting the empty comments. Right now we don’t allow users to delete comments, since that would also delete all children comments by other authors (or at least make them inaccessible). Probably makes sense for people to be able to delete their own comments if they don’t have any children.
Hello Duncan,
My thanks for your reply. I apologise if my wording in the OP was inflammatory or unnecessarily ‘triggering’ (another commenter noted an ‘undertone of aggression’, which I am sorry for, although I promise it wasn’t intended—you are quoted repeatedly because you wrote the canonical exposition of what I target in the OP, not out of some misguided desire to pick a fight with you on the internet). I hope I capture the relevant issues below, but apologies in advance if I neglect or mistake any along the way.
CFAR’s several hundred and the challenge of insider evidence
I was not aware of the several hundred successes CFAR reports of double crux being used ‘in the wild’. I’m not entirely sure whether the successes are a) those who find double crux helpful or b) particular instances of double crux resolving disagreement, but I think you would endorse plenty of examples of both. My pretty sceptical take on double crux had ‘priced in’ the expectation that CFAR instructors and at least some/many alums thought it was pretty nifty.
You correctly anticipate the sort of worries I would have about this sort of evidence. Self-reported approbation from self-selected participants is far from robust. Branches of complementary medicine can probably tout thousands to millions of ‘positive results’ and happy customers, yet we know they are in principle intellectually bankrupt, and in practice perform no better than placebo in properly conducted trials. (I regret to add that replies along the lines of “if you had received the proper education in the technique you’d—probably—see it works well”, or “I’m a practitioner with much more experience than any doubter in terms of using this, and it works in my experience”, also have analogies here.)
I don’t think one need presume mendacity on the part of CFAR, nor gullibility on the part of workshop attendees, to nonetheless believe this testimonial evidence isn’t strongly truth-tracking: one may anticipate similarly positive reports in worlds where (perhaps) double crux doesn’t really work but other things CFAR practices do, where participants enjoy mingling with similarly rationally-minded people, may have had to invest four-figure sums to get on the workshop, and so on and so forth. (I recall CFAR’s previous evaluation had stupendous scores on self-reported measures, but more modest performance on objective metrics.)
Of course, unlike complementary medicine, double crux does not have such powerful disconfirmation as ‘violates known physics’ or ‘always fails RCTs’. Around the time double crux was proposed I challenged it on theoretical grounds (i.e. that double cruxes should be very rare); this post was prompted by some of the dissonance on previous threads, but also by the lack of public examples of double crux working. Your experience of the success of double crux in workshops is essentially private evidence (at least for now): in the same way it is hard to persuade me of its validity, it is next to impossible for me to rebut it. I nonetheless hope other lines of inquiry are fruitful.
How sparse is a typical web of belief?
One is the theoretical point. I read in your reply disagreement with my point that ‘double cruxes should be rare’ (“… the number of people who have been surprised to discover that double cruxes do in fact exist and are findable an order of magnitude more often than one would have naively guessed”). Although you don’t include it as a crux in your reply, it looks pretty crucial to me. If cruxes are as rare as I claim, double cruxing shouldn’t work, and so the participant reports are more likely to have been innocently mistaken.
In essence, I take the issue to be what the web of belief surrounding a typical subject of disagreement looks like. It seems double crux is predicated on this being pretty sparse (at least in terms of important considerations): although lots of beliefs might have some trivial impact on your credence in B, B is mainly set by small-n cruxes (C), which are generally sufficient to change one’s mind if one’s attitude towards them changes.
By contrast, I propose the relevant web tends to be much denser (or, alternatively, that the ‘power’ of the population of reasons that may alter one’s credence in B is fairly evenly distributed). Credence in B arises from a large number of considerations that weigh upon it, each of middling magnitude. Thus even if I am persuaded one is mistaken, my credence in B does not change dramatically. It follows that ‘cruxes’ are rare, and it is rarer still for two people to discover that their belief on some recondite topic B is principally determined by some other issue (C), and that C is the same for both of them.
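To make the contrast concrete, here is a toy numeric sketch; the weights are invented log-odds contributions, and only the comparison between the two webs matters:

```python
# Toy sketch: how much credence in B moves when one consideration is refuted,
# under a sparse (crux-dominated) web versus a dense (evenly weighted) web.
# The weights below are invented log-odds contributions; only the contrast matters.

import math

def credence(weights):
    """Turn summed log-odds evidence for B into a probability."""
    return 1 / (1 + math.exp(-sum(weights)))

sparse_web = [3.0, 0.2, 0.1]   # one crux C dominates
dense_web = [0.5] * 7          # many considerations of middling magnitude

for name, web in [("sparse", sparse_web), ("dense", dense_web)]:
    before = credence(web)
    after = credence(web[1:])  # drop the first (strongest) consideration
    print(f"{name:6s}: P(B) {before:.2f} -> {after:.2f} after refuting one consideration")

# sparse: P(B) 0.96 -> 0.57  (losing the crux nearly unseats the belief)
# dense : P(B) 0.97 -> 0.95  (losing any single consideration barely matters)
```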
This is hard to make very crisp, as (among other things) the ‘space of all topics on which reasonable people disagree’ is hard to pin down. Beyond appeals to my own experience and introspection (“do you really find your belief in, let’s say, some political view like gay marriage or abortion depends on a single consideration to such a degree that, if it were refuted, you would change your view?”), I’d want to marshal a couple of other considerations.
When one looks at a topic in philosophy, or science, or many other fields of enquiry, one usually sees a very one-to-many relationship of the topic to germane considerations. A large number of independent lines of evidence support the theory of evolution; a large number of arguments regarding God’s existence in philosophy receive scrutiny (and in turn they spawn a one-to-many relationship of argument to objections, and objection to counter-objections). I suggest this offers analogical evidence in support of my thesis.
Constantin’s report of double cruxing (which has been used a couple of times as an exemplar in other threads) seems to follow the pattern I expect. I struggle to identify a double-crux in the discussion Constantin summarizes: most of the discussion seems to involve whether Salvatier’s intellectual project is making much progress, with then a host of subsidiary considerations (e.g. how much to weigh ‘formal accomplishments’, the relative value of more speculative efforts on far future considerations, etc.), but it is unclear to me that, if Constantin were persuaded Salvatier’s project was making good progress, this would change her mind about the value of the rationalist intellectual community (after all, one good project may not be adequate ‘output’), or vice versa (even if Salvatier recognised his own project was not making good progress, the rationality community might still be fertile ground to cultivate his next attempt, etc.).
What comprises double-crux?
I took the numbered list of my counter-proposal to have 25% overlap with double crux (i.e. realising your credences vary considerably), not 85%. Allow me to be explicit about how I see items 2-4 in my list standing in contradistinction to the ‘double crux algorithm’:
There’s no assumption of an underlying single ‘crux of the matter’ between participants, or for either individually.
There’s no necessity for a given consideration (even the strongest identified) to be individually sufficient to change one’s mind about B.
There’s also no necessity for the strongest considerations proposed by X and Y to have common elements.
There’s explicit consideration of credence resilience. Foundational issues may be ‘double cruxes’ in that (e.g.) my views on most applied ethics questions would change dramatically if I were persuaded of the virtue ethics my interlocutor holds, but one often makes more progress discussing a less resilient non-foundational claim even if the ‘payoff’ in terms of the subsequent credence change in the belief of interest is lower.
This may partly be explained by a broader versus narrower conception of double crux. I take the core idea of double crux to be ‘find some C on which your disagreement over B relies, then discuss C’ (this did, in my defense, comprise the whole of the ‘how to play’ section in the initial write-up). I take you to hold a broader view, where double crux incorporates other related epistemic practices, and it has value in toto.
My objection is expressly this: double crux is not essential for these incorporated practices. So one can compare discussion with the set of these other practices against discussion with that set plus double crux. I aver the set sans double crux will lead to better discussions.
Pedagogy versus performance
I took double crux to be mainly proposed as a leading strategy to resolve disagreement. Hence the comparison to elite philosophers was to suggest it wasn’t a leading strategy by pointing to something better. I see from this comment (and the one you split off into its own thread) that you see it more in a pedagogical role—even if elite performers do something different, it does valuable work in improving skills. Although I included a paragraph about its possible pedagogical value (admittedly one you may have missed as I started it with a self-indulgent swipe at the rationalist community), I would have focused more on this area had I realised it was CFAR’s main contention.
I regret I cannot surprise you: I have doubts about the pedagogical value as well. These mostly arise from the above concerns: if double cruxes are as rare as I propose, it is unclear how searching for them is that helpful an exercise. A related worry (related to the top of the program) is that this seems to entail increasing reliance on private evidence regarding whether the technique works: in principle, objections to the ‘face value’ of the technique apply less (as it is there to improve skills rather than being a proposal for what the ‘finished article’ should look like); adverse reports from non-CFAR alums don’t really matter (you didn’t teach them, so it is no surprise they don’t get it right). What one is left with is the collective impressions of instructors, and the reports of the students.
I guess I have higher hopes for transparency and communicability of ‘good techniques’. I understand CFAR is currently working on further efforts to evaluate itself. I hope to be refuted by the forthcoming data.
I want to bring up sequence thinking and cluster thinking, which I think are useful in understanding the disagreement here. As I understand it, Duncan argues that sequence thinking is more common than cluster thinking, and you’re arguing the converse.
I think most beliefs can be put in either a cluster-thinking or a sequence-thinking framework. However, I think that (while both are important and useful) cluster thinking is generally more useful for coming up with final conclusions. For that reason, I’m suspicious of double crux, because I’m worried that it will cause people to frame their beliefs in a sequence-thinking way and feel like they should change their beliefs if some important part of their sequence was proven wrong, even though (I think) using cluster thinking will generally get you more accurate answers.
This looks remarkably like an attempt to identify a crux in the discussion. Assuming that you’re correct about double-cruxing being problematic due to encouraging sequence-like thinking: isn’t the quoted sentence precisely the kind of simplification that propagates such thinking? Conversely, if it’s not a simplification, doesn’t that provide (weak) evidence in favor of double-cruxing being a useful tool in addressing disagreements?
I think that sequence thinking is important and valuable (and probably undersupplied in the world in general, even while cluster thinking is undersupplied in the rationalist community in specific). However, I think both Thrasymachus and Duncan are doing cluster thinking here—like, if Duncan were convinced that cluster thinking is actually generally a better way of coming to final decisions, I expect he’d go “that’s weird, why is CFAR getting such good results from teaching double crux anyway?” not “obviously I was wrong about how good double crux is.” Identifying a single important point of disagreement isn’t a claim that it’s the only important point of disagreement.
I like this point a lot, and your model of me is accurate, at least insofar as I’m capable of simming this without actually experiencing it. For instance, I have similar thoughts about some of my cutting/oversimplifying black-or-white heuristics, which seem less good than the shades-of-gray epistemics of people around me, and yet often produce more solid results. I don’t conclude from this that those heuristics are better, but rather that I should be confused about my model of what’s going on.
that makes a ton of sense for theoretically justified reasons I don’t know how to explain yet. anyone want to collab with me on a sequence? I’m a bit blocked on 1. exactly what my goal is and 2. what I should be practicing in order to be able to write a sequence (given that I’m averse to writing post-style content right now)
Naturally, and I wasn’t claiming it was. That being said, I think that when you single out a specific point of disagreement (without mentioning any others), there is an implication that the mentioned point is, if not the only point of disagreement, then at the very least the most salient point of disagreement. Moreover, I’d argue that if Duncan’s only recourse after being swayed regarding sequence versus cluster thinking is “huh, then I’m not sure why we’re getting such good results”, then there is a sense in which sequence versus cluster thinking is the only point of disagreement, i.e. once that point is settled, Duncan has no more arguments.
(Of course, I’m speaking purely in the hypothetical here; I’m not trying to make any claims about Duncan’s actual epistemic state. This should be fairly obvious given the context of our discussion, but I just thought I’d throw that disclaimer in there.)
Oh, hmm, this is Good Point Also.
First off, a symmetric apology for any inflammatory or triggering nature in my own response, and an unqualified acceptance of your own, and reiterated thanks for writing the post in the first place, and thanks for engaging further. I did not at any point feel personally attacked or slighted; to the degree that I was and am defensive, it was over a fear that real value would be thrown out or socially disfavored for insufficient reason.
(I note the symmetrical concern on your part: that real input value will be thrown out or lost by being poured into a socially-favored-for-insufficient-reason framework, when other frameworks would do better. You are clearly motivated by the Good.)
You’re absolutely right that the relative lack of double cruxes ought to be on my list of cruxes. It is in fact, and I simply didn’t think to write it down. I highly value double crux as a technique if double cruxes are actually findable in 40-70% of disagreements; I significantly-but-not-highly value double crux if double cruxes are actually findable in 25-40% of disagreements; I lean toward ceasing to investigate double crux if they’re only findable in 10-25%, and I am confused if they’re rarer than 10%.
I agree that this is a relevant place to investigate, and at the risk of proving you right at the start, I add it to my list of things which would cause me to shift my belief somewhat.
The claim that I derive from “there’s surprisingly often one crux” is something like the following: that, for most people, most of the time, there is not in fact a careful, conscious, reasoned weighing and synthesis of a variety of pieces of evidence. That, fompmott, the switch from “I don’t believe this” to “I now believe this” is sudden rather than gradual, and, post-switch, involves a lot of recasting of prior evidence and conclusions, and a lot of further confirmation-biased integration of new evidence. That, fompmott, there are a lot of accumulated post-hoc justifications whose functional irrelevance may not even be consciously acknowledged, or even safe to acknowledge, but whose accumulation is strongly incentivized given a culture wherein a list of twenty reasons is accorded more than 20x the weight of a list of one reason, even if nineteen of those twenty reasons are demonstrated to be fake (e.g. someone accused of sexual assault, acquitted due to their ironclad alibi that they were elsewhere, and yet the accusation still lingers because of all the sticky circumstantial bits that are utterly irrelevant).
In short, the idealized claim of double crux is that people’s belief webs look like this:
<insert>
Whereas I read you claiming that people’s belief webs look like this:
<insert>
And on reflection and in my experience, the missing case that tilts toward “double crux is surprisingly useful” is that a lot of belief webs look like this:
<insert>
… where they are not, in fact, simplistic and absolutely straightforward, but there often is a crux which far outweighs all of the other accumulated evidence.
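A toy numeric sketch of this third shape, again with invented weights: one load-bearing crux surrounded by many small post-hoc justifications that collectively look like a dense web but carry little of the load.

```python
# Toy sketch of the third web: one load-bearing crux plus nineteen minor
# post-hoc justifications. The weights are invented log-odds contributions.

import math

def credence(weights):
    """Turn summed log-odds evidence for B into a probability."""
    return 1 / (1 + math.exp(-sum(weights)))

cluttered_web = [3.0] + [0.05] * 19   # one crux, nineteen small accretions

print(f"all considerations:       P(B) = {credence(cluttered_web):.2f}")
print(f"crux refuted:             P(B) = {credence(cluttered_web[1:]):.2f}")
print(f"nineteen minors refuted:  P(B) = {credence(cluttered_web[:1]):.2f}")

# ~0.98, ~0.72, ~0.95: the nineteen minor reasons make the web look dense,
# but the single crux is doing almost all of the work.
```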
I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first image, and when they have a belief web like the third image they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is. Ergo, LessWrongers would find the “Surprise! You had an unjustified belief!” thing happening less often and less unexpectedly.
If I’m reading you right, this takes care of your first bullet point above entirely and brings us closer to a mutual understanding on your second bullet point. Your third bullet point remains entirely unaddressed in double crux except by the fact that we often have common cultural pressures causing us to have aligned-or-opposite opinions on many matters, and thus in practice there’s often overlap. Your fourth bullet point seems both true and a meaningful hole or flaw in double crux in its idealized, Platonic form, but also is an objection that in practice is rather gracefully integrated by advice to “keep ideals in mind, but do what seems sane and useful in the moment.”
To the extent that those sections of your arguments which miss were based on my bad explanation, that’s entirely on me, and I apologize for the confusion and the correspondingly wasted time (on stuff that proved to be non-crucial!). I should further clarify that the double crux writeup was conceived in the first place as “well, we have a thing that works pretty well when transmitted in person, but people keep wanting it not transmitted in person, partly because workshops are hard to get to even though we give the average EA or rationalist who can’t afford it pretty significant discounts, so let’s publish something even though it’s Not Likely To Be Good, and let’s do our best to signal within the document that it’s incomplete and that they should be counting it as ‘better than nothing’ rather than judging it as ‘this is the technique, and if I’m smart and good and can’t do it from reading, then that’s strong evidence that the technique doesn’t work for me.’” I obviously did not do enough of that signaling, since we’re here.
Re: the claim “Double crux is not essential for these incorporated practices.” I agree wholeheartedly on the surface—certainly people were doing good debate and collaborative truthseeking for millennia before the double crux technique was dreamed up.
I would be interested in seeing a side-by-side test of double crux versus direct instruction in a set of epistemic debate principles, or double crux versus some other technique that purports to install the same virtues. We’ve done some informal testing of this within CFAR—in one workshop, Eli Tyre and Lauren Lee taught half the group double crux as it had always previously been taught, while I discussed with the other half all of the ways that truthseeking conversations go awry, and all of the general desiderata for a positive, forward-moving experience. As it turned out, the formal double crux group did noticeably better when later trying to actually resolve intellectual disagreement, but the strongest takeaway we got from it was that the latter group didn’t have an imperative to operationalize their disagreement into concrete observations or specific predictions, which seems like a non-central confound to the original question.
As for “I guess I have higher hopes for transparency and communicability of ‘good techniques’,” all I can do is fall back yet again on the fact that, every time skepticism of double crux has reared its head, multiple CFAR instructors and mentors and comparably skilled alumni have expressed willingness to engage with skeptics, and produce publicly accessible records and so forth. Perhaps, since CFAR’s the one claiming it’s a solid technique, 100% of the burden of creating such referenceable content falls on us, but one would hope that the relationship between enthusiasts and doubters is not completely antagonistic, and that we could find some Robin Hansons to our Yudkowskys, who are willing to step up and put their skepticism on the line as we are with our confidence.
As of yet, not a single person has sent me a request of the form “Okay, Duncan, I want to double crux with you about X such that we can write it down or video it for others to reference,” nor has anyone sent me a request of the form “Okay, Duncan, I suspect I can either prove double crux unworth it or prove [replacement Y] a more promising target. Let’s do this in public?”
I really really do want all of us to have the best tool. My enthusiasm for double crux has nothing to do with an implication that it’s perfect, and everything to do with a lack of visibly better options. If that’s just because I haven’t noticed something obvious, I’d genuinely appreciate having the obvious pointed out, in this case.
Thanks again, Thrasymachus.
Thank you for your gracious reply. I see a couple of overarching themes within which I would like to frame my own: the first is the ‘performance issue’ (i.e. ‘how good is double crux at resolving disagreement/getting closer to the truth?’); the second is the ‘pedagogical issue’ (i.e. ‘how good is double crux at the second-order task of getting people better at resolving disagreement/getting closer to the truth?’). I now better understand that you take the main support for double crux to draw upon the latter issue, but I’d also like to press on some topics about the former on which I believe we disagree.
How well does double crux perform?
Your first two diagrams precisely capture the distinction I have in mind (I regret not having thought to draw my own earlier). If I read the surrounding text right (I’m afraid I don’t know what ‘fompmott’ means, and google didn’t help me), you suggest that even if better cognisers find their considerations form a denser web like the second diagram, double-crux-amenable ‘sparser’ webs are still common in practice, perhaps due to various non-rational considerations. You also add:
This note mirrors a further thought I had (c.f. Ozymandias’s helpful remark in a child comment about sequence versus cluster thinking). Yet I fear this poses a further worry for the ‘performance issue’ of double crux, as it implies that the existence of cruxes (or double cruxes) may be indicative of pathological epistemic practices. A crux implies something like the following:
You hold some belief B you find important (at least, important enough you think it is worth your time to discuss).
Your credence in B depends closely on some consideration C.
Your credence in C is non-resilient (at least sufficiently non-resilient you would not be surprised to change your mind on it after some not-unduly-long discussion with a reasonable interlocutor).*
* What about cases where one has a resilient credence in C? Then the subsequent worries do not apply. However, I suspect these cases often correspond to “we tried to double crux and we found we couldn’t make progress on resolving our disagreement about theories of truth/normative ethics/some other foundational issue”.
It roughly follows from this that you should have low resilience in your credence in B. As you note, this is vulnerable, and knowingly holding non-resilient credences in important Bs is to be avoided.
As a tool of diagnosis, double crux might be handy (i.e. “This seems to be a crux for me, yet cruxes aren’t common among elite cognisers—I should probably go check whether they agree this is the crux of this particular matter, and if not maybe see what else they think bears upon B besides C”). Yet (at least per the original exposition) it seems to be more a tool for subsequent ‘treatment’. Doing so could make things worse, not better.
If X and Y find they differ on some crux, but also understand that superior cognisers tend not to have this crux, and instead distribute support across a variety of considerations, it seems a better idea for them to explore other candidate considerations rather than trying to resolve their disagreement re. C. If they instead do the double-cruxy thing and try to converge on C, they may be led up the epistemic garden path. They may agree with one another on C (thus B), and thus increase the resilience of their credence in C (thus B), yet they also confirm a mistaken web of belief around B which wrongly accords too much weight to C. If (as I suggest) at least half the battle in having good ‘all things considered’ attitudes to recondite matters comprises getting the right weights for the relevant considerations, double crux may celebrate them converging further away from the truth. (I take this idea to be expressed in kernel in Ozymandias’s worry of double crux displacing more-expectedly-accurate cluster thinking with less-expectedly-accurate sequence thinking.)
How good is double crux at ‘levelling people up at rationality’?
The substantial independence of the ‘performance issue’ from the ‘pedagogical issue’
In the same way practising scales may not be the best music but makes one better at playing music, double crux may not be the best discussion technique but may make one better at discussions. This seems fairly independent of its ‘object level performance’ (although I guess if the worry above is on the right track, we would be very surprised if a technique that on the object level leads beliefs to track truth more poorly nonetheless has a salutary second-order effect).
Thus comparisons to the practices of elite philosophers (even if they differ) are inapposite—especially as, I understand from one of them, the sort of superior pattern I observe occurs only at the far right tail even among philosophers (i.e. ‘world-class’ as you write, rather than ‘good’, as I write in the OP). It would obviously be a great boon if one could get some fraction of the way towards someone like Askell or Shulman without either their profound ability or the time they have invested in these practices.
On demurring the ‘double crux challenge’
I regret I don’t think it would be hugely valuable to ‘try double crux’ with an instructor in terms of resolving this disagreement. One consideration (on which more later) is that, conditional on me not being persuaded by a large group of people who self-report double crux is great, I shouldn’t change my mind (for symmetry reasons) if this number increases by one other person, or if it increases by including me. Another is that the expected yield may not be great, at least in one direction: although I hope I am not ‘hostile’ to double crux, it seems one wouldn’t be surprised if it didn’t work with me, even if it is generally laudable.
Yet I hope I am not quite as recalcitrant as ‘I would not believe until I felt the stigmata with my own hands’. Apart from a more publicly legible case (infra), I’m a bit surprised at the lack of ‘public successes’ of double cruxing (although this may confuse performance versus pedagogy). In addition to Constantin, Raemon points to their own example with gjm. Maybe I’m only seeing what I want to, but I get a similar impression. They exhibit a variety of laudable epistemic practices, but I don’t see a crux or double crux (what they call ‘cruxes’ seem to be more like considerations they take to be important).
The methods of rational self-evaluation
You note that a head-to-head comparison between double crux and an approximate sham-control seemed to favour double crux. This looks like interesting data, and it seems a pity it emerges in the depths of a comment thread (ditto the ‘large n of successes’) rather than being written up and presented—it seems unfortunate that the last ‘public evaluation report’ is about 2 years old. I would generally urge trying to produce more ‘public evidence’ rather than the more private “we’ve generally seen this work great (and a large fraction of our alums agree!)”
I recognise that “Provide more evidence to satisfy outside sceptics” should not be high on CFAR’s priority list. Yet I think it is instrumental to other important goals instead. Chiefly: “Does what we are doing actually work?”
You noted in your initial reply undercutting considerations to the ‘we have a large n of successes’ claim, yet you framed these in a way that suggests they would often need to amount to a claim of epistemic malice (i.e. ‘either CFAR is lying or participants are being socially pressured into reporting a falsehood’). I don’t work at a rationality institute or specialise in rationality, but on reflection I find this somewhat astonishing. My impression of cognitive biases was that they are much more insidious, that falling prey to them is the rule rather than the exception, and that sincere good faith is not adequate protection (is this not, in some sense, what CFAR’s casus belli is predicated upon?)
Although covered en passant, let me explicitly (although non-exhaustively) list things which might bias more private evidence of the type CFAR often cites:
CFAR staff (collectively) are often responsible for developing the interventions they hope will improve rationality. One may expect them to be invested in them, and more eager to see that they work than see they don’t (c.f. why we prefer double-blinding over single-blinding).
Other goods CFAR enjoys (i.e. revenue/funding, social capital) seem to go up the better the results of their training. Thus CFAR staff have a variety of incentives pushing them to over-report how good their ‘product’ is (c.f. why conflicts of interest are bad, the general worries about pharma-funded drug trials).
Many CFAR participants have to spend quite a lot of money (i.e. fees and travel) to attend a workshop. They may fear looking silly if it turns out after all this it didn’t do anything, and so are incentivised to assert it was much more helpful than it actually was (c.f. choice supportive bias).
There are other aspects of CFAR workshops that participants may enjoy independent of the hoped-for improvement of their rationality (e.g. hanging around interesting people like them, personable and entertaining instructors, romantic entanglements). These extraneous benefits may nonetheless bias upwards their estimate of how effective CFAR workshops are at improving their rationality (c.f. halo effect).
I am sure there are quite a few more. One need not look that hard to find lots of promising studies supporting a given intervention undermined by any one of these.
The reference class of interventions with “a large corpus of (mainly self-reported) evidence of benefit, but susceptible to these limitations” is dismal. It includes many branches of complementary medicine. It includes social programs (e.g. ‘scared straight’) that we now know to be extremely harmful. It includes a large number of ineffective global poverty interventions. Beyond cautionary tales, I aver these approximate the modal member of the class: when the data is so subjective, and the limitations this severe, one should expect the thing in question doesn’t actually work after all.
I don’t think this expectation changes when we condition on the further rider “And the practicioners really only care about the truth re. whether the intervention works or not.” What I worry about going on under the hood is a stronger (and by my lights poorly substantiated) claim of rationalist exceptionalism: “Sure, although cognitive biases plague entire fields of science and can upend decades of results, and we’re appropriately quick to point out risk of bias of work done by outsiders, we can be confident that as we call ourselves rationalists/we teach rationality/we read the sequences/etc. we are akin to Penelope refusing her army of suitors—essentially incorruptible. So when we do similarly bias-susceptible sorts of things, we should give one another a pass.”
I accept ‘gold standard RCTs’ are infeasible (very pricey, and how well can one really do ‘sham CFAR’?), yet I aver there is quite a large gap between this ideal of evidence and the actuality (i.e. evidence kept in house, which emerges via reference in response to criticism), a gap which could be bridged by doing more write-ups, looking for harder metrics that put one more reliably in touch with reality, and so on. I find it surprisingly incongruent that the common cautions about cognitive biases (indeed, cautions that seem to be predicates of CFAR’s value proposition, e.g. “Good faith is not enough”, “Knowing about the existence of biases does not make one immune to them”, Feynman’s dictum that you are the easiest person to fool) are not reflected in its approach to self-evaluation.
If nothing else, opening up more of CFAR’s rationale, evidence, etc. to outside review may allow more benefits of outside critique. Insofar as it is the case you found this exchange valuable, one may anticipate greater benefit from further interaction with higher-quality sceptics.
I think I feel similar to lahwran. You made a lot of good points, but the comment feels aggressive in a way that would make me feel surprised if the discussion following this comment would be good. Not downvoting or upvoting either way because of this.
Sensible; have been going through and making edits to reduce aggressiveness (e.g. removing italics, correcting typical-mind fallacies, etc.) I like having these comments here as a record of what was there before edits occurred.
Upvoted the top level comment after the edit.
I would argue that Thrasymachus’ initial post also carried an undertone of aggression (that Duncan may have picked up on, either consciously or subconsciously), but that this was possibly obscured and/or distracted from by its very formal tone.
(Whether you prefer veiled or explicit aggression is a “pick your poison” kind of choice.)
This seems correct to me. And I originally didn’t upvote the top-level post either.
I upvoted you, then changed my mind about doing so because of intense emotional content. From both sides, this feels like a fight. I have also retracted my vote on the main post.
I agree that you have good points, but I don’t feel able to engage with them without it feeling like fighting/like tribal something or other.
Thanks for both your policy and your honesty. The bind I feel like I’m in is that, in this case, the way I’d back away from a fight and move myself toward productive collaboration is to offer to double crux, and it seems like in this case that would be inappropriate/might be received as itself a sort of sneaky status move or an attempt to “win.”
If Thrasymachus or anyone else has specific thoughts on how best to engage, I commit to conforming to those thoughts, as a worthwhile experiment. I am interested in the actual truth of the matter, and most of my defensiveness centers around not wanting to throw away the accumulated value we have so far (as opposed to something something status something something ownership).
I think, based on my reading of Thrasymachus’s post, that they think there’s a reasonable generalization of double crux that has succeeded in the real world; that it’s too hard to get to that generalization from double crux; but that there is a reasonable way for disagreeing people to engage.
I am censoring further things I want to say, to avoid pushing on the resonance of tribalism-fighting.
Out of curiosity, do you think that inserting an explicit disclaimer like this helps to reduce feelings of tribal offense? If so, having now written such a disclaimer, do you think it would be worth it to share more of your thoughts on the matter?
(I’ll be honest; my main motivator for asking this is because I’m curious and want to read the stuff you didn’t say. But even taking that into consideration, it seems to me that the questions I asked have merit.)
no, I think it creates a small fraction of what it would if I’d said the thing.