First off, a symmetric apology for any inflammatory or triggering nature in my own response, and an unqualified acceptance of your own, and reiterated thanks for writing the post in the first place, and thanks for engaging further. I did not at any point feel personally attacked or slighted; to the degree that I was and am defensive, it was over a fear that real value would be thrown out or socially disfavored for insufficient reason.
(I note the symmetrical concern on your part: that real input value will be thrown out or lost by being poured into a socially-favored-for-insufficient-reason framework, when other frameworks would do better. You are clearly motivated by the Good.)
You’re absolutely right that the relative lack of double cruxes ought to be on my list of cruxes. It is, in fact; I simply didn’t think to write it down. I highly value double crux as a technique if double cruxes are actually findable in 40-70% of disagreements; I significantly-but-not-highly value it if they are findable in 25-40% of disagreements; I lean toward ceasing to investigate double crux if they’re only findable in 10-25%, and I am confused if they’re rarer than 10%.
By contrast, I propose the relevant web tends to be much denser (or, alternatively, the ‘power’ of the population of reasons that may alter one’s credence in B is fairly evenly distributed). Credence in B arises from a large number of considerations that weigh upon it, each of middling magnitude. Thus even if I am persuaded one is mistaken, my credence in B does not change dramatically. It follows that ‘cruxes’ are rare, and so it is rare for two people to discover that their belief on some recondite topic B is principally determined by some other issue (C), and that it is the same issue for both of them.
I agree that this is a relevant place to investigate, and at the risk of proving you right at the start, I add it to my list of things which would cause me to shift my belief somewhat.
The claim that I derive from “there’s surprisingly often one crux” is something like the following: that, for most people, most of the time, there is not in fact a careful, conscious, reasoned weighing and synthesis of a variety of pieces of evidence. That, fompmott, the switch from “I don’t believe this” to “I now believe this” is sudden rather than gradual, and, post-switch, involves a lot of recasting of prior evidence and conclusions, and a lot of further confirmation-biased integration of new evidence. That, fompmott, there are a lot of accumulated post-hoc justifications whose functional irrelevance may not even be consciously acknowledged, or even safe to acknowledge, but whose accumulation is strongly incentivized given a culture wherein a list of twenty reasons is accorded more than 20x the weight of a list of one reason, even if nineteen of those twenty reasons are demonstrated to be fake (e.g. someone accused of sexual assault, acquitted due to their ironclad alibi that they were elsewhere, and yet the accusation still lingers because of all the sticky circumstantial bits that are utterly irrelevant).
In short, the idealized claim of double crux is that people’s belief webs look like this:
<insert>
Whereas I read you claiming that people’s belief webs look like this:
<insert>
And on reflection and in my experience, the missing case that tilts toward “double crux is surprisingly useful” is that a lot of belief webs look like this:
<insert>
… where they are not, in fact, simplistic and absolutely straightforward, but there often is a crux which far outweighs all of the other accumulated evidence.
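(To make the contrast concrete, here is a minimal toy sketch with made-up weights, treating credence in B as a logistic function of the summed weight of the considerations. It is only an illustration of the distinction between a dense web and a crux-dominated web, not anything from the double crux writeup; the numbers and the `credence` helper are invented for the example.)

```python
import math

def credence(weights):
    """Toy model: map summed consideration weights (log-odds) to a probability."""
    return 1 / (1 + math.exp(-sum(weights)))

dense_web = [0.4] * 10          # ten middling considerations, each weighing a little on B
crux_web = [4.0] + [0.1] * 10   # one crux plus a pile of near-weightless justifications

for name, web in [("dense web", dense_web), ("crux web", crux_web)]:
    before = credence(web)
    after = credence(web[1:])   # the strongest consideration turns out to be mistaken
    print(f"{name}: {before:.2f} -> {after:.2f} after dropping its strongest consideration")
```

On these invented numbers the dense web barely moves when its strongest consideration is discarded, whereas the crux-dominated web drops from roughly 0.99 to roughly 0.73.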
I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first image, and when they have a belief web like the third image they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is. Ergo, LessWrongers would find the “Surprise! You had an unjustified belief!” thing happening less often and less unexpectedly.
If I’m reading you right, this takes care of your first bullet point above entirely and brings us closer to a mutual understanding on your second bullet point. Your third bullet point remains entirely unaddressed in double crux except by the fact that we often have common cultural pressures causing us to have aligned-or-opposite opinions on many matters, and thus in practice there’s often overlap. Your fourth bullet point seems both true and a meaningful hole or flaw in double crux in its idealized, Platonic form, but also is an objection that in practice is rather gracefully integrated by advice to “keep ideals in mind, but do what seems sane and useful in the moment.”
To the extent that those sections of your arguments which miss were based on my bad explanation, that’s entirely on me, and I apologize for the confusion and the correspondingly wasted time (on stuff that proved to be non-crucial!). I should further clarify that the double crux writeup was conceived in the first place as “well, we have a thing that works pretty well when transmitted in person, but people keep wanting it not transmitted in person, partly because workshops are hard to get to even though we give the average EA or rationalist who can’t afford it pretty significant discounts, so let’s publish something even though it’s Not Likely To Be Good, and let’s do our best to signal within the document that it’s incomplete and that they should be counting it as ‘better than nothing’ rather than judging it as ‘this is the technique, and if I’m smart and good and can’t do it from reading, then that’s strong evidence that the technique doesn’t work for me.’” I obviously did not do enough of that signaling, since we’re here.
Re: the claim “Double crux is not essential for these incorporated practices.” I agree wholeheartedly on the surface—certainly people were doing good debate and collaborative truthseeking for millennia before the double crux technique was dreamed up.
I would be interested in seeing a side-by-side test of double crux versus direct instruction in a set of epistemic debate principles, or double crux versus some other technique that purports to install the same virtues. We’ve done some informal testing of this within CFAR—in one workshop, Eli Tyre and Lauren Lee taught half the group double crux as it had always previously been taught, while I discussed with the other half all of the ways that truthseeking conversations go awry, and all of the general desiderata for a positive, forward-moving experience. As it turned out, the formal double crux group did noticeably better when later trying to actually resolve intellectual disagreement, but the strongest takeaway we got from it was that the latter group didn’t have an imperative to operationalize their disagreement into concrete observations or specific predictions, which seems like a non-central confound to the original question.
As for “I guess I have higher hopes for transparency and communicability of ‘good techniques’,” all I can do is fall back yet again on the fact that, every time skepticism of double crux has reared its head, multiple CFAR instructors and mentors and comparably skilled alumni have expressed willingness to engage with skeptics, and produce publicly accessible records and so forth. Perhaps, since CFAR’s the one claiming it’s a solid technique, 100% of the burden of creating such referenceable content falls on us, but one would hope that the relationship between enthusiasts and doubters is not completely antagonistic, and that we could find some Robin Hansons to our Yudkowskys, who are willing to step up and put their skepticism on the line as we are with our confidence.
As yet, not a single person has sent me a request of the form “Okay, Duncan, I want to double crux with you about X such that we can write it down or video it for others to reference,” nor has anyone sent me a request of the form “Okay, Duncan, I suspect I can either prove double crux not worth it or prove [replacement Y] a more promising target. Let’s do this in public?”
I really really do want all of us to have the best tool. My enthusiasm for double crux has nothing to do with an implication that it’s perfect, and everything to do with a lack of visibly better options. If that’s just because I haven’t noticed something obvious, I’d genuinely appreciate having the obvious pointed out, in this case.
Thanks again, Thrasymachus.
Thank you for your gracious reply. I see a couple of overarching themes in it, within which I would like to frame my own: the first is the ‘performance issue’ (i.e. ‘How good is double crux at resolving disagreement/getting closer to the truth?’); the second is the ‘pedagogical issue’ (i.e. ‘How good is double crux at the second-order task of getting people better at resolving disagreement/getting closer to the truth?’). I now better understand that you take the main support for double crux to draw upon the latter issue, but I’d also like to press on some topics about the former on which I believe we disagree.
How well does double crux perform?
Your first two diagrams precisely capture the distinction I have in mind (I regret not having thought to draw my own earlier). If I read the surrounding text right (I’m afraid I don’t know what ‘fompmott’ means, and google didn’t help me), you suggest that even if better cognisers find their considerations form a denser web like the second diagram, double-crux-amenable ‘sparser’ webs are still common in practice, perhaps due to various non-rational considerations. You also add:
I note that, if correct, this theory would indicate that e.g. your average LessWronger would find less value in double crux than your average CFAR participant (who shares a lot in common with a LessWronger but in expectation is less rigorous and careful about their epistemics). This being because LessWrongers try very deliberately to form belief webs like the first [I think second? - T] image, and when they have a belief web like the third image they try to make that belief feel to themselves as unbalanced and vulnerable as it actually is. Ergo, LessWrongers would find the “Surprise! You had an unjustified belief!” thing happening less often and less unexpectedly.
This note mirrors a further thought I had (cf. Ozymandias’s helpful remark in a child comment about sequence versus cluster thinking). Yet I fear it poses a further worry for the ‘performance issue’ of double crux, as it implies that the existence of cruxes (or double cruxes) may be indicative of pathological epistemic practices. A crux implies something like the following:
You hold some belief B you find important (at least, important enough you think it is worth your time to discuss).
Your credence in B depends closely on some consideration C.
Your credence in C is non-resilient (at least sufficiently non-resilient you would not be surprised to change your mind on it after some not-unduly-long discussion with a reasonable interlocutor).*
* What about cases where one has a resilient credence in C? Then the subsequent worries do not apply. However, I suspect these cases often correspond to “we tried to double crux and we found we couldn’t make progress on resolving our disagreement about theories of truth/normative ethics/some other foundational issue”.
It roughly follows from this that you should have low resilience in your credence in B. As you note, this is a vulnerable position, and knowing one holds non-resilient credences in important Bs is something to be avoided.
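For concreteness, here is one way to cash out why this follows; this is my own sketch via the law of total probability, not anything from the original exposition:

```latex
\begin{align*}
P(B) &= P(B \mid C)\,P(C) + P(B \mid \neg C)\,\bigl(1 - P(C)\bigr) \\
     &= P(B \mid \neg C) + \underbrace{\bigl[P(B \mid C) - P(B \mid \neg C)\bigr]}_{\text{large if } C \text{ is a crux}}\,P(C).
\end{align*}
% Any shift in a non-resilient P(C) is multiplied by the bracketed gap, so when
% that gap is large it propagates almost one-for-one into P(B).
```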
As a tool of diagnosis, double crux might be handy (i.e. “This seems to be a crux for me, yet cruxes aren’t common among elite cognisers—I should probably go check whether they agree this is the crux of this particular matter, and if not maybe see what else they think bears upon B besides C”). Yet (at least per the original exposition) it seems to be more a tool for subsequent ‘treatment’, and treatment could make things worse, not better.
If X and Y find they differ on some crux, but also understand that superior cognisers tend not to have this crux, and instead distribute support across a variety of considerations, it seems a better idea for them to explore other candidate considerations rather than trying to resolve their disagreement re. C. If they instead do the double-cruxy thing and try to converge on C, they may be led up the epistemic garden path. They may come to agree with one another on C (thus B), and thus increase the resilience of their credence in C (thus B), yet they also confirm a mistaken web of belief around B which wrongly accords too much weight to C. If (as I suggest) at least half the battle in having good ‘all things considered’ attitudes to recondite matters comprises getting the right weights for the relevant considerations, double crux may celebrate them converging further away from the truth. (I take this idea to be expressed in kernel in Ozymandias’s worry about double crux displacing more-expectedly-accurate cluster thinking with less-expectedly-accurate sequence thinking.)
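A toy version of this worry, with invented log-odds weights (only an illustration, not a claim about any real disagreement):

```python
import math

def credence(log_odds):
    return 1 / (1 + math.exp(-log_odds))

other_considerations = [0.3, -0.2, 0.4, -0.3]  # the rest of the web, roughly balanced
c_weight_deserved = 0.5   # the weight C would get under a well-calibrated weighing
c_weight_as_crux = 3.0    # the weight X and Y give C once they treat it as the crux

converged_on_crux = credence(sum(other_considerations) + c_weight_as_crux)
well_weighted = credence(sum(other_considerations) + c_weight_deserved)

print(f"X and Y after 'resolving' C as a crux: {converged_on_crux:.2f}")
print(f"With C given only its deserved weight: {well_weighted:.2f}")
```

On these numbers both parties end up around 0.96 confident in B, while a proper weighing of the very same considerations would leave them nearer 0.67: they have converged, and become more resilient, away from where the evidence points.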
How good is double crux at ‘levelling people up at rationality’?
The substantial independence of the ‘performance issue’ from the ‘pedagogical issue’
In the same way that practising scales may not be the best music but makes one better at playing music, double crux may not be the best discussion technique but may make one better at discussions. This seems fairly independent of its ‘object-level performance’ (although I guess that if the worry above is on the right track, we would be very surprised if a technique that on the object level leads beliefs to track truth more poorly nonetheless had a salutary second-order effect).
Thus comparisons to the practices of elite philosophers (even if they differ) are inapposite—especially since, as I understand from one of them, the sort of superior pattern I observe occurs only in the far right tail even among philosophers (i.e. ‘world-class’, as you write, rather than ‘good’, as I write in the OP). It would obviously be a great boon if I could become some fraction more like Askell or Shulman without either their profound ability or the time they have invested in these practices.
On demurring the ‘double crux challenge’
I regret that I don’t think it would be hugely valuable to ‘try double crux’ with an instructor as a means of resolving this disagreement. One consideration (on which more later) is that, conditional on my not being persuaded by a large group of people who self-report that double crux is great, I shouldn’t change my mind (for symmetry reasons) if this number increases by one further person, or if it increases by including me. Another is that the expected yield may not be great, at least in one direction: although I hope I am not ‘hostile’ to double crux, one wouldn’t be surprised if it didn’t work with me, even if it’s generally laudable.
Yet I hope I am not quite as recalcitrant as ‘I would not believe until I felt the stigmata with my own hands’. Apart from a more publicly legible case (infra), I’m a bit surprised at the lack of ‘public successes’ of double cruxing (although this may confuse performance versus pedagogy). In addition to Constantin, Raemon points to their own example with gjm. Maybe I’m only seeing what I want to, but I get a similar impression: they exhibit a variety of laudable epistemic practices, but I don’t see a crux or double crux (what they call ‘cruxes’ seem to be more like considerations they take to be important).
The methods of rational self-evaluation
You note that a head-to-head comparison between double crux and an approximate sham control seemed to favour double crux. This looks like interesting data, and it seems a pity it emerges in the depths of a comment thread (ditto the ‘large n of successes’) rather than being written up and presented—it seems unfortunate that the last ‘public evaluation report’ is about two years old. I would generally urge trying to produce more ‘public evidence’ rather than the more private “we’ve generally seen this work great (and a large fraction of our alums agree!)”.
I recognise that “Provide more evidence to satisfy outside sceptics” should not be high on CFAR’s priority list. Yet I think it is instrumental to other important goals. Chiefly: “Does what we are doing actually work?”
You noted in your initial reply some considerations undercutting the ‘we have a large n of successes’ claim, yet you framed these as if they would often need to amount to a claim of epistemic malice (i.e. ‘either CFAR is lying or participants are being socially pressured into reporting a falsehood’). I don’t work at a rationality institute or specialise in rationality, but on reflection I find this somewhat astonishing. My impression of cognitive biases was that they are much more insidious, that falling prey to them is the rule rather than the exception, and that sincere good faith is not adequate protection (is this not, in some sense, what CFAR’s casus belli is predicated upon?).
Although these were covered en passant, let me explicitly (though non-exhaustively) list things which might bias the more private evidence of the sort CFAR often cites:
CFAR staff (collectively) are often responsible for developing the interventions they hope will improve rationality. One may expect them to be invested in these interventions, and more eager to see that they work than that they don’t (cf. why we prefer double-blinding over single-blinding).
Other goods CFAR enjoys (e.g. revenue/funding, social capital) seem to go up the better the results of their training appear. Thus CFAR staff have a variety of incentives pushing them to over-report how good their ‘product’ is (cf. why conflicts of interest are bad, and the general worries about pharma-funded drug trials).
Many CFAR participants have to spend quite a lot of money (i.e. fees and travel) to attend a workshop. They may fear looking silly if it turns out that, after all this, it didn’t do anything, and so are incentivised to assert it was much more helpful than it actually was (cf. choice-supportive bias).
There are other aspects of CFAR workshops that participants may enjoy independent of the hoped-for improvement of their rationality (e.g. hanging around interesting people like them, personable and entertaining instructors, romantic entanglements). These extraneous benefits may nonetheless bias upwards their estimate of how effective CFAR workshops are at improving their rationality (cf. the halo effect).
I am sure there are quite a few more. One need not look that hard to find lots of promising studies supporting a given intervention undermined by any one of these.
The reference class of interventions with “a large corpus of (mainly self-reported) evidence of benefit, but susceptible to these limitations” is dismal. It includes many branches of complementary medicine. It includes social programs (e.g. ‘scared straight’) that we now know to be extremely harmful. It includes a large number of ineffective global poverty interventions. Beyond serving as cautionary tales, I aver these approximate the modal member of the class: when the data are this subjective, and the limitations this severe, one should expect that the thing in question doesn’t actually work after all.
I don’t think this expectation changes when we condition on the further rider “And the practitioners really only care about the truth re. whether the intervention works or not.” What I worry is going on under the hood is a stronger (and by my lights poorly substantiated) claim of rationalist exceptionalism: “Sure, although cognitive biases plague entire fields of science and can upend decades of results, and we’re appropriately quick to point out the risk of bias in work done by outsiders, we can be confident that because we call ourselves rationalists/we teach rationality/we read the sequences/etc. we are akin to Penelope refusing her army of suitors—essentially incorruptible. So when we do similarly bias-susceptible sorts of things, we should give one another a pass.”
I accept that ‘gold standard RCTs’ are infeasible (very pricey, and how well can one really do ‘sham CFAR’?), yet I aver there is quite a large gap between this ideal of evidence and the actuality (i.e. evidence kept in house, which emerges via reference in response to criticism), a gap which could be bridged by doing more write-ups, looking for harder metrics that put one more reliably in touch with reality, and so on. I find it surprisingly incongruent that the common cautions about cognitive biases, indeed cautions that seem to be predicates for CFAR’s value proposition (e.g. “Good faith is not enough”, “Knowing about the existence of biases does not make one immune to them”, Feynman’s dictum that ‘you are the easiest person to fool’), are not reflected in its approach to self-evaluation.
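To gesture at what I mean by ‘harder metrics’: even something as simple as pre-registering a concrete per-pair outcome and reporting a permutation test on it would be more legible than aggregate self-report. The sketch below uses entirely invented outcome data and is meant only to show the shape of such an analysis, not to propose this exact design as adequate.

```python
import random

random.seed(0)

# Invented data: 1 = the pair resolved their disagreement within the session.
double_crux_pairs = [1, 1, 0, 1, 1, 0, 1, 1]
comparison_pairs = [1, 0, 0, 1, 0, 0, 1, 0]

def rate(xs):
    return sum(xs) / len(xs)

observed_gap = rate(double_crux_pairs) - rate(comparison_pairs)

pooled = double_crux_pairs + comparison_pairs
n = len(double_crux_pairs)
trials = 10_000
at_least_as_large = 0
for _ in range(trials):
    random.shuffle(pooled)
    if rate(pooled[:n]) - rate(pooled[n:]) >= observed_gap:
        at_least_as_large += 1

print(f"observed gap: {observed_gap:.2f}")
print(f"one-sided permutation p-value: {at_least_as_large / trials:.2f}")
```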
If nothing else, opening up more of CFAR’s rationale, evidence, etc. to outside review may allow more of the benefits of outside critique. Insofar as you found this exchange valuable, one may anticipate greater benefit from further interaction with higher-quality sceptics.