I agree with the gist of the critique of double crux as presented here, and have had similar worries. I don’t endorse everything in this comment, but think taking it seriously will positively contribute to developing an art of productive disagreement.
The bet at the end feels a bit fake to me, since I think it is currently reasonable to assume that publishing a study in a prestigious psychology journal involves something around 300 person-hours of completely useless bureaucratic labor, and I don’t think it is currently worth it for CFAR to go through that effort (nor, I think, is it for almost anyone else). However, if we relax the constraint to merely reaching the data quality necessary to publish in a journal (verified by Carl Shulman or Paul Christiano or Holden Karnofsky, or whoever we can find whom we would both trust to assess this), I am happy to take you up on your 4-to-1 bet (as long as we are measuring the effect of the current set of CFAR instructors teaching, not some external party trying to teach the same techniques, which I expect to fail).
Sadly, I don’t currently have the time to write a larger response about the parts of your comment I disagree with, but this is an important enough topic that I might end up writing a synthesis on things in this general vicinity, drawing from both your and other people’s writing. For now, I will leave a quick bullet list of things I think this response/argument is getting wrong:
- While your critique points out true faults in the protocol of double crux, I think it has not yet really engaged with some of the core benefits I think it brings. You somewhat responded to this by saying that you think other people don’t agree on what double crux is about, which is indeed evidence of the lack of a coherent benefit; however, I claim that if you dug deeper into those people’s opinions, you would find that the core benefits they claim might sound superficially very different but are actually quite similar and highly related at the core. I personally expect that the two of us would have a more productive disagreement than you and Duncan, so I am happy to chat in person, here on LW, or via any chat service of your preference if you want to dig deeper into this. Though obviously feel completely free to decline.
- In particular, I think the alternative communication protocols you proposed are significantly worse and, insofar as they are codified, do not actually result in more productive disagreement.
- I have a sense that part of your argument still boils down to “CFAR’s arguments are not affiliated enough with institutions that are allowed to make a claim about something like this (whereas academic philosophers and psychology journals are).” This is a very tentative impression, and I do not want to give you the sense that you have to defend yourself against it. I have a high prior on people’s epistemics being tightly entangled with their sense of status, and usually require fairly extraordinary evidence before I am convinced that this is not the case for any specific individual. However, since this kind of psychologizing almost never results in a productive conversation, it is not a valid argument in a public debate, and other people should be very hesitant to treat my position as additional evidence of anything. But I want to be transparent about my epistemic state and state the true reasons for my assessment as much as possible.
- While I agree that the flaws you point out are indeed holding back the effectiveness of double crux, I disagree that they have any significant negative effects on long-term epistemics. I don’t think CFAR is training people to adopt worse belief structures after double cruxing, partially because the default incentives on people’s belief structures from normal social conversation are already very bad, so doing worse by accident is unlikely, and partially because the times I’ve seen double crux in practice, I did not see mental motions that would correspond to the loss of the web-like structure of people’s beliefs. If anything, I noticed a small effect in the opposite direction (i.e. people’s social stance towards something was very monolithic and non-web-like, but as soon as they started double cruxing, their stated beliefs were much more web-like).
Overall, I am very happy about this comment, and would give it a karma reward had I implemented the moderator-karma-reward functionality instead of spending the time writing this reply. While I would not have phrased the issues I have with double crux in the same language, the issues it points out overlap to a reasonable degree with mine, and so it also represents a good chunk of my worries. Thank you for writing it.
Thanks for your reply. Given my own time constraints I’ll decline your kind offer to discuss this further (I would be interested in reading some future synthesis). As consolation, I’d happily take you up on the modified bet. Something like:
Within the next 24 months, CFAR will not produce results of sufficient quality for academic publication (as judged by someone like Christiano or Karnofsky) that demonstrate benefit on a pre-specified objective outcome measure.
I guess ‘demonstrate benefit’ could be stipulated as ‘p<0.05 on some appropriate statistical test’ (the pre-specification should get rid of the p-hacking worries). ‘Objective’ may remain a bit fuzzy: the rider is meant to rule out self-report stuff like “Participants really enjoyed the session/thought it helped them”. I’d be happy to take things like “Participants got richer than controls”, “CFAR alums did better on these previously used metrics of decision making”, or whatever else.
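As a purely illustrative sketch of what such a pre-specified analysis could look like: the outcome measure (change in income for alumni versus controls), the sample sizes, and the choice of a two-sample t-test below are all hypothetical assumptions of mine, not anything CFAR or either of us has committed to.

```python
# Illustrative only: a pre-registered two-sample comparison on an objective
# outcome measure. The outcome, group sizes, and test choice are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder data standing in for a real dataset: change in annual income
# (in $1000s) for workshop alumni versus waitlist controls.
alumni = rng.normal(loc=5.0, scale=20.0, size=120)
controls = rng.normal(loc=0.0, scale=20.0, size=120)

# The test and alpha level are fixed before seeing the data (pre-specified),
# which is what rules out p-hacking across many possible analyses.
t_stat, p_value = stats.ttest_ind(alumni, controls, equal_var=False)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("demonstrates benefit" if (p_value < alpha and t_stat > 0) else "no demonstrated benefit")
```

The relevant feature is that everything after data collection is mechanical, so a judge only needs to check that the analysis run matches what was written down beforehand.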
Happy to discuss further to arrive at agreeable stipulations, or, if you prefer, we can just leave them to the judge's discretion.
Ah, the 4-to-1 bet was a conditional one:

(4 to 1): Conditional on a CFAR study getting past peer review, it will not show significantly positive effects on any objective, pre-specified outcome measure.
I don’t know CFAR’s current plans well enough to judge whether they will synthesize the relevant evidence; I am only betting that if they do, the result will be positive. I am still on the fence about taking a 4-to-1 bet on this, but the vast majority of my uncertainty here comes from what CFAR is planning to do, not from what the result would be. I would probably take a 5-to-1 bet on the statement as you proposed it.
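For readers less used to odds notation, here is a quick sketch of what these figures cash out to. The assumption that the side laying the larger stake wins the smaller one if they are right is my own framing, not something stated above; under it, 4-to-1 implies roughly 80% break-even confidence and 5-to-1 roughly 83%.

```python
# Rough illustration of the confidence implied by "4 to 1" / "5 to 1" odds,
# assuming the side laying the larger stake wins the smaller one if correct.
def breakeven_probability(stake: float, opponent_stake: float) -> float:
    """Win probability at which laying `stake` against `opponent_stake`
    has zero expected value."""
    return stake / (stake + opponent_stake)

for odds in [(4, 1), (5, 1)]:
    p = breakeven_probability(*odds)
    print(f"{odds[0]}-to-{odds[1]}: break-even confidence is about {p:.0%}")
```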
Sorry for misreading your original remark. Happy to offer the bet in conditional form, i.e.:
Conditional on CFAR producing results of sufficient quality for academic publication (as judged by someone like Christiano or Karnofsky), these will fail to demonstrate benefit on a pre-specified objective outcome measure.