These things are bad, but, apart from point 2, I would ask: how do they compare to the average quality of social science research? Do you have high standards, or do you just have high standards for one group? I think most of us spend at least some time in environments where the incentive gradients point towards the latter. Beware isolated demands for rigor.
I don’t know for sure as I am only familiar with certain subsets of social science, but a lot of it is in fact bad. I also often criticize normal social science, but in this context it was this specific area of social science that came up.
As for point 2—if you were a researcher with heretical opinions, determined to publish research on at least some of them, what would you do? It seems like a reasonable strategy is to pick something heretical that you’re confident you can defend, and do a rock-solid study on it, and brace for impact. Is it still the case that disproving the blank-slate hypothesis would constitute progress in some academic subfields? If so, then expect people to continue trying it.
I would try to perform studies that yield much more detailed information. For instance, mixed qualitative and quantitative studies where one qualitatively inspects the data points that are above-average or below-average for the regressions, to see whether there are identifiable missing factors.
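A sketch of the quantitative half of that design: fit the regression, rank cases by residual, and pull out the extremes for qualitative inspection. The data below are invented purely for illustration:

```python
# Sketch of the quantitative step: fit a regression, then rank cases by residual
# so the extremes can be inspected qualitatively for missing factors.
# All data below are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 14.0, 12.1, 14.2, 9.0]  # two oddities planted by hand

a, b = fit_line(xs, ys)
residuals = sorted((y - (a + b * x), x, y) for x, y in zip(xs, ys))

# These extremes are the cases one would go back to qualitatively:
print("furthest below the line:", residuals[0])
print("furthest above the line:", residuals[-1])
```

The point is only that the regression itself tells you which data points to go interview, re-examine, or otherwise dig into; the qualitative step then looks for factors the model omits.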
So, at least as a “We suspected these results were bogus, tried to reproduce them, and got a much smaller effect size”, this seems all in order.
If he had phrased his results purely as disproving the importance of incentives, rather than effort, I think it would have been fine.
Your analysis essentially proposes that, if there were some method of increasing effort by 3-4x as much as he managed to increase it, then maybe you could in fact increase IQ scores by 10 points. This assumes that the effort-to-performance causation would stay constant as you step outside the tested range. That’s possible, but… I’m quite confident there’s a limit to how much “effort” can increase your results on a timed multiple-choice test, that you’ll hit diminishing marginal returns at some point (probably even negative marginal returns, if the incentive is strong enough to make many test-takers nervous), and extrapolating 3-4x outside the achieved effect seems dubious. (I also note that the 1x effect here means increasing your self-evaluated effort from 4.13 to 4.28 on a scale that goes up to 5, so a 4x effect would mean going to 4.73, approaching the limits of the scale itself.)
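The parenthetical's scale arithmetic can be checked directly. This uses only the numbers quoted above; the "maximum multiple" line is my own extension of the same arithmetic:

```python
# Checking the scale arithmetic from the parenthetical, using the quoted numbers.
baseline = 4.13        # mean self-rated effort without the incentive
with_incentive = 4.28  # mean self-rated effort with the incentive
scale_max = 5.0        # top of the effort scale

one_x = with_incentive - baseline   # the effect the incentive actually achieved
four_x = baseline + 4 * one_x       # where a hypothetical 4x effect would land
max_multiple = (scale_max - baseline) / one_x  # largest multiple the scale can show at all

print(round(one_x, 2))         # 0.15
print(round(four_x, 2))        # 4.73, approaching the ceiling of 5
print(round(max_multiple, 1))  # about 5.8x is the most the scale could ever register
```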
I prefer to think of it as “if you increase your effort from being one of the lowest-effort people to being one of the highest-effort people, you can increase your IQ score by 17 IQ points”. This doesn’t seem too implausible to me, though admittedly I’m not 100% sure what the lowest-effort people are doing.
It’s valid to say that extrapolating outside of the tested range is dubious, but IMO this means that the study design is bad.
I think it’s likely that the limited returns to effort would be reflected in the limited bounds of the scale, so I don’t think my position is in tension with the intuition that there are limits on what effort can do for you. Under this model, it is also worth noting that the effort scores were negatively skewed, which implies that lack of effort is a bigger cause of low scores than extraordinary effort is of high scores.
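A minimal sketch of the ceiling mechanism: a symmetric latent effort distribution, once clipped to a 1–5 scale, comes out negatively skewed. Every number here is invented:

```python
import random

random.seed(0)

def sample_skewness(data):
    """Adjusted Fisher-Pearson sample skewness."""
    n = len(data)
    m = sum(data) / n
    s = (sum((x - m) ** 2 for x in data) / (n - 1)) ** 0.5
    return sum(((x - m) / s) ** 3 for x in data) * n / ((n - 1) * (n - 2))

# Hypothetical latent effort: symmetric around 4.2, then clipped to the 1-5 scale.
latent = [random.gauss(4.2, 0.6) for _ in range(50_000)]
observed = [min(max(x, 1.0), 5.0) for x in latent]

print(round(sample_skewness(latent), 2))    # roughly 0: symmetric before clipping
print(round(sample_skewness(observed), 2))  # clearly negative: the ceiling compresses the top
```

So on this model, a negative skew in the observed effort scores is exactly what a ceiling would produce even from symmetric underlying effort.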
That is interesting… Though the correlation between test effort and test performance in the studies is given as 0.27 and 0.29 in different samples, so, noise notwithstanding, your effects are consistently larger by a decent margin. That would suggest there’s something other than the simple causation going on.
I don’t think my results are statistically significantly different from 0.3ish; in the ensuing discussion, people pointed out that the IV results had huge error bounds (because the original study was only barely significant).
But also if there is measurement error in the instrument (effort), then that would induce an upwards bias in the IV estimated effect. So that might also contribute.
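One concrete way mismeasured effort can push the IV estimate upward is if the measurement compresses the first stage, e.g. via the scale ceiling discussed above. A toy Wald-estimator simulation; every number in it is invented:

```python
import random

random.seed(1)

n = 200_000
true_effect = 10.0  # invented: IQ points per unit of true effort

data = []
for _ in range(n):
    z = random.random() < 0.5  # incentive offered: the instrument
    effort = random.gauss(4.3, 0.4) + (0.15 if z else 0.0)  # true effort responds a little
    iq = 60.0 + true_effect * effort + random.gauss(0.0, 3.0)
    reported = min(effort + random.gauss(0.0, 0.3), 5.0)  # noisy self-report, ceilinged at 5
    data.append((z, iq, reported))

def mean(xs):
    return sum(xs) / len(xs)

iq_z1 = [iq for z, iq, m in data if z]
iq_z0 = [iq for z, iq, m in data if not z]
rep_z1 = [m for z, iq, m in data if z]
rep_z0 = [m for z, iq, m in data if not z]

reduced_form = mean(iq_z1) - mean(iq_z0)   # incentive's effect on IQ scores
first_stage = mean(rep_z1) - mean(rep_z0)  # incentive's effect on *reported* effort
wald = reduced_form / first_stage          # the IV / Wald estimate

print(round(true_effect, 1))
print(round(wald, 1))  # comes out above the true effect: the ceiling shrank the first stage
```

If the `min(..., 5.0)` ceiling is removed, the ratio lands near the true effect, since mean-zero report noise largely washes out of both differences; it is the compression of the first stage that does the inflating in this sketch.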
However, the “uses methods which are heavily downwards biased to “prove” [...]” is not. The “downwards biased methods” are “offering a monetary incentive of £2-£10, which turned out to be insufficient to change effort much”. The authors were doing a replication of Duckworth, in which most of the cited studies had a monetary incentive of <$10—so that part is correctly matched—and they used high enough N that Duckworth’s claimed effect size should have shown up easily. They also preregistered the first of their incentive-based studies (with the £2 incentive), and the later ones were the same but with increased sample size, then increased incentive. In other words, they did exactly what they should have done in a replication. To claim that they chose downwards-biased methods for the purpose of proving their point seems quite unfair; those methods were chosen by Duckworth.
Shitty replications of shitty environmentalist research are still shitty.
This sort of thing makes sense as a move in a personal dispute between the researchers, but for those of us who’d hope to actually use or build on the research for substantive purposes, it’s no good if researchers use shitty methods because they’re trying to build a counternarrative against other researchers who use shitty methods.
Let’s see… It seems uncontroversial (among the participants in this discussion) that there are dimensions on which male and female brains differ (on average), and on which autists are (on average) skewed towards the male side, and that this includes the empathizing and systematizing dimensions.
I wouldn’t confidently disagree with this, but I do have some philosophical nitpicks/uncertainties.
(“Brain” connotes neurology to me, yet I am not sure whether empathizing and especially systematizing are meaningful variables on a neurological level. I would also need to double-check whether the EQ/SQ are measurement invariant (MI) for sex and autism, because I don’t remember whether they are. I suspect in particular that the EQ is not, and it is the biggest driver of the EQ/SQ–autism connection, so it is pretty important to consider. But for the purposes of the Motte-Bailey situation, we can ignore that. Just tagging it as a potential area of disagreement.)
Would it be better if he used a word other than “theory”? “Model”? You somewhat facetiously propose “If the EMB theory had instead been named the “sometimes autistic people are kinda nerdy” theory, then it would be a lot more justified by the evidence”. How about, say, the theory that “There are processes that masculinize the brain in males; and some of those processes going into overdrive is a thing that causes autism”? (Which was part of the original paper: “What causes this shift remains unclear, but candidate factors include both genetic differences and prenatal testosterone.”)
I think what would be better would be if he clarified his models and reasoning. (Not positions, as that opens up the whole Motte-Bailey thing and also is kind of hard to engage with.) What is up with the original claim about autists always being extreme type S? Was this just a mistake that he would like to retract? If he only considers it to be a contributor that leads to half the variance, does he have any opinion on the nature of the other contributors to autism? Does he have any position on the relationship between autistic traits as measured by the AQ, and autism diagnosis? What should we make of the genetic contributors to autism being basically unrelated to the EQ/SQ? (And if the EQ/SQ are not MI for sex/autism, what does he make of that?)
Do you have examples of Baron-Cohen making claims of that kind, which aren’t explainable as him taking the “This theory is not exactly correct, but it makes useful predictions” approach?
This is part of the trouble: these areas do not have proper discussions.
It seems you’re saying Damore mentions A but not B, and B is bigger, therefore Damore’s “comprehensive” writeup is not so, and this omission is possibly ill-motivated.
...
This suggests that casting aspersions on Damore’s motives is not gated by “Maybe I should double-check what he said to see if this is unfair”.
No, I meant that under your interpretation, Damore mentions A when A is of negligible effect, and so that indicates a mistake. I didn’t mean to imply that he didn’t mention B, and I read this part of his memo multiple times prior to sending my original comment, so I was fully aware that he mentioned B.
Well, he lists one source of stress above, and he does recommend to “Make tech and leadership less stressful”.
But again the “Make tech and leadership less stressful” point boiled down to medicalizing it.
And why would these rationalists care so much about avoiding these conflicts, to the point of compromising the intellectual integrity that seems so dear to them? Fear that they’d face the kind of hostility and career-ruining accusations directed at Damore, and things downstream of fears like that, seems like a top candidate explanation.
Valid point.
Um. Accusations are things you make about individuals, occasionally organizations. I hope that the majority of differential psychology papers don’t consist of “Bob Jones has done XYZ bad thing”.
Differential psychology papers tend to propose ways to measure traits that they consider important, to extend previously created measures with new claims of importance, and to rank demographics by importance.
You are equivocating between reckless claims of misconduct / malice by an individual, and heavily cited claims about population-level averages that are meant to inform company policy. Are you seriously stating an ethical principle that anyone who makes the latter should expect to face the former and it’s justified?
I think in an ideal world, the research and the discourse would be more rational. For people who are willing to discuss and think about these matters rationally, it seems inappropriate to accuse them of misconduct/malice simply because one disagrees with them. However, if people have spent a long time trying to bring about rational discussion and failed, then it is reasonable for them to assume misconduct/malice.
I think Damore was aware that there are people who use population-level differences to justify discriminating against individuals, and that’s why he took pains to disavow that.
Using population-level differences to justify discriminating against individuals can be fine and is not what I have been objecting to.
As for “the problems with various differential psychology findings”—do you think that some substantial fraction, say at least 20%, of the findings he cited were false?
I don’t know. My problem with this sort of research typically isn’t that it is wrong (though it sometimes may be) but instead that it is of limited informative value.
I should probably do a top-level review post where I dig through all his cites to look at which parts of his memo are unjustified and which parts are wrong. I’ll tag you if I do that.