and that it’s actually immoral not to believe in psychological sex differences, given that those differences are real
Perhaps the archetypal psychological sex difference that people have argued about is “women are emotional, men are rational”.
After reading your previous post where you quoted Deirdre McCloskey’s memoir, I started reading the memoir a bit too, and it actually provides a neat example of this psychological sex difference.
There was a period where McCloskey’s wife got all emotional and started complaining about McCloskey spending too much money on the phone bill. McCloskey very rationally pointed out that it was very cheap compared to e.g. therapy or hobbies. Clean example of female irrationality, right?
Of course, if one looks at the extended context, the picture is very different: Deirdre had initially assured the wife that it was just crossdressing and nothing else, and they had agreed to keep the crossdressing in the background, but now Deirdre was seriously considering transitioning, while insisting that things like beard removal were just crossdressing and nothing more. Essentially, from the beginning McCloskey took rational conversation off the table, expecting that the wife would leave if they talked openly about McCloskey’s desire to be a woman.
In retrospect, it seems McCloskey’s wife was right to worry. But it also illustrates how psychological-sex-differences discourse tends to abstract away the environmental constraints people exist within. It seems like in these sorts of cases, “psychological sex differences” can function as a tool of oppression or abuse, rather than a genuine attempt at describing the world.
I think “psychological sex differences” ideology is energetically unable to acknowledge these sorts of things because its main purpose/motivation is to function as a counterstory against feminist ideology, and so the point is not to accurately describe the world but instead to deny cultural factors. (Let’s not forget James Damore, whose memo cited research on greater female neuroticism as a justification for ignoring women’s issues with their workplace.)
One of the Tweets that had recently led to radical feminist Meghan Murphy getting kicked off the platform read simply, “Men aren’t women tho.” This doesn’t seem like a policy claim; rather, Murphy was using common language to express the fact-claim that members of the natural category of adult human males, are not, in fact, members of the natural category of adult human females.
I think her tweet was intended to make a claim about trans women specifically, and that it was this sub-claim that she was banned for. For example, if she had asked for a list of the richest women, and someone had linked Elon Musk, I don’t think she would have gotten banned for responding “Men aren’t women tho.”
So I guess one could say Murphy was using common language to express the fact-claim that even if members of the natural category of adult human males transition or become intent on transitioning, they are not, in fact, members of the natural category of adult human females.
By “traits” I mean not just sex chromosomes (as Yudkowsky suggested on Twitter), but the conjunction of dozens or hundreds of measurements that are causally downstream of sex chromosomes: reproductive organs and muscle mass (again, sex difference effect size of Cohen’s d ≈ 2.6) and Big Five Agreeableness (d ≈ 0.5) and Big Five Neuroticism (d ≈ 0.4) and short-term memory (d ≈ 0.2, favoring women) and white-gray-matter ratios in the brain and probable socialization history and any number of other things—including differences we might not know about, but have prior reasons to suspect exist.
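As an intuition pump for these effect sizes, here is a small sketch (my own illustration, not from the post) converting Cohen's d into the common-language effect size: the probability that a random member of the higher-mean group outscores a random member of the other group, which is Φ(d/√2) under equal-variance normal assumptions.

```python
from math import erf

def superiority(d):
    # Common-language effect size for Cohen's d: P(X > Y) with
    # X ~ N(d, 1), Y ~ N(0, 1), i.e. Phi(d / sqrt(2)) = (1 + erf(d / 2)) / 2.
    return (1 + erf(d / 2)) / 2

# d values taken from the list above.
for trait, d in [("muscle mass", 2.6), ("Agreeableness", 0.5),
                 ("Neuroticism", 0.4), ("short-term memory", 0.2)]:
    print(f"{trait} (d = {d}): {superiority(d):.0%}")
```

A d of 2.6 means near-total separation (about 97%), while a d of 0.2–0.5 leaves the distributions mostly overlapping (roughly 56–64%), which is part of why any single personality trait is a weak signal about an individual.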
I think trans women are female-typical with respect to Agreeableness and Neuroticism, though I don’t know why. I think HRT changes large-scale brain proportions to be either intermediate between male and female or all the way to the female end, though I don’t think this has anything to do with Agreeableness/Neuroticism.
Of course since you didn’t transition and went off HRT, this might not apply to you.
But having done the reduction-to-cognitive-algorithms, it still looks like the person-in-the-street has a point that I shouldn’t be allowed to ignore just because I have 30 more IQ points and better philosophy-of-language skills?
Yes, but the coalition that supports the person-in-the-street opposes me in my dispute with Phil, because my dispute with Phil looks superficially similar; so I oppose the coalition that supports the person-in-the-street.
Of course, such speech restrictions aren’t necessarily “irrational”, depending on your goals. If you just don’t think “free speech” should go that far—if you want to suppress atheism or gender-critical feminism with an iron fist—speech codes are a perfectly fine way to do it! And to their credit, I think most theocrats and trans advocates are intellectually honest about what they’re doing: atheists or transphobes are bad people (the argument goes) and we want to make it harder for them to spread their lies or their hate.
After all, using Aumann-like reasoning, in a dispute of ‘me and Michael Vassar vs. everyone else’, wouldn’t I want to bet on ‘everyone else’?”
The intuition here is that Aumann-like reasoning implies something like averaging everyone’s opinions, and therefore if there are lots of people on one side, they would dominate in the averaging.
But actually, I think it is better to think of Aumann-like reasoning as adding together everyone’s opinions. More formally, if you imagine that everyone has observed different pieces of independent evidence, leading to different people having different updates of their opinions relative to the prior, then to get the Aumannian update you have to add up all the changes in log-odds.
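The averaging-vs.-adding distinction can be made concrete with a toy calculation (illustrative numbers of my own, not from the discussion): starting from a shared prior, each observer's independent evidence contributes a shift in log-odds, and the pooled posterior sums those shifts rather than averaging the probabilities.

```python
from math import log, exp

def logit(p):
    return log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + exp(-x))

def aumann_pool(prior, posteriors):
    # Sum each person's log-odds shift from the shared prior; valid
    # when their evidence is independent conditional on the truth.
    shift = sum(logit(p) - logit(prior) for p in posteriors)
    return sigmoid(logit(prior) + shift)

# Three observers each independently move from 0.5 to 0.6.
# Averaging would say 0.6; adding the shifts gives about 0.77.
print(aumann_pool(0.5, [0.6, 0.6, 0.6]))
```

So three weak independent updates in the same direction compound into a noticeably stronger one, which is why headcount alone doesn't settle who dominates the pooled estimate.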
Or alternatively, you are thinking of it as, “what is the probability that all of them have somehow become irrational/dishonest with respect to this subject?”. I think… it’s rare for irrationality/dishonesty to be one-sided? In the example disputes I can think of off the top of my head, it’s usually both or neither.
In theory one might think that there should be a negative correlation between the sides being dishonest due to collider bias; you only need one side to be dishonest in order to get a dispute, so it does seem kind of weird if dishonesty on both sides is correlated. But I think what happens is that disputes lead to conflicts, and conflicts lead to defensiveness, overreach and aggressiveness, and so a single dispute can spin out into an entire system of rationality falling apart. These disputes all arguably started generations ago.
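The collider-bias point can be checked with a quick simulation (hypothetical probabilities of my own choosing): if each side is independently dishonest with some probability, and a dispute requires at least one dishonest side, then conditioning on the existence of a dispute induces a negative correlation between the two sides' dishonesty.

```python
import random

random.seed(0)
n, p = 100_000, 0.3
pairs = [(random.random() < p, random.random() < p) for _ in range(n)]
# A "dispute" occurs when at least one side is dishonest (the collider).
disputes = [(a, b) for a, b in pairs if a or b]
both = sum(a and b for a, b in disputes) / len(disputes)
a_rate = sum(a for a, _ in disputes) / len(disputes)
b_rate = sum(b for _, b in disputes) / len(disputes)
# Within disputes, "both dishonest" is rarer than independence predicts:
print(f"P(both) = {both:.3f} vs independence baseline {a_rate * b_rate:.3f}")
```

Analytically, P(both | dispute) = 0.09/0.51 ≈ 0.18, while P(A | dispute)·P(B | dispute) ≈ 0.35, so if real disputes show dishonesty on both sides anyway, that positive correlation has to come from some other mechanism, such as the escalation dynamics described above.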
So basically, my Aumann-like reasoning for persistent conflicts would go: The rationalist community is being irrational/dishonest about trans topics. There must be a reason for this; and indeed, it seems like a defense mechanism against the conservative/HBD coalition’s conflicts against trans people. But the conservative/HBD coalition is also often bad, irrational and dishonest! And you’ve been regularly trusting them and appealing to their findings in your posts, even though I think upon closer inspection there are lots of issues that you’d recognize. Oops.
I said that saying, “I am worried that if I actually point out the physical injuries …” when the actual example turned out to be sex reassignment surgery seemed dishonest: I had thought he might have more examples of situations like mine or “Rebecca”’s, where gaslighting escalated into more tangible harm in a way that people wouldn’t know about by default. In contrast, people already know that bottom surgery is a thing; Ben just had reasons to think it’s Actually Bad—reasons that his friends couldn’t engage with if we didn’t know what he was talking about. It was bad enough that Yudkowsky was being so cagey; if everyone did it, then we were really doomed.
I guess this makes for a neat example of what I just mentioned.
Michael replied at 5:58 a.m., saying that everyone’s first priority should be making sure that I could sleep—that given that I was failing to adhere to my commitments to sleep almost immediately after making them, I should be interpreted as urgently needing help, and that Scott had comparative advantage in helping, given that my distress was most centrally over Scott gaslighting me, asking me to consider the possibility that I was wrong while visibly not considering the same possibility regarding himself.
That seemed a little harsh on Scott to me. At 6:14 a.m. and 6:21 a.m., I wrote a couple emails to everyone that my plan was to get a train back to my own apartment to sleep, that I was sorry for making such a fuss despite being incentivizable while emotionally distressed, that I should be punished in accordance with the moral law for sending too many hysterical emails because I thought I could get away with it, that I didn’t need Scott’s help, and that I thought Michael was being a little aggressive about that, but that I guessed that’s also kind of Michael’s style.
Michael was furious with me. (“What the FUCK Zack!?! Calling now,” he emailed me at 6:18 a.m.) I texted and talked with him on my train ride home. He seemed to have a theory that people who are behaving badly, as Scott was, will only change when they see a victim who is being harmed. Me escalating and then immediately deescalating just after Michael came to help was undermining the attempt to force an honest confrontation, such that we could get to the point of having a Society with morality or punishment.
Another strong example.
I mean, I can feel how Michael and Ben are hurting here. They are in open conflict with the rationalist leaders and are spending a lot of reputation on this.
But they are also opposing locally honest behavior in order to force the rationalist community to improve.
I wondered if maybe, in Scott or Eliezer’s mental universe, I was a blameworthy (or pitiably mentally ill) nitpicker for flipping out over a blog post from 2014 (!) and some Tweets (!!) from November. I, too, had probably said things that were wrong five years ago.
But I thought I had made a pretty convincing case that a lot of people were making a correctable and important rationality mistake, such that the cost of a correction (about the philosophy of language specifically, not any possible implications for gender politics) would be justified here. As Ben pointed out, if someone had put this much effort into pointing out an error I had made four months or five years ago and making careful arguments for why it was important to get the right answer, I probably would put some serious thought into it.
Yep. And I mean, once you’ve written up the explanation of what the error is like, it’s pretty cheap to correct.
Scott Alexander could cross out his “Categories” post and put a link to your response at the top, and write a brief public announcement that he had changed his mind in a publicly accessible place, such as a new top-level post on ACX. Eliezer could quote-tweet his old tweets with a link to your rebuttal, and thank you for pointing out his bias.
(Let’s not forget James Damore, whose memo cited research on greater female neuroticism as a justification for ignoring women’s issues with their workplace.)
I don’t think that’s true, and if anything it looks to be the opposite. Original document; the relevant quotes about neuroticism and what to do about it seem to be:
Personality differences
Neuroticism (higher anxiety, lower stress tolerance).
○ This may contribute to the higher levels of anxiety women report on Googlegeist and to the lower number of women in high stress jobs.
[...]
Non-discriminatory ways to reduce the gender gap
[...]
Women on average are more prone to anxiety
○ Make tech and leadership less stressful. Google already partly does this with its many stress reduction courses and benefits.
Maybe I’m misunderstanding how the Googlegeist works.
At my workplace, we regularly have surveys where we get asked about how we feel about various things. But if we report negative feelings, we get asked for suggestions about what is wrong.
The way I had imagined the situation is, someone working with the Googlegeist had noticed that a lot of women reported anxiety or whatever, and had decided they need to work with women to figure out what’s going on here, to solve it. And then James Damore felt that this was one instance of people looking at a disparity and claiming injustice, and that since he finds it biologically inevitable that women would be anxious, this shouldn’t be treated as indicative of an external problem, but instead should be medicalized and treated psychologically (or psychiatrically?).
But I admit I haven’t looked much into it so maybe the above model is wrong; originally when the Damore memo came out, I supported him, and it’s only later I’ve been thinking that maybe I shouldn’t have supported him. But I haven’t had much chance to talk with people about it.
The way I had imagined the situation is, someone working with the Googlegeist had noticed that a lot of women reported anxiety or whatever, and had decided they need to work with women to figure out what’s going on here, to solve it. And then James Damore felt that this was one instance of people looking at a disparity and claiming injustice, and that since he finds it biologically inevitable that women would be anxious, this shouldn’t be treated as indicative of an external problem, but instead should be medicalized and treated psychologically (or psychiatrically?). [italics added]
As a side note, I consider the italicized part a rather weighty accusation. I think one should therefore be careful about making such an accusation. I guess, in this case, you were just honestly reporting the contents of your brain on the matter, not necessarily making an accusation.
Still, I think this to some extent illustrates an epistemic environment where it’s normal to throw around damaging accusations whose truth value is somewhere between “extremely uncharitable interpretation” and “objectively false”. Precisely the type that got Damore fired, in other words. Do we have such an environment even among rationalists? That is at the heart of Zack’s adventure.
(Incidentally, imagine if Damore had claimed the opposite—“Women are less prone to anxiety and can handle stress more easily.” Wouldn’t that also lead to accusations that Damore was saying we can ignore women’s problems?)
Anyway, on to the object level. I think Damore’s point, in bringing it up, was that the stress in (some portion of) tech jobs may be a reason there are fewer women than men in tech. Reasons to think this:
The title of the super-section containing the “neuroticism” quote is “Possible non-bias causes of the gender gap in tech”.
The super-section is preceded by “For the rest of this document, I’ll concentrate on the extreme stance that all differences in outcome are due to differential treatment [italics added] and the authoritarian element that’s required to actually discriminate to create equal representation.”
The last sentence in the section (“Personality differences”) is “We need to stop assuming that gender gaps imply sexism.”
As already quoted, he says that the anxiety thing implies that “Mak[ing] tech and leadership less stressful” would be a “non-discriminatory way to reduce the gender gap”.
If Damore had said “Here are some issues women reported; and we should discount these reports because women are extra-anxious”, then your model would be well-founded. I don’t see him saying anything like that in the document, though. In the whole document, Damore doesn’t mention anything reported by women on Googlegeist, other than the anxiety thing. (I would be surprised if he, being an engineer and not in HR or leadership, had access to the arbitrary text field submissions from the other employees; I would guess he saw aggregated results on numerical questions, plus any items leadership chose to share with everyone.) Googlegeist itself is mentioned only two other times in the document; both times it’s him suggesting something be done with future Googlegeist surveys.
He does mention another item as a (primarily) women’s issue, although the source is a 2006 paper rather than Googlegeist. Again, he does advocate doing something about it (with caveats):
Non-discriminatory ways to reduce the gender gap
[...]
Women on average look for more work-life balance while men have a higher drive for status on average
○ Unfortunately, as long as tech and leadership remain high status, lucrative careers, men may disproportionately want to be in them. Allowing and truly endorsing (as part of our culture) part time work though can keep more women in tech.
Now, at the end, he says this:
Philosophically, I don’t think we should do arbitrary social engineering of tech just to make it appealing to equal portions of both men and women. For each of these changes, we need principled reasons for why it helps Google; that is, we should be optimizing for Google—with Google’s diversity being a component of that. For example, currently those willing to work extra hours or take extra stress will inevitably get ahead and if we try to change that too much, it may have disastrous consequences. Also, when considering the costs and benefits, we should keep in mind that Google’s funding is finite so its allocation is more zero-sum than is generally acknowledged.
The most uncharitable reader could say “Aha, so he’s laid the groundwork to not follow through with anything that actually helps women, keeping the status quo, and everything he’s said before is just a trick.” If the reader comes in with that kind of implicit assumption about Damore’s character, then they’ll probably stick with it; all I can say is, evidence for such a belief does not come from the document. (Incidentally, I’ve met Damore at a party; I read him as a well-meaning nerd, who thought that if he made a sufficiently comprehensive, careful, well-cited, and constructively oriented writeup, he could cut through the hostility and they’d work out some solutions that would make everyone happier. The result is really tragic in that light.)
I think, to come up with your conclusion, you have to do a lot of reading into the text, and a lot of not reading the actual text. Which, I think, was par for the course for most negative takes on Damore. I am surprised and somewhat perturbed by your report that you originally supported Damore, and wonder what happened since then. Perhaps memory faded and “osmosis” brought in others’ takes?
In more detail, my background is that I used to put stock in research into psychological differences between the sexes and the races, with a major influence on my views being Scott Alexander (though there’s also a whole backstory to how I got into this).
I eventually started doing my own empirical research into transgender topics, and found Blanchardianism/autogynephilia theory to give the strongest effect sizes.
And as I was doing this, I was learning more about how to perform this sort of research: psychometrics, causal inference, psychology, etc. Over time, I got a feeling for what sorts of research questions are fruitful, what sorts of methods and critiques are valid, and what sorts of dynamics and distinctions should be paid attention to.
These sorts of people are not very interested in actually developing substantive theory or testing their claims in strong ways which might disprove them.
Instead they are mainly interested in providing a counternarrative to progressive theories.
They often use superficial or invalid psychometric methods.
They often make insinuations that they have some deep theory or deep studies, but really actually don’t.
So yes, I am bringing priors from outside of this. I’ve been at the heart of the supposed science into these things, and I have become horrified at what I once trusted.
On to your points:
I think Damore’s point, in bringing it up, was that the stress in (some portion of) tech jobs may be a reason there are fewer women than men in tech.
You may or may not be right that this is what he meant.
(I think it’s a completely wrong position, because the sex difference in neuroticism is much smaller (by something like 2x) than the sex difference in tech interests and tech abilities, and presumably the selection effect for neuroticism on career field is also much smaller than that of interests. So I’m not sure your reading on it is particularly more charitable, only uncharitable in a different direction; assuming a mistake rather than a conflict.)
… I don’t think this changes the point that it assumes the measured sex difference in Neuroticism is a causative agent in promoting sex differences in stress, rather than addressing the possibility that the increased Neuroticism may reflect additional problems women are facing?
(Incidentally, imagine if Damore had claimed the opposite—“Women are less prone to anxiety and can handle stress more easily.” Wouldn’t that also lead to accusations that Damore was saying we can ignore women’s problems?)
The correct thing to claim is “We should investigate what people are anxious/stressed about”. Jumping to conclusions that people’s states are simply a reflection of their innate traits is the problem.
As a side note, I consider the italicized part a rather weighty accusation. I think one should therefore be careful about making such an accusation. I guess, in this case, you were just honestly reporting the contents of your brain on the matter, not necessarily making an accusation.
Still, I think this to some extent illustrates an epistemic environment where it’s normal to throw around damaging accusations whose truth value is somewhere between “extremely uncharitable interpretation” and “objectively false”. Precisely the type that got Damore fired, in other words. Do we have such an environment even among rationalists? That is at the heart of Zack’s adventure.
I don’t think this is at the heart of Zack’s adventure? Zack’s issues were mainly about leading rationalists jumping in to rationalize things in the name of avoiding conflicts.
Anyway, making weighty claims about people is core to what differential psychology is about. It’s possible that some of my claims about Damore are false, in which case we should discuss that and fix the mistakes. However, the position that one should just keep quiet about claims about people simply because they are weighty would also seem to imply that we should keep quiet about claims about trans people and masculinity/femininity, or race and IQ, or, to make the Damore letter more relevant, men/women and various traits related to performance in tech.
Incidentally, I’ve met Damore at a party; I read him as a well-meaning nerd, who thought that if he made a sufficiently comprehensive, careful, well-cited, and constructively oriented writeup, he could cut through the hostility and they’d work out some solutions that would make everyone happier. The result is really tragic in that light.
Somewhat possible this is true. I think nerdy communities like LessWrong should do a better job at communicating the problems with various differential psychology findings and communicating how they are often made by conservatives to promote an agenda. If they did this, perhaps Damore would not have been in this situation.
But also, if I take my personal story as a template, then it’s probably more complicated than that. Yes, I had lots of time where I was a well-meaning nerd and I got tricked by conservative differential psychology. But a big part of the reason I got into this differential psychology in the first place was a distrust of feminism. If I had done my due diligence, or if nerdy groups had been better at communicating the problems with these areas, it might not have become as much of a problem.
These sorts of people are not very interested in actually developing substantive theory or testing their claims in strong ways which might disprove them.
Instead they are mainly interested in providing a counternarrative to progressive theories.
They often use superficial or invalid psychometric methods.
They often make insinuations that they have some deep theory or deep studies, but really actually don’t.
These things are bad, but, apart from point 2, I would ask: how do they compare to the average quality of social science research? Do you have high standards, or do you just have high standards for one group? I think most of us spend at least some time in environments where the incentive gradients point towards the latter. Beware isolated demands for rigor.
Research quality being what it is, I would recommend against giving absolute trust to anyone, even if they appear to have earned it. If there’s a result you really care about, it’s good to pick at least one study and dig into exactly what they did, and to see if there are other replications; and the prior probability of “fraud” probably shouldn’t go below 1%.
As for point 2—if you were a researcher with heretical opinions, determined to publish research on at least some of them, what would you do? It seems like a reasonable strategy is to pick something heretical that you’re confident you can defend, and do a rock-solid study on it, and brace for impact. Is it still the case that disproving the blank-slate hypothesis would constitute progress in some academic subfields? If so, then expect people to continue trying it.
The study says there was “a meta-analysis concluding that small monetary incentives could improve test scores by 0.64 SDs” (roughly 10 IQ points); this looks to be Duckworth et al. 2011. The guy says it seemed sketchy—the studies had small N, weird conditions, and/or fraudulent researchers. Looking at table S1 from Duckworth, indeed, N is <100 on most of the studies; “Bruening and Zella (1978)” sticks out as having a large effect size and a large N, and, when I google for more info about that, I find that Bruening was convicted by an NIMH panel of scientific fraud. Checks out so far.
The guy ran a series of studies, the last of which offered incentives of nil, £2, and £5–£10 for test performance, with the smallest subgroup being N=150, taken from the adult population via Prolific Academic. He found that £2 and £5–£10 had similar effects, those being apparently 0.2 SD and 0.15 SD respectively, which would be 3 IQ points or a little less. (Were the “small monetary incentives” from Duckworth of that size? The Duckworth table shows most of the studies as being in the $1–$9 or <$1 range; looks like yes.) So, at least as a “We suspected these results were bogus, tried to reproduce them, and got a much smaller effect size”, this seems all in order.
Now, you say:
IQ test effort correlates with IQ scores, and they investigate whether it is causal using incentives. However, as far as I can tell, their data analysis is flawed, and when performed correctly the conclusion reverses.
[...] Incentives increase effort, but they only have marginal effects on performance. Does this show that effort doesn’t matter? No, because incentives also turn out to only have marginal effects on effort! Surely if you only improve effort a bit, you wouldn’t expect to have much influence on scores. We can solve this by a technique called instrumental variables. Basically, we divide the effect of incentives on scores by the effect of incentives on effort.
Your analysis essentially proposes that, if there were some method of increasing effort by 3-4x as much as he managed to increase it, then maybe you could in fact increase IQ scores by 10 points. This assumes that the effort-to-performance causation would stay constant as you step outside the tested range. That’s possible, but… I’m quite confident there’s a limit to how much “effort” can increase your results on a timed multiple-choice test, that you’ll hit diminishing marginal returns at some point (probably even negative marginal returns, if the incentive is strong enough to make many test-takers nervous), and extrapolating 3-4x outside the achieved effect seems dubious. (I also note that the 1x effect here means increasing your self-evaluated effort from 4.13 to 4.28 on a scale that goes up to 5, so a 4x effect would mean going to 4.73, approaching the limits of the scale itself.)
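For concreteness, the instrumental-variables calculation under discussion is just a Wald ratio; here is a minimal sketch with hypothetical numbers of the rough size mentioned above (not the paper's actual figures).

```python
def wald_iv(effect_on_outcome, effect_on_treatment):
    # Wald/IV estimator: implied effect of the treatment (effort) on the
    # outcome (test score), assuming the instrument (the incentive) moves
    # the outcome only through the treatment, and that the effort->score
    # relation stays linear outside the range the instrument achieved.
    return effect_on_outcome / effect_on_treatment

# Hypothetical: incentives raise scores by 0.15 SD and effort by 0.3 SD,
# implying 1 SD of effort is "worth" 0.5 SD of score -- but only if the
# linearity extrapolation holds well beyond the observed effort shift.
print(wald_iv(0.15, 0.3))
```

The ratio itself is trivial; the load-bearing assumptions are the exclusion restriction and the linear extrapolation, which is exactly the part disputed here.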
You say, doing your analysis:
For study 2, I get an effect of 0.54. For study 3, I get an effect of 0.37. For study 4, I get an effect of 0.39. The numbers are noisy for various reasons, but this all seems to be of a similar order of magnitude to the correlation in the general population, so this suggests the correlation between IQ and test effort is due to a causal effect of test effort increasing IQ scores.
That is interesting… Though the correlation between test effort and test performance in the studies is given as 0.27 and 0.29 in different samples, so, noise notwithstanding, your effects are consistently larger by a decent margin. That would suggest that something other than the simple causation is going on.
The authors say:
6.1. Correlation and direction of causality
Across all three samples and cognitive ability tests (sentence verification, vocabulary, visual-spatial reasoning), the magnitude of the association between effort and test performance was approximately 0.30, suggesting that higher levels of motivation are associated [with] better levels of test performance. Our results are in close accord with existing literature [...]
As is well-known, the observation of a correlation is a necessary but not sufficient condition for causality. The failure to observe concomitant increases in test effort and test performance, when test effort is manipulated, suggests the absence of a causal effect between test motivation and test performance.
That last sentence is odd, since there was in fact an increase in both test effort and test performance. Perhaps they’re equivocating between “low effect” and “no effect”? (Which is partly defensible in that the effect was not statistically significant in most of the studies they ran. I’d still count it as a mark against them.) The authors continue:
Consequently, the positive linear association between effort and performance may be considered either spurious or the direction of causation reversed – flowing from ability to motivation. Several investigations have shown that the correlation between test-taking anxiety and test performance likely flows from ability to test-anxiety, not the other way around (Sommer & Arendasy, 2015; Sommer, Arendasy, Punter, Feldhammer-Kahr, & Rieder, 2019). Thus, if the direction of causation flows from ability to test motivation, it would help explain why effort is so difficult to shift via incentive manipulation.
6.2. Limitations & future research
We acknowledge that the evidence for the causal direction between effort and ability remains equivocal, as our evidence rests upon the absence of evidence (absence of experimental incentive effect). Ideally, positive evidence would be provided. Indirect positive evidence may be obtained by conducting an experiment, whereby half the subjects are given a relatively easy version of the paper folding task (10 easiest items) and the other half are given a relatively more difficult version (10 most difficult items). It is hypothesized that those given the relatively easier version of the paper folding task would then, on average, self-report greater levels of test-taking effort. Partial support for such a hypothesis is apparent in Table 1 of this investigation. Specifically, it can be seen that there is a perfect correspondence between the difficulty of the test (synonyms mean 73.4% correct; sentence verification mean 53.8% correct; paper folding mean 43.3%) and the mean level of reported effort (synonyms mean effort 4.42; sentence verification mean 4.11; paper folding mean 3.83).
That is a pretty interesting piece of evidence for the “ability leads to self-reported effort” theory.
Overall… The study seems to be a good one: doing a large replication study on prior claims. The presentation of it… The author on Twitter said “testing over N= 4,000 people”, which is maybe what you get if you add up the N from all the different studies, but each study is considerably smaller; I found that somewhat misleading, but suspect that’s a common thing when authors report multiple studies at once. On Twitter he says “We conclude that effort has unequivocally small effects”, which omits caveats like “our results are accurate to the degree that alternative incentives do not yield appreciably larger effects” which are in the paper; this also seems like par for the course for science journalism (not to mention Twitter discourse). And they seem to have equivocated in places between “low effect” and “no effect”. (Which I suspect is also not rare, unfortunately.)
Now. You presented this as:
Here’s a classical example; an IQ researcher who is so focused on providing a counternarrative to motivational theories that he uses methods which are heavily downwards biased to “prove” that IQ test scores don’t depend on effort.
The “focused on providing a counternarrative” part is plausibly correct. However, the “uses methods which are heavily downwards biased to “prove” [...]” is not. The “downwards biased methods” are “offering a monetary incentive of £2-£10, which turned out to be insufficient to change effort much”. The authors were doing a replication of Duckworth, in which most of the cited studies had a monetary incentive of <$10—so that part is correctly matched—and they used high enough N that Duckworth’s claimed effect size should have shown up easily. They also preregistered the first of their incentive-based studies (with the £2 incentive), and the later ones were the same but with increased sample size, then increased incentive. In other words, they did exactly what they should have done in a replication. To claim that they chose downwards-biased methods for the purpose of proving their point seems quite unfair; those methods were chosen by Duckworth.
This seems to be a data point of the form “your priors led you to assume bad faith (without having looked deeply enough to discover this was unjustified), which then led you to take this as a case to justify those priors for future cases”. (We will see more of these later.) Clearly this could be a self-reinforcing loop that, over time, could lead one’s priors very far astray. I would hope anyone who posts here would recognize the danger of such a trap.
Second example. “Simon Baron-Cohen playing Motte-Bailey with the “extreme male brain” theory of autism.” Let’s see… It seems uncontroversial (among the participants in this discussion) that there are dimensions on which male and female brains differ (on average), and on which autists are (on average) skewed towards the male side, and that this includes the empathizing and systematizing dimensions.
You quote Baron-Cohen as saying “According to the ‘extreme male brain’ theory of autism, people with autism or AS should always fall in the [extreme systematizing range]”, and say that this is obviously false, since there exist autists who are not extreme systematizers—citing a later study coauthored by Baron-Cohen himself, which puts only ~10% of autists into the “Extreme Type S” category. You say he’s engaging in a motte-and-bailey.
After some reading, this looks to me like a case of “All models are wrong, but some are useful.” The same study says “Finally, we demonstrate that D-scores (difference between EQ and SQ) account for 19 times more of the variance in autistic traits (43%) than do other demographic variables including sex. Our results provide robust evidence in support of both the E-S and EMB theories.” So, clearly he’s aware that 57% of the variance is not explained by empathizing-systematizing. I think it would be reasonable to cast him as saying “We know this theory is not exactly correct, but it makes some correct predictions.” Indeed, he counts the predictions made by these theories:
An extension of the E-S theory is the Extreme Male Brain (EMB) theory (11). This proposes that, with regard to empathy and systemizing, autistic individuals are on average shifted toward a more “masculine” brain type (difficulties in empathy and at least average aptitude in systemizing) (11). This may explain why between two to three times more males than females are diagnosed as autistic (12, 13). The EMB makes four further predictions: (vii) that more autistic than typical people will have an Extreme Type S brain; (viii) that autistic traits are better predicted by D-score than by sex; (ix) that males on average will have a higher number of autistic traits than will females; and (x) that those working in science, technology, engineering, and math (STEM) will have a higher number of autistic traits than those working in non-STEM occupations.
Note also that he states the definition of EMB theory as saying “autistic individuals are on average shifted toward a more “masculine” brain type”. You say “Sometimes EMB proponents say that this isn’t really what the EMB theory says. Instead, they make up some weaker predictions, that the theory merely asserts differences “on average”.” This is Baron-Cohen himself defining it that way.
Would it be better if he used a word other than “theory”? “Model”? You somewhat facetiously propose “If the EMB theory had instead been named the “sometimes autistic people are kinda nerdy” theory, then it would be a lot more justified by the evidence”. How about, say, the theory that “There are processes that masculinize the brain in males; and some of those processes going into overdrive is a thing that causes autism”? (Which was part of the original paper: “What causes this shift remains unclear, but candidate factors include both genetic differences and prenatal testosterone.”) That is, in fact, approximately what I found when I googled for people talking about the EMB theory—and note that the article is critical of the theory:
This hypothesis, called the ‘extreme male brain’ theory, postulates that males are at higher risk for autism as a result of in-utero exposure to steroid hormones called androgens. This exposure, the theory goes, accentuates the male-like tendency to recognize patterns in the world (systemizing behavior) and diminishes the female-like capacity to perceive social cues (socializing behavior). Put simply, boys are already part way along the spectrum, and if they are exposed to excessive androgens in the womb, these hormones can push them into the diagnostic range.
That is the sense in which an autistic brain is, hypothetically, an “extreme male brain”. I guess “extremely masculinized brain” would be a bit more descriptive to someone who doesn’t know the context.
The problem with a motte-and-bailey is that someone gets to go around advancing an extreme position, and then, when challenged by someone who would disprove it, he avoids the consequences by claiming he never said that, he only meant the mundane position. According to you, the bailey is “they want to talk big about how empathizing-systematizing is the explanation for autism”. According to the paper, it was 43% of the explanation for autism, and the biggest individual factor? Seems pretty good.
Has Baron-Cohen gone around convincing people that empathizing-systematizing is the only factor involved in autism? I suspect that he doesn’t believe it, he didn’t mean to claim it, almost no one (except you) understood him as claiming it, and pretty much no one believes it. Maybe he picked a suboptimal name, which lent itself to misinterpretation. Do you have examples of Baron-Cohen making claims of that kind, which aren’t explainable as him taking the “This theory is not exactly correct, but it makes useful predictions” approach?
The context here is explaining why you’ve “become horrified at what [you] once trusted”, which you now call “supposed science”. I’m… underwhelmed by what I’ve seen.
Back to Damore...
I think Damore’s point, in bringing it up, was that the stress in (some portion of) tech jobs may be a reason there are fewer women than men in tech.
You may or may not be right that this is what he meant.
...I thought it was overkill to cite four quotes on that issue, but apparently not. Such priors!
(I think it’s a completely wrong position, because the sex difference in neuroticism is much smaller (by something like 2x) than the sex difference in tech interests and tech abilities, and presumably the selection effect for neuroticism on career field is also much smaller than that of interests. So I’m not sure your reading on it is particularly more charitable, only uncharitable in a different direction; assuming a mistake rather than a conflict.)
It seems you’re saying Damore mentions A but not B, and B is bigger, therefore Damore’s “comprehensive” writeup is not so, and this omission is possibly ill-motivated. But, erm, Damore does mention B, twice:
[Women, on average have more] Openness directed towards feelings and aesthetics rather than ideas. Women generally also have a stronger interest in people rather than things, relative to men (also interpreted as empathizing vs. systemizing).
○ These two differences in part explain why women relatively prefer jobs in social or artistic areas. More men may like coding because it requires systemizing and even within SWEs, comparatively more women work on front end, which deals with both people and aesthetics.
[...]
Women on average show a higher interest in people and men in things
○ We can make software engineering more people-oriented with pair programming and more collaboration. Unfortunately, there may be limits to how people-oriented certain roles at Google can be and we shouldn’t deceive ourselves or students into thinking otherwise (some of our programs to get female students into coding might be doing this).
This suggests that casting aspersions on Damore’s motives is not gated by “Maybe I should double-check what he said to see if this is unfair”.
I think the anxiety/stress thing is more relevant for top executive roles than for engineer roles; a population-level difference is more important at the extremes. Damore does talk about leadership specifically:
We always ask why we don’t see women in top leadership positions, but we never ask why we see so many men in these jobs. These positions often require long, stressful hours that may not be worth it if you want a balanced and fulfilling life.
Next:
(Incidentally, imagine if Damore had claimed the opposite—”Women are less prone to anxiety and can handle stress more easily.” Wouldn’t that also lead to accusations that Damore was saying we can ignore women’s problems?)
The correct thing to claim is “We should investigate what people are anxious/stressed about”. Jumping to conclusions that people’s states are simply a reflection of their innate traits is the problem.
Well, he lists one source of stress above, and he does recommend to “Make tech and leadership less stressful”.
I don’t think this is at the heart of Zack’s adventure? Zack’s issues were mainly about leading rationalists jumping in to rationalize things in the name of avoiding conflicts.
And why would these rationalists care so much about avoiding these conflicts, to the point of compromising the intellectual integrity that seems so dear to them? Fear that they’d face the kind of hostility and career-ruining accusations directed at Damore, and things downstream of fears like that, seems like a top candidate explanation.
Anyway, making weighty claims about people is core to what differential psychology is about.
Um. Accusations are things you make about individuals, occasionally organizations. I hope that the majority of differential psychology papers don’t consist of “Bob Jones has done XYZ bad thing”.
It’s possible that some of my claims about Damore are false, in which case we should discuss that and fix the mistakes. However, the position that one should just keep quiet about claims about people simply because they are weighty would also seem to imply that we should keep quiet about claims about trans people and masculinity/femininity, or race and IQ, or, to make the Damore letter more relevant, men/women and various traits related to performance in tech.
You are equivocating between reckless claims of misconduct / malice by an individual, and heavily cited claims about population-level averages that are meant to inform company policy. Are you seriously stating an ethical principle that anyone who makes the latter should expect to face the former and it’s justified?
Somewhat possible this is true. I think nerdy communities like LessWrong should do a better job at communicating the problems with various differential psychology findings and communicating how they are often made by conservatives to promote an agenda. If they did this, perhaps Damore would not have been in this situation.
I think Damore was aware that there are people who use population-level differences to justify discriminating against individuals, and that’s why he took pains to disavow that. As for “the problems with various differential psychology findings”—do you think that some substantial fraction, say at least 20%, of the findings he cited were false?
Second example. “Simon Baron-Cohen playing Motte-Bailey with the “extreme male brain” theory of autism.” Let’s see… It seems uncontroversial (among the participants in this discussion) that there are dimensions on which male and female brains differ (on average), and on which autists are (on average) skewed towards the male side, and that this includes the empathizing and systematizing dimensions.
Quick update!
I found that OpenPsychometrics has a dataset for the EQ/SQ tests. Unfortunately, there seems to be a problem with the data for the EQ items, but I ran a factor analysis on the SQ items to take a closer look at your claims here.
There appeared to be 3 or 4 factors underlying the correlations on the SQ test, which I’d roughly call “Technical interests”, “Nature interests”, “Social difficulties” and “Jockyness”. I grabbed the top loading items for each of the factors, and got this correlation matrix:
The correlation between the technical interests and the nature interests plausibly reflects the notion that Systematizing is a thing, though I suspect it could also be found to correlate with all sorts of other things that would not be considered Systematizing? Like non-Systematizing ways of interacting with nature. Idk though.
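As a toy sketch of the kind of structure at issue (all numbers hypothetical, not the OpenPsychometrics data): when items load on two correlated latent factors, items within a factor correlate more strongly with each other than with items from the other factor, which is roughly the pattern a factor analysis picks up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Two correlated latent factors, standing in for "technical interests"
# and "nature interests" (hypothetical numbers, not the real SQ data).
factor_corr = 0.3
factors = rng.multivariate_normal(
    [0, 0], [[1, factor_corr], [factor_corr, 1]], size=n
)
tech, nature = factors[:, 0], factors[:, 1]

# Three items loading on each factor, plus item-specific noise.
items = np.column_stack(
    [0.7 * tech + 0.7 * rng.normal(size=n) for _ in range(3)]
    + [0.7 * nature + 0.7 * rng.normal(size=n) for _ in range(3)]
)

r = np.corrcoef(items, rowvar=False)
within = r[0, 1]   # two items from the same factor
between = r[0, 3]  # items from different factors
print(round(within, 2), round(between, 2))
```

With these made-up loadings, the within-factor item correlations come out around three times the between-factor ones; a general-factor-only model would instead make all the item correlations comparable.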
The sex differences in the items were limited to the technical interests, rather than also covering the nature interests. This does not fit a simple model of a sex difference in general Systematizing, but it does fit a model where the items are biased towards men but there is not much sex difference in general Systematizing.
I would be inclined to think that the Social difficulties items correlate negatively with Empathizing Quotient or positively with Autism Spectrum Quotient. If we are interested in the correlations between general Systematizing and these other factors, then this could bias the comparisons. On the other hand, the Social difficulties items were not very strongly correlated with the overall SQ score, so maybe not.
I can’t immediately think of any comments for the Jockyness items.
Overall, I strongly respect the fact that he made many of the items very concrete, but I now also feel like I have shown that the gender differences on Systematizing are driven by psychometric shenanigans, and I strongly expect to find that many of the other associations are also driven by psychometric shenanigans.
I’ve sent an email asking OpenPsychometrics to export the Empathizing Quotient items too. If he does so, I hope to write a top-level post explaining my issues with the psychometrics here.
Hm, actually I semi-retract this; the OpenPsychometrics data seems to be based on the original Systematizing Quotient, whereas there seems to be a newer one called Systematizing Quotient-Revised, which is supposedly more gender-neutral. Not sure where I can get data on this, though. Will go looking.
Edit: Like I am still pretty suspicious about the SQ-R. I just don’t have explicit proof that it is flawed.
Oops, upon reading more about the SQ, I should correct myself:
Some of the items, such as S16, are “filler items” which are not counted as part of the score; these are disproportionately part of the “Social difficulties” and “Jockyness” factors, so that probably reduces the amount of bias that can be introduced by those items, and it also explains why they don’t correlate very much with the overall SQ scores.
But some of the items for these factors, such as S31, are not filler items, and instead get counted for the test, presumably because they have cross-loadings on the Systematizing factor. So the induced bias is probably not zero.
If I get the data from OpenPsychometrics, I will investigate in more detail.
Since I don’t have data on the EQ, here’s a study where someone else worked with it. They found that the EQ had three factors, which they named “Cognitive Empathy”, “Emotional Empathy” and “Social Skills”. The male-female difference was driven by “Emotional Empathy” (d=1), whereas the autistic-allistic difference was driven by “Social Skills” (d=1.3). The converse differences were much smaller, 0.24 and 0.66. As such, it seems likely that the EQ lumps together two different kinds of “empathizing”, one of which is feminine and one of which is allistic.
As for point 2—if you were a researcher with heretical opinions, determined to publish research on at least some of them, what would you do? It seems like a reasonable strategy is to pick something heretical that you’re confident you can defend, and do a rock-solid study on it, and brace for impact. Is it still the case that disproving the blank-slate hypothesis would constitute progress in some academic subfields? If so, then expect people to continue trying it.
I should also say, in the context of IQ and effort, some of the true dispute is about whether effort differences can explain race differences in scores. And for that purpose, what I would do is to go more directly into that.
In fact, I have done so. Quoting some discussion I had on Discord:
Me: Oh look at this thing I just saw
(correlation matrix with 0 correlation between race and test effort highlighted)
Other person: That is a really good find. Where’s it from?
Me: from the supplementary info to one of the infamous test motivation studies:
Me: Despite implying that test motivation explains racial gaps in the study text:
On the other hand, test motivation may be a serious confound in studies including participants who are below-average in IQ and who lack external incentives to perform at their maximal potential. Consider, for example, the National Longitudinal Survey of Youth (NLSY), a nationally representative sample of more than 12,000 adolescents who completed an intelligence test called the Armed Forces Qualifying Test (AFQT). As is typical in social science research, NLSY participants were not rewarded in any way for higher scores. The NLSY data were analyzed in The Bell Curve, in which Herrnstein and Murray (44) summarily dismissed test motivation as a potential confound in their analysis of black–white IQ disparities.
(This was way after I became critical of differential psychology btw. Around 2 months ago.)
These things are bad, but, apart from point 2, I would ask: how do they compare to the average quality of social science research? Do you have high standards, or do you just have high standards for one group? I think most of us spend at least some time in environments where the incentive gradients point towards the latter. Beware isolated demands for rigor.
I don’t know for sure as I am only familiar with certain subsets of social science, but a lot of it is in fact bad. I also often criticize normal social science, but in this context it was this specific area of social science that came up.
As for point 2—if you were a researcher with heretical opinions, determined to publish research on at least some of them, what would you do? It seems like a reasonable strategy is to pick something heretical that you’re confident you can defend, and do a rock-solid study on it, and brace for impact. Is it still the case that disproving the blank-slate hypothesis would constitute progress in some academic subfields? If so, then expect people to continue trying it.
I would try to perform studies that yield much more detailed information. For instance, mixed qualitative and quantitative studies where one qualitatively inspects the data points that are above-average or below-average for the regressions, to see whether there are identifiable missing factors.
So, at least as a “We suspected these results were bogus, tried to reproduce them, and got a much smaller effect size”, this seems all in order.
If he had phrased his results purely as disproving the importance of incentives, rather than effort, I think it would have been fine.
Your analysis essentially proposes that, if there were some method of increasing effort by 3-4x as much as he managed to increase it, then maybe you could in fact increase IQ scores by 10 points. This assumes that the effort-to-performance causation would stay constant as you step outside the tested range. That’s possible, but… I’m quite confident there’s a limit to how much “effort” can increase your results on a timed multiple-choice test, that you’ll hit diminishing marginal returns at some point (probably even negative marginal returns, if the incentive is strong enough to make many test-takers nervous), and extrapolating 3-4x outside the achieved effect seems dubious. (I also note that the 1x effect here means increasing your self-evaluated effort from 4.13 to 4.28 on a scale that goes up to 5, so a 4x effect would mean going to 4.73, approaching the limits of the scale itself.)
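The scale arithmetic here can be checked directly (numbers taken from the paragraph above):

```python
# Mean self-rated effort on a 1-5 scale, from the discussion above.
baseline, with_incentive, scale_max = 4.13, 4.28, 5.0

one_x = with_incentive - baseline   # the observed ("1x") incentive effect
four_x = baseline + 4 * one_x       # a hypothetical 4x effect
headroom = scale_max - four_x       # how little of the scale is left

print(round(one_x, 2), round(four_x, 2), round(headroom, 2))
```

A 4x effect would use up most of the remaining room on the scale, which is the sense in which the extrapolation runs into the scale's ceiling.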
I prefer to think of it as “if you increase your effort from being one of the lowest-effort people to being one of the highest-effort people, you can increase your IQ score by 17 IQ points”. This doesn’t seem too implausible to me, though admittedly I’m not 100% sure what the lowest-effort people are doing.
It’s valid to say that extrapolating outside of the tested range is dubious, but IMO this means that the study design is bad.
I think it’s likely that the limited returns to effort would be reflected in the limited bounds of the scale. So I don’t think my position is in tension with the intuition that there’s limits on what effort can do for you. Under this model, it is also worth noting that the effort scores were negatively skewed, so this implies that lack of effort is a bigger cause of low scores than extraordinary effort is of high scores.
That is interesting… Though the correlation between test effort and test performance in the studies is given as 0.27 and 0.29 in different samples, so, noise notwithstanding, your effects are consistently larger by a decent margin. That would suggest that there’s something else going on than the simple causation.
I don’t think my results are statistically significantly different from 0.3ish; in the ensuing discussion, people pointed out that the IV results had huge error bounds (because the original study was only barely significant).
But also if there is measurement error in the instrument (effort), then that would induce an upwards bias in the IV estimated effect. So that might also contribute.
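A toy simulation (entirely made-up numbers, not the study's data) of one way this can happen: if the IV effect is expressed per standard deviation of effort, then classical noise in the effort measure leaves the Wald ratio intact (the noise is independent of the instrument) but inflates the SD you scale by, pushing the per-SD estimate upward.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: a binary incentive instrument shifts latent effort,
# which has a modest true effect on the test score.
incentive = rng.binomial(1, 0.5, n)
effort = 0.3 * incentive + rng.normal(size=n)
score = 0.2 * effort + rng.normal(size=n)
reported = effort + rng.normal(size=n)  # self-report = effort + noise

def wald_per_sd(z, x, y):
    # IV (Wald) estimate of the effect of x on y, per one SD of x.
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1] * x.std()

true_per_sd = wald_per_sd(incentive, effort, score)
noisy_per_sd = wald_per_sd(incentive, reported, score)
print(round(true_per_sd, 2), round(noisy_per_sd, 2))
```

Because Cov(incentive, reported) ≈ Cov(incentive, effort) while sd(reported) > sd(effort), the noisy per-SD estimate comes out larger than the true one, consistent with the upward-bias worry.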
However, the “uses methods which are heavily downwards biased to “prove” [...]” is not. The “downwards biased methods” are “offering a monetary incentive of £2-£10, which turned out to be insufficient to change effort much”. The authors were doing a replication of Duckworth, in which most of the cited studies had a monetary incentive of <$10—so that part is correctly matched—and they used high enough N that Duckworth’s claimed effect size should have shown up easily. They also preregistered the first of their incentive-based studies (with the £2 incentive), and the later ones were the same but with increased sample size, then increased incentive. In other words, they did exactly what they should have done in a replication. To claim that they chose downwards-biased methods for the purpose of proving their point seems quite unfair; those methods were chosen by Duckworth.
Shitty replications of shitty environmentalist research is still shitty.
Like this sort of thing makes sense to do as a personal dispute between the researchers, but for all of us who’d hope to actually use or build on the research for substantial purposes, it’s no good if the researchers use shitty methods because they are trying to build a counternarrative against other researchers using shitty methods.
Let’s see… It seems uncontroversial (among the participants in this discussion) that there are dimensions on which male and female brains differ (on average), and on which autists are (on average) skewed towards the male side, and that this includes the empathizing and systematizing dimensions.
I wouldn’t confidently disagree with this, but I do have some philosophical nitpicks/uncertainties.
(“Brain” connotes neurology to me, yet I am not sure if empathizing and especially systematizing are meaningful variables on a neurological level. I would also need to double-check whether the EQ/SQ are MI (measurement invariant) for sex and autism, because I don’t remember whether they are. I suspect in particular that the EQ is not, and it is the biggest driver of the EQ/SQ-autism connection, so it is pretty important to consider. But for the purposes of the Motte-Bailey situation, we can ignore that. Just tagging it as a potential area of disagreement.)
Would it be better if he used a word other than “theory”? “Model”? You somewhat facetiously propose “If the EMB theory had instead been named the “sometimes autistic people are kinda nerdy” theory, then it would be a lot more justified by the evidence”. How about, say, the theory that “There are processes that masculinize the brain in males; and some of those processes going into overdrive is a thing that causes autism”? (Which was part of the original paper: “What causes this shift remains unclear, but candidate factors include both genetic differences and prenatal testosterone.”)
I think what would be better would be if he clarified his models and reasoning. (Not positions, as that opens up the whole Motte-Bailey thing and also is kind of hard to engage with.) What is up with the original claim about autists always being extreme type S? Was this just a mistake that he would like to retract? If he only considers it to be a contributor that leads to half the variance, does he have any opinion on the nature of the other contributors to autism? Does he have any position on the relationship between autistic traits as measured by the AQ, and autism diagnosis? What should we make of the genetic contributors to autism being basically unrelated to the EQ/SQ? (And if the EQ/SQ are not MI for sex/autism, what does he make of that?)
Do you have examples of Baron-Cohen making claims of that kind, which aren’t explainable as him taking the “This theory is not exactly correct, but it makes useful predictions” approach?
This is part of the trouble, these areas do not have proper discussions.
It seems you’re saying Damore mentions A but not B, and B is bigger, therefore Damore’s “comprehensive” writeup is not so, and this omission is possibly ill-motivated.
...
This suggests that casting aspersions on Damore’s motives is not gated by “Maybe I should double-check what he said to see if this is unfair”.
No, I meant that under your interpretation, Damore mentions A when A is of negligible effect, and so that indicates a mistake. I didn’t mean to imply that he didn’t mention B, and I read this part of his memo multiple times prior to sending my original comment, so I was fully aware that he mentioned B.
Well, he lists one source of stress above, and he does recommend to “Make tech and leadership less stressful”.
But again the “Make tech and leadership less stressful” point boiled down to medicalizing it.
And why would these rationalists care so much about avoiding these conflicts, to the point of compromising the intellectual integrity that seems so dear to them? Fear that they’d face the kind of hostility and career-ruining accusations directed at Damore, and things downstream of fears like that, seems like a top candidate explanation.
Valid point.
Um. Accusations are things you make about individuals, occasionally organizations. I hope that the majority of differential psychology papers don’t consist of “Bob Jones has done XYZ bad thing”.
Differential psychology papers tend to propose ways to measure traits that they consider important, to extend previously created measures with new claims of importance, and to rank demographics by importance.
You are equivocating between reckless claims of misconduct / malice by an individual, and heavily cited claims about population-level averages that are meant to inform company policy. Are you seriously stating an ethical principle that anyone who makes the latter should expect to face the former and it’s justified?
I think in an ideal world, the research and the discourse would be more rational. For people who are willing to discuss and think about these matters rationally, it seems inappropriate to accuse them of misconduct/malice simply for agreeing with such claims. However, if people have spent a long time trying to bring up rational discussion and failed, then it is reasonable for these people to assume misconduct/malice.
I think Damore was aware that there are people who use population-level differences to justify discriminating against individuals, and that’s why he took pains to disavow that.
Using population-level differences to justify discriminating against individuals can be fine and is not what I have been objecting to.
As for “the problems with various differential psychology findings”—do you think that some substantial fraction, say at least 20%, of the findings he cited were false?
I don’t know. My problem with this sort of research typically isn’t that it is wrong (though it sometimes may be) but instead that it is of limited informative value.
I should probably do a top-level review post where I dig through all his cites to look at which parts of his memo are unjustified and which parts are wrong. I’ll tag you if I do that.
I think “psychological sex differences” ideology is energetically unable to acknowledge these sorts of things because its main purpose/motivation is to function as a counterstory against feminist ideology
I mean, I agree that this is obviously a thing, but I continue to maintain hope in the possibility of actually reasoning about sex differences in the physical universe, rather than being resigned to living in the world of warring narratives. I think the named effect sizes help? (I try to be clear about saying “d ≈ 0.6”, not “Men are from mars.”)
it’s rare for irrationality/dishonesty to be one-sided? In the example disputes I can think of off the top of my head, it’s usually both or neither.
I absolutely agree that it’s critical to recognize that it’s often both. (I’m less sure about how often it’s “neither”. What stops people from converging?)
I mean, I agree that this is obviously a thing, but I continue to maintain hope in the possibility of actually reasoning about sex differences in the physical universe, rather than being resigned to living in the world of warring narratives. I think the named effect sizes help? (I try to be clear about saying “d ≈ 0.6”, not “Men are from mars.”)
But what I mean is that the research programs I have seen so far, such as people-things and similar, are not good enough, and are not attempting to become good enough in a way where you can expect to just fund them and wait for results.
Not because it is fundamentally impossible, but because the challenges are not taken seriously enough.
I absolutely agree that it’s critical to recognize that it’s often both. (I’m less sure about how often it’s “neither”. What stops people from converging?)
I think if, due to priors, the sides have reason to distrust each other, but the distrust leads them to ignore each other instead of leading to conflict, then both sides can remain honest and rational (rather than degenerating into dishonesty due to conflict) while never realizing that the other side is honest and rational, and so they end up not converging?
It’s an example of a trans activist who, when asked whether people who want to coordinate sex-based discrimination should be purged from the discourse and authoritative sources, was like “yeah they just sound like mean busybodies to me”. Admittedly I didn’t really go into detail (partly because I don’t really have any strong examples that I support and want to argue about), so we don’t really know whether Pervocracy supports it in all relevant cases.
Perhaps the archetypal psychological sex difference that people have argued about is “women are emotional, men are rational”.
After reading your previous post where you quoted Deirdre McCloskey’s memoir, I started reading the memoir a bit too, and it actually provides a neat example of this psychological sex difference.
There was a period where McCloskey’s wife got all emotional and started complaining about McCloskey spending too much money on the phone bill. McCloskey very rationally pointed out that it was very cheap compared to e.g. therapy or hobbies. Clean example of female irrationality, right?
Of course, if one looks at the extended context, the picture is very different; Deirdre had initially assured the wife that it was just crossdressing and nothing else, and they had agreed with each other to put the crossdressing into the background, but now Deirdre was seriously considering transitioning, yet insisting that things like beard shaving was just crossdressing and nothing more. Essentially, from the beginning McCloskey took rational conversation off the table due to expecting that the wife would leave if they talked with each other about McCloskey’s desire to be a woman.
In retrospect, it seems McCloskey’s wife was right to worry. But also, it illustrates how psychological sex differences discourse tends to abstract over the environmental constraints people exist within. It seems like in these sorts of cases, “psychological sex differences” can function as a tool of oppression or abuse, rather than a genuine attempt at describing the world.
I think “psychological sex differences” ideology is energetically unable to acknowledge these sorts of things because its main purpose/motivation is to function as a counterstory against feminist ideology, and so the point is not to accurately describe the world but instead to deny cultural factors. (Let’s not forget James Damore’s memo, which cited research on greater female neuroticism as a justification for ignoring women’s issues with their workplace.)
I think her tweet was intended to make a claim about trans women specifically, and that it was this sub-claim that she was banned for. For example, if she had asked for a list of the richest women, and someone had linked Elon Musk, I don’t think she would have gotten banned for responding “Men aren’t women tho.”.
So I guess one could say Murphy was using common language to express the fact-claim that even if members of the natural category of adult human males transition or become intent on transitioning, they are not, in fact, members of the natural category of adult human females.
I think trans women are female-typical with respect to Agreeableness and Neuroticism, though I don’t know why. I think HRT changes large-scale brain proportions to be either intermediate between male and female or all the way to the female end, though I don’t think this has anything to do with Agreeableness/Neuroticism.
Of course since you didn’t transition and went off HRT, this might not apply to you.
Yes but the coalition that supports the person-in-the-street opposes me in my dispute with Phil because my dispute with Phil looks superficially similar, so I oppose the coalition that supports the person-in-the-street.
Yes.
The intuition here is that Aumann-like reasoning implies something like averaging everyone’s opinions, and therefore if there are lots of people on one side, they would dominate in the averaging.
But actually, I think it is better to think of Aumann-like reasoning as adding together everyone’s opinions. More formally, if you imagine that everyone has observed different pieces of independent evidence, leading to different people having different updates of their opinions relative to the prior, then to get the Aumannian update you have to add up all the changes in log-odds.
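The adding-versus-averaging distinction can be made concrete with a toy calculation (a sketch assuming fully independent evidence; the function name is mine):

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Probability from log-odds."""
    return 1 / (1 + math.exp(-x))

def aumann_pool(prior, posteriors):
    """Combine independent updates by summing each person's
    change in log-odds relative to the shared prior."""
    total_shift = sum(logit(p) - logit(prior) for p in posteriors)
    return inv_logit(logit(prior) + total_shift)

# Three observers each independently update from a 50% prior to 60%.
# Averaging would leave the pooled credence at 60%; adding the
# log-odds shifts pushes it well above 60%.
pooled = aumann_pool(0.5, [0.6, 0.6, 0.6])
```

So under independence, three people at 60% are stronger evidence than one person at 60%, which is the sense in which the updates add rather than average.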
Or alternatively, you are thinking of it as, “what is the probability that all of them have somehow become irrational/dishonest with respect to this subject?”. I think… it’s rare for irrationality/dishonesty to be one-sided? In the example disputes I can think of off the top of my head, it’s usually both or neither.
In theory one might think that there should be a negative correlation between the sides being dishonest due to collider bias; you only need one side to be dishonest in order to get a dispute, so it does seem kind of weird if dishonesty on both sides is correlated. But I think what happens is that disputes lead to conflicts, and conflicts lead to defensiveness, overreach and aggressiveness, and so a single dispute can spin out into an entire system of rationality falling apart. These disputes all arguably started generations ago.
So basically, my Aumann-like reasoning for persistent conflicts would go: The rationalist community is being irrational/dishonest about trans topics. There must be a reason for this; and indeed, it seems like a defense mechanism against the conservative/HBD coalition’s conflicts against trans people. But the conservative/HBD coalition is also often bad, irrational and dishonest! And you’ve been regularly trusting them and appealing to their findings in your posts, even though I think upon closer inspection there are lots of issues that you’d recognize. Oops.
I guess this makes for a neat example of what I just mentioned.
Another strong example.
I mean, I can feel how Michael and Ben are hurting here. They are in open conflict with the rationalist leaders and are spending a lot of reputation on this.
But they are also opposing locally honest behavior in order to force the rationalist community to improve.
Yep. And I mean, once you’ve written up the explanation of what the error is like, it’s pretty cheap to correct.
Scott Alexander could cross out his “Categories” post and put a link to your response at the top, and write a brief public announcement that he had changed his mind in a publicly accessible place, such as a new top-level post on ACX. Eliezer could quote-tweet his old tweets with a link to your rebuttal, and thank you for pointing out his bias.
I don’t think that’s true, and if anything it looks to be the opposite. Original document; the relevant quotes about neuroticism and what to do about it seem to be:
Maybe I’m misunderstanding how the Googlegeist works.
At my workplace, we regularly have surveys where we get asked about how we feel about various things. But if we report negative feelings, we get asked for suggestions about what is wrong.
The way I had imagined the situation is, someone working with the Googlegeist had noticed that a lot of women reported anxiety or whatever, and had decided they need to work with women to figure out what’s going on here, to solve it. And then James Damore felt that this was one instance of people looking at a disparity and claiming injustice, and that since he finds it biologically inevitable that women would be anxious, this shouldn’t be treated as indicative of an external problem, but instead should be medicalized and treated psychologically (or psychiatrically?).
But I admit I haven’t looked much into it so maybe the above model is wrong; originally when the Damore memo came out, I supported him, and it’s only later I’ve been thinking that maybe I shouldn’t have supported him. But I haven’t had much chance to talk with people about it.
Gotta go sleep.
As a side note, I consider the italicized part a rather weighty accusation. I think one should therefore be careful about making such an accusation. I guess, in this case, you were just honestly reporting the contents of your brain on the matter, not necessarily making an accusation.
Still, I think this to some extent illustrates an epistemic environment where it’s normal to throw around damaging accusations whose truth value is somewhere between “extremely uncharitable interpretation” and “objectively false”. Precisely the type that got Damore fired, in other words. Do we have such an environment even among rationalists? That is at the heart of Zack’s adventure.
(Incidentally, imagine if Damore had claimed the opposite—”Women are less prone to anxiety and can handle stress more easily.” Wouldn’t that also lead to accusations that Damore was saying we can ignore women’s problems?)
Anyway, on to object level. I think Damore’s point, in bringing it up, was that the stress in (some portion of) tech jobs may be a reason there are fewer women than men in tech. Reasons to think this:
The title of the super-section containing the “neuroticism” quote is “Possible non-bias causes of the gender gap in tech”.
The super-section is preceded by “For the rest of this document, I’ll concentrate on the extreme stance that all differences in outcome are due to differential treatment [italics added] and the authoritarian element that’s required to actually discriminate to create equal representation.”
The last sentence in the section (“Personality differences”) is “We need to stop assuming that gender gaps imply sexism.”
As already quoted, he says that the anxiety thing implies that “Mak[ing] tech and leadership less stressful” would be a “non-discriminatory way to reduce the gender gap”.
If Damore had said “Here are some issues women reported; and we should discount these reports because women are extra-anxious”, then your model would be well-founded. I don’t see him saying anything like that in the document, though. In the whole document, Damore doesn’t mention anything reported by women on Googlegeist, other than the anxiety thing. (I would be surprised if he, being an engineer and not in HR or leadership, had access to the arbitrary text field submissions from the other employees; I would guess he saw aggregated results on numerical questions, plus any items leadership chose to share with everyone.) Googlegeist itself is mentioned only two other times in the document; both times it’s him suggesting something be done with future Googlegeist surveys.
He does mention another item as a (primarily) women’s issue, although the source is a 2006 paper rather than Googlegeist. Again, he does advocate doing something about it (with caveats):
Now, at the end, he says this:
The most uncharitable reader could say “Aha, so he’s laid the groundwork to not follow through with anything that actually helps women, keeping the status quo, and everything he’s said before is just a trick.” If the reader comes in with that kind of implicit assumption about Damore’s character, then they’ll probably stick with it; all I can say is, evidence for such a belief does not come from the document. (Incidentally, I’ve met Damore at a party; I read him as a well-meaning nerd, who thought that if he made a sufficiently comprehensive, careful, well-cited, and constructively oriented writeup, he could cut through the hostility and they’d work out some solutions that would make everyone happier. The result is really tragic in that light.)
I think, to come up with your conclusion, you have to do a lot of reading into the text, and a lot of not reading the actual text. Which, I think, was par for the course for most negative takes on Damore. I am surprised and somewhat perturbed by your report that you originally supported Damore, and wonder what happened since then. Perhaps memory faded and “osmosis” brought in others’ takes?
In more detail, my background is I used to subscribe to research into psychological differences between the sexes and the races, with a major influence in my views being Scott Alexander (though there’s also a whole backstory to how I got into this).
I eventually started doing my own empirical research into transgender topics, and found Blanchardianism/autogynephilia theory to give the strongest effect sizes.
And as I was doing this, I was learning more about how to perform this sort of research; psychometrics, causal inference, psychology, etc.. Over time, I got a feeling for what sorts of research questions are fruitful, what sort of methods and critiques are valid, and what sorts of dynamics and distinctions should be paid attention to.
But I also started getting a feeling for how the researchers into differential psychology operate. Here’s a classic example: an IQ researcher who is so focused on providing a counternarrative to motivational theories that he uses methods which are heavily downwards biased to “prove” that IQ test scores don’t depend on effort. Or Simon Baron-Cohen playing motte-and-bailey with the “extreme male brain” theory of autism.
More abstractly, what I’ve generally noticed is:
These sorts of people are not very interested in actually developing substantive theory or testing their claims in strong ways which might disprove them.
Instead they are mainly interested in providing a counternarrative to progressive theories.
They often use superficial or invalid psychometric methods.
They often make insinuations that they have some deep theory or deep studies, but really actually don’t.
So yes, I am bringing priors from outside of this. I’ve been at the heart of the supposed science into these things, and I have become horrified at what I once trusted.
Onto your points:
You may or may not be right that this is what he meant.
(I think it’s a completely wrong position, because the sex difference in neuroticism is much smaller (by something like 2x) than the sex difference in tech interests and tech abilities, and presumably the selection effect for neuroticism on career field is also much smaller than that of interests. So I’m not sure your reading on it is particularly more charitable, only uncharitable in a different direction; assuming a mistake rather than a conflict.)
… I don’t think this changes the point that it assumes the measured sex difference in Neuroticism is a causative agent in promoting sex differences in stress, rather than addressing the possibility that the increased Neuroticism may reflect additional problems women are facing?
The correct thing to claim is “We should investigate what people are anxious/stressed about”. Jumping to conclusions that people’s states are simply a reflection of their innate traits is the problem.
I don’t think this is at the heart of Zack’s adventure? Zack’s issues were mainly about leading rationalists jumping in to rationalize things in the name of avoiding conflicts.
Anyway, making weighty claims about people is core to what differential psychology is about. It’s possible that some of my claims about Damore are false, in which case we should discuss that and fix the mistakes. However, the position that one should just keep quiet about claims about people simply because they are weighty would also seem to imply that we should keep quiet about claims about trans people and masculinity/femininity, or race and IQ, or, to make the Damore letter more relevant, men/women and various traits related to performance in tech.
Somewhat possible this is true. I think nerdy communities like LessWrong should do a better job at communicating the problems with various differential psychology findings and communicating how they are often made by conservatives to promote an agenda. If they did this, perhaps Damore would not have been in this situation.
But also, if I take my personal story as a template, then it’s probably more complicated than that. Yes, I had lots of time where I was a well-meaning nerd and I got tricked by conservative differential psychology. But a big part of the reason I got into this differential psychology in the first place was a distrust of feminism. If I had done my due diligence, or if nerdy groups had been better at communicating the problems with these areas, it might not have become as much of a problem.
I’ll address this first:
These things are bad, but, apart from point 2, I would ask: how do they compare to the average quality of social science research? Do you have high standards, or do you just have high standards for one group? I think most of us spend at least some time in environments where the incentive gradients point towards the latter. Beware isolated demands for rigor.
Research quality being what it is, I would recommend against giving absolute trust to anyone, even if they appear to have earned it. If there’s a result you really care about, it’s good to pick at least one study and dig into exactly what they did, and to see if there are other replications; and the prior probability of “fraud” probably shouldn’t go below 1%.
As for point 2—if you were a researcher with heretical opinions, determined to publish research on at least some of them, what would you do? It seems like a reasonable strategy is to pick something heretical that you’re confident you can defend, and do a rock-solid study on it, and brace for impact. Is it still the case that disproving the blank-slate hypothesis would constitute progress in some academic subfields? If so, then expect people to continue trying it.
Now, digging into the examples:
The study says there was “a meta-analysis concluding that small monetary incentives could improve test scores by 0.64 SDs” (roughly 10 IQ points); looks to be Duckworth et al. 2011. The guy says it seemed sketchy—the studies had small N, weird conditions, and/or fraudulent researchers. Looking at table S1 from Duckworth, indeed, N is <100 on most of the studies; “Bruening and Zella (1978)” sticks out as having a large effect size and a large N, and, when I google for more info about that, I find that Bruening was convicted by an NIMH panel of scientific fraud. Checks out so far.
The guy ran a series of studies, the last of which offered incentives of nil, £2, and £5-£10 for test performance, with the smallest subgroup being N=150, taken from the adult population via “prolific academic”. He found that £2 and £5-£10 had similar effects, those being apparently 0.2 SD and 0.15 SD respectively, which would be 3 IQ points or a little less. (Were the “small monetary incentives” from Duckworth of that size? The Duckworth table shows most of the studies as being in the $1-$9 or <$1 range; looks like yes.) So, at least as a “We suspected these results were bogus, tried to reproduce them, and got a much smaller effect size”, this seems all in order.
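The conversions between standardized effect sizes and IQ points above are just multiplication by the IQ scale's standard deviation; spelled out:

```python
# IQ scores are conventionally scaled to mean 100, SD 15.
SD_IQ = 15

def sd_to_iq_points(d):
    """Convert a standardized effect size d into IQ points."""
    return d * SD_IQ

duckworth = sd_to_iq_points(0.64)    # about 9.6 points, "roughly 10"
replication = sd_to_iq_points(0.2)   # 3 points
replication_hi = sd_to_iq_points(0.15)  # a little over 2 points
```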
Now, you say:
Your analysis essentially proposes that, if there were some method of increasing effort by 3-4x as much as he managed to increase it, then maybe you could in fact increase IQ scores by 10 points. This assumes that the effort-to-performance causation would stay constant as you step outside the tested range. That’s possible, but… I’m quite confident there’s a limit to how much “effort” can increase your results on a timed multiple-choice test, that you’ll hit diminishing marginal returns at some point (probably even negative marginal returns, if the incentive is strong enough to make many test-takers nervous), and extrapolating 3-4x outside the achieved effect seems dubious. (I also note that the 1x effect here means increasing your self-evaluated effort from 4.13 to 4.28 on a scale that goes up to 5, so a 4x effect would mean going to 4.73, approaching the limits of the scale itself.)
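The scale arithmetic in that last parenthetical, using the numbers quoted above:

```python
baseline = 4.13      # mean self-rated effort without incentive
incentivized = 4.28  # mean with the incentive (the "1x" effect)
scale_max = 5        # the effort scale tops out at 5

one_x = incentivized - baseline   # about 0.15
four_x = baseline + 4 * one_x     # about 4.73, near the scale ceiling
headroom = scale_max - baseline   # only about 0.87 available at all
```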
You say, doing your analysis:
That is interesting… Though the correlation between test effort and test performance in the studies is given as 0.27 and 0.29 in different samples, so, noise notwithstanding, your effects are consistently larger by a decent margin. That would suggest there’s something more going on than the simple causation.
The authors say:
That last sentence is odd, since there was in fact an increase in both test effort and test performance. Perhaps they’re equivocating between “low effect” and “no effect”? (Which is partly defensible in that the effect was not statistically significant in most of the studies they ran. I’d still count it as a mark against them.) The authors continue:
That is a pretty interesting piece of evidence for the “ability leads to self-reported effort” theory.
Overall… The study seems to be a good one: doing a large replication study on prior claims. The presentation of it… The author on Twitter said “testing over N= 4,000 people”, which is maybe what you get if you add up the N from all the different studies, but each study is considerably smaller; I found that somewhat misleading, but suspect that’s a common thing when authors report multiple studies at once. On Twitter he says “We conclude that effort has unequivocally small effects”, which omits caveats like “our results are accurate to the degree that alternative incentives do not yield appreciably larger effects” which are in the paper; this also seems like par for the course for science journalism (not to mention Twitter discourse). And they seem to have equivocated in places between “low effect” and “no effect”. (Which I suspect is also not rare, unfortunately.)
Now. You presented this as:
The “focused on providing a counternarrative” part is plausibly correct. However, the “uses methods which are heavily downwards biased to “prove” [...]” is not. The “downwards biased methods” are “offering a monetary incentive of £2-£10, which turned out to be insufficient to change effort much”. The authors were doing a replication of Duckworth, in which most of the cited studies had a monetary incentive of <$10—so that part is correctly matched—and they used high enough N that Duckworth’s claimed effect size should have shown up easily. They also preregistered the first of their incentive-based studies (with the £2 incentive), and the later ones were the same but with increased sample size, then increased incentive. In other words, they did exactly what they should have done in a replication. To claim that they chose downwards-biased methods for the purpose of proving their point seems quite unfair; those methods were chosen by Duckworth.
This seems to be a data point of the form “your priors led you to assume bad faith (without having looked deeply enough to discover this was unjustified), which then led you to take this as a case to justify those priors for future cases”. (We will see more of these later.) Clearly this could be a self-reinforcing loop that, over time, could lead one’s priors very far astray. I would hope anyone who posts here would recognize the danger of such a trap.
Second example. “Simon Baron-Cohen playing Motte-Bailey with the “extreme male brain” theory of autism.” Let’s see… It seems uncontroversial (among the participants in this discussion) that there are dimensions on which male and female brains differ (on average), and on which autists are (on average) skewed towards the male side, and that this includes the empathizing and systematizing dimensions.
You quote Baron-Cohen as saying “According to the ‘extreme male brain’ theory of autism, people with autism or AS should always fall in the [extreme systematizing range]”, and say that this is obviously false, since there exist autists who are not extreme systematizers—citing a later study coauthored by Baron-Cohen himself, which puts only ~10% of autists into the “Extreme Type S” category. You say he’s engaging in a motte-and-bailey.
After some reading, this looks to me like a case of “All models are wrong, but some are useful.” The same study says “Finally, we demonstrate that D-scores (difference between EQ and SQ) account for 19 times more of the variance in autistic traits (43%) than do other demographic variables including sex. Our results provide robust evidence in support of both the E-S and EMB theories.” So, clearly he’s aware that 57% of the variance is not explained by empathizing-systematizing. I think it would be reasonable to cast him as saying “We know this theory is not exactly correct, but it makes some correct predictions.” Indeed, he counts the predictions made by these theories:
Note also that he states the definition of EMB theory as saying “autistic individuals are on average shifted toward a more “masculine” brain type”. You say “Sometimes EMB proponents say that this isn’t really what the EMB theory says. Instead, they make up some weaker predictions, that the theory merely asserts differences “on average”.” This is Baron-Cohen himself defining it that way.
Would it be better if he used a word other than “theory”? “Model”? You somewhat facetiously propose “If the EMB theory had instead been named the “sometimes autistic people are kinda nerdy” theory, then it would be a lot more justified by the evidence”. How about, say, the theory that “There are processes that masculinize the brain in males; and some of those processes going into overdrive is a thing that causes autism”? (Which was part of the original paper: “What causes this shift remains unclear, but candidate factors include both genetic differences and prenatal testosterone.”) That is, in fact, approximately what I found when I googled for people talking about the EMB theory—and note that the article is critical of the theory:
That is the sense in which an autistic brain is, hypothetically, an “extreme male brain”. I guess “extremely masculinized brain” would be a bit more descriptive to someone who doesn’t know the context.
The problem with a motte-and-bailey is that someone gets to go around advancing an extreme position, and then, when challenged by someone who would disprove it, he avoids the consequences by claiming he never said that, he only meant the mundane position. According to you, the bailey is “they want to talk big about how empathizing-systematizing is the explanation for autism”. According to the paper, it was 43% of the explanation for autism, and the biggest individual factor? Seems pretty good.
Has Baron-Cohen gone around convincing people that empathizing-systematizing is the only factor involved in autism? I suspect that he doesn’t believe it, he didn’t mean to claim it, almost no one (except you) understood him as claiming it, and pretty much no one believes it. Maybe he picked a suboptimal name, which lent itself to misinterpretation. Do you have examples of Baron-Cohen making claims of that kind, which aren’t explainable as him taking the “This theory is not exactly correct, but it makes useful predictions” approach?
The context here is explaining why you’ve “become horrified at what [you] once trusted”, which you now call “supposed science”. I’m… underwhelmed by what I’ve seen.
Back to Damore...
...I thought it was overkill to cite four quotes on that issue, but apparently not. Such priors!
It seems you’re saying Damore mentions A but not B, and B is bigger, therefore Damore’s “comprehensive” writeup is not so, and this omission is possibly ill-motivated. But, erm, Damore does mention B, twice:
This suggests that casting aspersions on Damore’s motives is not gated by “Maybe I should double-check what he said to see if this is unfair”.

I think the anxiety/stress thing is more relevant for top executive roles than for engineer roles; a population-level difference is more important at the extremes. Damore does talk about leadership specifically:
Next:
Well, he lists one source of stress above, and he does recommend to “Make tech and leadership less stressful”.
And why would these rationalists care so much about avoiding these conflicts, to the point of compromising the intellectual integrity that seems so dear to them? Fear that they’d face the kind of hostility and career-ruining accusations directed at Damore, and things downstream of fears like that, seems like a top candidate explanation.
Um. Accusations are things you make about individuals, occasionally organizations. I hope that the majority of differential psychology papers don’t consist of “Bob Jones has done XYZ bad thing”.
You are equivocating between reckless claims of misconduct / malice by an individual, and heavily cited claims about population-level averages that are meant to inform company policy. Are you seriously stating an ethical principle that anyone who makes the latter should expect to face the former and it’s justified?
I think Damore was aware that there are people who use population-level differences to justify discriminating against individuals, and that’s why he took pains to disavow that. As for “the problems with various differential psychology findings”—do you think that some substantial fraction, say at least 20%, of the findings he cited were false?
Quick update!
I found that OpenPsychometrics has a dataset for the EQ/SQ tests. Unfortunately, there seems to be a problem for the data with the EQ items, but I just ran a factor analysis for the SQ items to take a closer look at your claims here.
There appeared to be 3 or 4 factors underlying the correlations on the SQ test, which I’d roughly call “Technical interests”, “Nature interests”, “Social difficulties” and “Jockyness”. I grabbed the top loading items for each of the factors, and got this correlation matrix:
The correlation between the technical interests and nature interests plausibly reflects the notion that Systematizing is a thing, though I suspect that it could also be found to correlate with all sorts of other things that would not be considered Systematizing? Like non-Systematizing ways of interacting with nature. Idk though.
The sex differences in the items were limited to the technical interests, rather than also covering the nature interests. This does not fit a simple model of a sex difference in general Systematizing, but it does fit a model where the items are biased towards men but there is not much sex difference in general Systematizing.
I would be inclined to think that the Social difficulties items correlate negatively with Empathizing Quotient or positively with Autism Spectrum Quotient. If we are interested in the correlations between general Systematizing and these other factors, then this could bias the comparisons. On the other hand, the Social difficulties items were not very strongly correlated with the overall SQ score, so maybe not.
I can’t immediately think of any comments for the Jockyness items.
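The kind of analysis described above could be sketched roughly like this (illustrative only, not my actual code: the data frame and column values are placeholders for the OpenPsychometrics export, and scikit-learn’s FactorAnalysis stands in for whatever extraction/rotation method one prefers):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

def sq_factor_sex_differences(sq_items: pd.DataFrame, sex: pd.Series,
                              n_factors: int = 4) -> pd.Series:
    """Extract latent factors from SQ item responses, then compute a
    Cohen's-d-style sex difference on each factor score."""
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax")
    scores = pd.DataFrame(fa.fit_transform(sq_items), index=sq_items.index)
    # Split factor scores by sex and compute standardized mean differences.
    m, f = scores[sex == "m"], scores[sex == "f"]
    pooled_sd = np.sqrt((m.var() + f.var()) / 2)
    return (m.mean() - f.mean()) / pooled_sd
```

Under the pattern I described, one would expect a large d on the “Technical interests” factor and small d on the others.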
Overall, I strongly respect the fact that he made many of the items very concrete, but I now also feel like I have proven the gender differences on Systematizing to be driven by psychometric shenanigans, and I strongly expect to find that many of the other associations are also driven by psychometric shenanigans.
I’ve sent an email asking OpenPsychometrics to export the Empathizing Quotient items too. If he does so, I hope to write a top-level post explaining my issues with the psychometrics here.
Hm, actually I semi-retract this; the OpenPsychometrics data seems to be based on the original Systematizing Quotient, whereas there seems to be a newer one called Systematizing Quotient-Revised, which is supposedly more gender-neutral. Not sure where I can get data on this, though. Will go looking.
Edit: Like I am still pretty suspicious about the SQ-R. I just don’t have explicit proof that it is flawed.
Am I gonna have to collect the data myself? I might have to collect the data myself...
Oops, upon reading more about the SQ, I should correct myself:
Some of the items, such as S16, are “filler items” which are not counted as part of the score; these are disproportionately part of the “Social difficulties” and “Jockyness” factors, so that probably reduces the amount of bias that can be introduced by those items, and it also explains why they don’t correlate very much with the overall SQ scores.
But some of the items for these factors, such as S31, are not filler items, and instead get counted for the test, presumably because they have cross-loadings on the Systematizing factor. So the induced bias is probably not zero.
If I get the data from OpenPsychometrics, I will investigate in more detail.
Since I don’t have data on the EQ, here’s a study where someone else worked with it. They found that the EQ had three factors, which they named “Cognitive Empathy”, “Emotional Empathy” and “Social Skills”. The male-female difference was driven by “Emotional Empathy” (d=1), whereas the autistic-allistic difference was driven by “Social Skills” (d=1.3). The converse differences were much smaller, 0.24 and 0.66. As such, it seems likely that the EQ lumps together two different kinds of “empathizing”, one of which is feminine and one of which is allistic.
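For intuition about those d values, a standard conversion (the common-language effect size, Φ(d/√2)) turns each d into the probability that a random member of the higher group outscores a random member of the lower group:

```python
from math import sqrt
from statistics import NormalDist

def prob_superiority(d):
    """Common-language effect size: probability that a random member of
    the higher-scoring group outscores a random member of the other,
    assuming normal distributions with equal variance."""
    return NormalDist().cdf(d / sqrt(2))

emotional_empathy_sex = prob_superiority(1.0)   # roughly 0.76
social_skills_autism = prob_superiority(1.3)    # roughly 0.82
small_converse = prob_superiority(0.24)         # roughly 0.57
```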
I should also say, in the context of IQ and effort, some of the true dispute is about whether effort differences can explain race differences in scores. And for that purpose, what I would do is to go more directly into that.
In fact, I have done so. Quoting some discussion I had on Discord:
(This was way after I became critical of differential psychology btw. Around 2 months ago.)
I don’t know for sure as I am only familiar with certain subsets of social science, but a lot of it is in fact bad. I also often criticize normal social science, but in this context it was this specific area of social science that came up.
I would try to perform studies that yield much more detailed information. For instance, mixed qualitative and quantitative studies where one qualitatively inspects the data points that are above-average or below-average for the regressions, to see whether there are identifiable missing factors.
If he had phrased his results purely as disproving the importance of incentives, rather than effort, I think it would have been fine.
I prefer to think of it as “if you increase your effort from being one of the lowest-effort people to being one of the highest-effort people, you can increase your IQ score by 17 IQ points”. This doesn’t seem too implausible to me, though admittedly I’m not 100% sure what the lowest-effort people are doing.
It’s valid to say that extrapolating outside of the tested range is dubious, but IMO this means that the study design is bad.
I think it’s likely that the limited returns to effort would be reflected in the limited bounds of the scale. So I don’t think my position is in tension with the intuition that there are limits on what effort can do for you. Under this model, it is also worth noting that the effort scores were negatively skewed, which implies that lack of effort is a bigger cause of low scores than extraordinary effort is of high scores.
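The negative-skew point can be illustrated with a toy sample (the distribution shape is assumed for illustration, not the actual effort data): most scores bunch near the top of the scale with a long lower tail, which yields a negative Fisher skewness.

```python
import random
from statistics import mean

random.seed(1)

def fisher_skew(xs):
    """Fisher's moment coefficient of skewness: m3 / m2^(3/2)."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Toy 'effort' scores: bunched near the top of the scale,
# with a long tail of low-effort scores -> negative skew.
effort = [10 - random.expovariate(1) for _ in range(10000)]
print(fisher_skew(effort))  # negative
```

A symmetric distribution would give a skewness near zero; the long lower tail is what drives the statistic negative, matching the “lack of effort drags scores down more than extra effort pushes them up” reading.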
I don’t think my results are statistically significantly different from 0.3ish; in the ensuing discussion, people pointed out that the IV results had huge error bounds (because the original study was only barely significant).
But also if there is measurement error in the instrument (effort), then that would induce an upwards bias in the IV estimated effect. So that might also contribute.
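The huge-error-bounds problem with a barely-significant first stage can be seen in a toy simulation (all parameters invented): the IV estimate cov(Z, Y) / cov(Z, X) divides by the instrument–regressor covariance, so when the first stage is weak that denominator is small and noisy, and the estimates scatter wildly.

```python
import random
from statistics import mean, stdev

random.seed(2)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def iv_estimate(first_stage, n=200, beta=0.3):
    """One simulated IV estimate of beta; `first_stage` is the
    instrument -> regressor strength (all numbers made up)."""
    z = [random.gauss(0, 1) for _ in range(n)]
    u = [random.gauss(0, 1) for _ in range(n)]  # confounder
    x = [first_stage * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
    return cov(z, y) / cov(z, x)

strong = [iv_estimate(1.0) for _ in range(500)]
weak = [iv_estimate(0.1) for _ in range(500)]
print(stdev(strong), stdev(weak))  # weak-instrument estimates vary far more
```

Same estimator, same sample size; only the first-stage strength differs. With a weak instrument the spread of estimates balloons, which is why a barely-significant original study translates into huge IV error bounds.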
Shitty replications of shitty environmentalist research are still shitty.
Like this sort of thing makes sense to do as a personal dispute between the researchers, but for all of us who’d hope to actually use or build on the research for substantial purposes, it’s no good if the researchers use shitty methods because they are trying to build a counternarrative against other researchers using shitty methods.
I wouldn’t confidently disagree with this, but I do have some philosophical nitpicks/uncertainties.
(“Brain” connotes neurology to me, yet I am not sure if empathizing and especially systematizing are meaningful variables on a neurological level. I would also need to double-check whether EQ/SQ are MI for sex and autism because I don’t remember whether they are. I suspect in particular the EQ is not, and it is the biggest driver of the EQ/SQ-autism connection, so it is pretty important to consider. But for the purposes of the Motte-Bailey situation, we can ignore that. Just tagging it as a potential area of disagreement.)
I think what would be better would be if he clarified his models and reasoning. (Not positions, as that opens up the whole Motte-Bailey thing and also is kind of hard to engage with.) What is up with the original claim about autists always being extreme type S? Was this just a mistake that he would like to retract? If he only considers it to be a contributor that leads to half the variance, does he have any opinion on the nature of the other contributors to autism? Does he have any position on the relationship between autistic traits as measured by the AQ, and autism diagnosis? What should we make of the genetic contributors to autism being basically unrelated to the EQ/SQ? (And if the EQ/SQ are not MI for sex/autism, what does he make of that?)
This is part of the trouble, these areas do not have proper discussions.
No, I meant that under your interpretation, Damore mentions A when A is of negligible effect, and so that indicates a mistake. I didn’t mean to imply that he didn’t mention B, and I read this part of his memo multiple times prior to sending my original comment, so I was fully aware that he mentioned B.
But again the “Make tech and leadership less stressful” point boiled down to medicalizing it.
Valid point.
Differential psychology papers tend to propose ways to measure traits that they consider important, to extend previously created measures with new claims of importance, and to rank demographics by importance.
I think in an ideal world, the research and the discourse would be more rational. When people are willing to discuss and think about these matters rationally, it seems inappropriate to accuse them of misconduct/malice simply for agreeing with such claims. However, if people have spent a long time trying to bring up rational discussion and failed, then it is reasonable for them to assume misconduct/malice.
Using population-level differences to justify discriminating against individuals can be fine and is not what I have been objecting to.
I don’t know. My problem with this sort of research typically isn’t that it is wrong (though it sometimes may be) but instead that it is of limited informative value.
I should probably do a top-level review post where I dig through all his cites to look at which parts of his memo are unjustified and which parts are wrong. I’ll tag you if I do that.
I mean, I agree that this is obviously a thing, but I continue to maintain hope in the possibility of actually reasoning about sex differences in the physical universe, rather than being resigned to living in the world of warring narratives. I think the named effect sizes help? (I try to be clear about saying “d ≈ 0.6”, not “Men are from mars.”)
I absolutely agree that it’s critical to recognize that it’s often both. (I’m less sure about how often it’s “neither”. What stops people from converging?)
I agree.
I think also a lot of it is just down to doing better research, including better psychometrics and more qualitative investigations.
But what I mean is that the research programs I have seen so far, such as people-things and similar, are not good enough, and are not making a serious enough attempt at becoming good enough that you can expect to just fund them and wait for results.
Not because it is fundamentally impossible, but because the challenges are not taken seriously enough.
I think if, due to priors, the sides have reasons to distrust each other, but the distrust leads to ignoring each other rather than to conflict, then both sides can remain honest and rational (rather than degenerating into dishonesty through conflict) while never realizing that the other side is honest and rational, and so they end up not converging?
I don’t get what point you’re making here.
It’s an example of a trans activist who, when asked whether people who want to coordinate sex-based discrimination should be purged from the discourse and authoritative sources, was like “yeah they just sound like mean busybodies to me”. Admittedly I didn’t really go into detail (partly because I don’t really have any strong examples that I support and want to argue about), so we don’t really know whether Pervocracy supports it in all relevant cases.