Why Are The Human Sciences Hard? Two New Hypotheses

“Reasoning about the relative hardness of sciences is itself hard.”
—the B.A.D. philosophers


Epistemic status: Conjecture. Under a suitable specification of the problem, we have credence ~50% on the disjunction of our hypotheses explaining >1% of the variance (e.g., in $R^2$ values) between disciplines, and 1% on our hypotheses explaining >50% of such variance.

The Puzzle: A Tale of Two Predictions

Imagine two scientific predictions:

Prediction A: Astronomers calculate the trajectory of Comet NEOWISE as it approaches Earth, predicting its exact position in the night sky months in advance. When the date arrives, there it is—precisely where they said it would be.

Prediction B: Political scientists forecast the outcome of an election, using sophisticated models built on polling data, demographic trends, and historical patterns. When the votes are counted, the results diverge wildly from many predictions.

Why does science seem to struggle so much more with predicting human behavior than with predicting physical phenomena? This gap in predictive performance between the human sciences and natural sciences is widely acknowledged—yet explanations for it often feel unsatisfying.

“Because they deal with systems that are highly complex, adaptive and not rigorously rule-bound, the human sciences are among the most difficult of disciplines, both methodologically and intellectually...”

—Drezner (2012) “A Different Agenda,” Nature, p. 271.

The usual explanation is that human behavior is simply more complex than physical systems. But is that the whole story? We don’t think so.

In this post, we present two novel hypotheses for why the human sciences appear harder than other sciences—hypotheses that focus not on the inherent complexity of the subject matter, but on structural features of scientific inquiry itself. We don’t claim these hypotheses explain the entire difficulty gap, only that they likely explain some fraction of the difference between disciplines:

  1. Rigid Demands Hypothesis (RD): In the human sciences, we are pre-committed to specific prediction tasks to a greater degree than in the physical sciences, limiting a powerful strategy for scientific progress—changing the question to something more tractable.

  2. Fruit in the Hand Hypothesis (FTH): Due to evolutionary pressures, humans already have relatively high baseline performance for many “low-hanging fruit” prediction tasks concerning human behavior—more so than for many physical domains—making progress beyond this baseline comparatively more challenging.

This post is based on a paper by Daniel Herrmann, Aydin Mohseni, and Gabe Orona Avakian. All proofs can be found in the mathematical appendix of that paper.

The Typical Explanations: Complexity, Methods, and Incentives

Before introducing our new hypotheses, let’s briefly survey the existing explanations for why the human sciences are hard. Most fall into three categories:

Problems in Subject Matter

The most common explanation is that human behavior is inherently more complex than physical systems:

  • Social phenomena have more variables and feedback loops

  • Human systems are context-dependent and culturally influenced

  • People respond to predictions about them, creating self-fulfilling or self-defeating prophecies

  • Individual differences and variation make generalization difficult[1]

Problems in Methods

Others point to methodological limitations:

  • Lack of unified theoretical frameworks

  • Difficulty conducting controlled experiments

  • Statistical challenges and questionable research practices

  • Vulnerability to bias on politically charged topics[2]

Problems in Incentives

Still others look to institutional factors:

  • Insufficient funding compared to natural sciences

  • Publication bias toward surprising or counterintuitive findings

  • Perverse incentives in the absence of easy verification[3]

These explanations tend to focus on specific, local features of the human sciences. Our approach differs by highlighting general, structural features of scientific inquiry itself.[4]

Our Approach: Prediction Tasks as the Unit of Analysis

Before we can analyze why some sciences might be harder than others, we need to clarify what “hardness” even means in this context. This is not a trivial task—it requires making substantive choices about how to operationalize and measure scientific difficulty.

When we say “the human sciences are hard,” what exactly do we mean? We’re not claiming they require more intellect or effort. Rather, we’re suggesting something like:

In the human sciences, we tend to exhibit worse performance on the prediction tasks of interest, relative to performance on the prediction tasks of other sciences.

We have made several moves here:

  1. We have made explicit the comparative nature of the hardness judgement.

  2. We have cashed out hardness in terms of some sort of perceived performance.

  3. We have chosen to operationalize performance in terms of prediction tasks.

In particular, we focus on prediction tasks because:

  1. They provide a clear metric for success

  2. Even explanation and understanding can be framed as forms of prediction

  3. Prediction success or failure is relatively uncontroversial to measure

Alternative Choices: Unpacking Hardness

Clearly, science is not only about performance on prediction tasks: explanation, understanding, and guiding interventions are just some of the ways that a science can deliver value. (Even though we can think of prediction as subsuming some of these.)[5]

Other plausible metrics for the (perceived) hardness of a science might include reproducibility rates, theoretical unification, methodological sophistication, and so on. That said, we think focusing on prediction tasks as the unit of performance provides valuable insights while remaining (relatively) straightforward to assess. And so, it’s where we start.

A Model for Understanding Disciplinary Difficulty

Now, let’s formalize our hypotheses with a simple model. This requires making explicit choices about how to represent scientific disciplines and their difficulty—choices that themselves illustrate how challenging it is to reason rigorously about disciplinary hardness:

  • Let $\mathcal{T}$ be the set of all possible prediction tasks (e.g., predicting comet trajectories, drug effects, election outcomes)

  • A hardness function $h : \mathcal{T} \to \mathbb{R}$ assigns a hardness value to each task (lower values mean “better performance”, i.e., being “less hard”)[6]

  • A discipline $D$ is a subset of $\mathcal{T}$ (e.g., physics, biology, psychology)

  • We judge disciplines by their most successful performances: $H(D) = \min_{t \in D} h(t)$

This last point is crucial: we tend to evaluate fields by their greatest successes, not their average performance or their failures. Physics seems impressive because of its stunning bullseye predictions, not because it can solve every problem.
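
To make these modeling choices concrete, here is a minimal simulation sketch of the setup (our own illustration, not taken from the paper; the lognormal hardness distribution and the function names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_discipline(n_tasks: int) -> np.ndarray:
    """Draw hardness values h(t) for a discipline's tasks.

    The lognormal distribution is an arbitrary modeling choice;
    lower values mean better (less hard) performance.
    """
    return rng.lognormal(mean=0.0, sigma=1.0, size=n_tasks)

def perceived_hardness(discipline: np.ndarray) -> float:
    """H(D): judge a discipline by its single most successful
    performance, i.e., the minimum hardness among its tasks."""
    return float(discipline.min())

physics_like = sample_discipline(n_tasks=1_000)
psychology_like = sample_discipline(n_tasks=1_000)

print(perceived_hardness(physics_like), perceived_hardness(psychology_like))
```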

Alternative Choices: Unpacking Perceived Performance

Our model measures a discipline’s perceived performance by its most impressive achievement—essentially capturing the “best case” scenario. And again, this represents just one of several possible approaches to quantifying scientific success.

We could instead consider for each field:

  • Mean or median performance across all prediction tasks

  • The distribution of successes (e.g., variance or skewness)

  • Weighted averages favoring more socially important predictions

  • Average performance of the top $k$ most successful predictions

  • Average performance of the top $x\%$ of predictions

  • etc.

Our focus on the most impressive achievements aligns with how people often judge disciplines psychologically—through exemplars and standout discoveries rather than modal performance. We would guess that our analysis is robust under choices of hardness metrics that take a generalized mean of a suitably small subset of the most successful performances in a field (e.g., considering the average of the top 5 predictions rather than just the single best one), but other choices of metrics might yield different conclusions about relative discipline difficulty.
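
To illustrate how much the verdict can depend on this choice, here is a small sketch computing a few of the alternative metrics above on one simulated set of task hardnesses (our own illustration; the distribution and variable names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
hardness = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # simulated h(t) values for one field

sorted_h = np.sort(hardness)  # most successful (least hard) tasks first
metrics = {
    "best single performance (min h)": sorted_h[0],
    "mean hardness":                   hardness.mean(),
    "median hardness":                 np.median(hardness),
    "mean of top-5 successes":         sorted_h[:5].mean(),
    "mean of top 1% of successes":     sorted_h[: max(1, len(hardness) // 100)].mean(),
}

for name, value in metrics.items():
    print(f"{name:33s} {value:.3f}")
```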

Hypothesis 1: Rigid Demands

Core Intuition

Scientists don’t just discover facts—they choose which questions to ask. This choice profoundly affects how successful a discipline appears.

Vignette: The Freedom to Choose More Tractable Problems

Imagine two researchers:

Physicist Alice wants to understand gas behavior. She realizes predicting the motion of individual gas particles is nearly impossible. Instead, she shifts to studying the relationship between temperature, pressure, and volume—variables that emerge at scale and follow elegant mathematical laws. Her predictions are precise and widely applicable.

Education Researcher Bob wants to improve student outcomes. Policymakers demand: “Will this specific intervention increase future earnings?” Bob can’t redefine the question to something more tractable—the original question is exactly what matters for policy. He’s stuck with a complex prediction task whether he likes it or not.

In many natural sciences, researchers have substantial freedom to redefine their questions to make them more tractable. In the human sciences, particularly those with policy implications, researchers often face rigid demands to answer specific questions regardless of their tractability.

Formal Expression

We can express this hypothesis by considering the cardinality (size) of disciplines:

Proposition 1: Let the set of all task difficulties be IID and the distributions non-degenerate, and let $|D_1| > |D_2|$. Then $\mathbb{E}[H(D_1)] < \mathbb{E}[H(D_2)]$.

In plain language: Even if two disciplines have the same distribution of task difficulties, if one discipline has freedom to explore more tasks, it will likely achieve more impressive successes than the discipline with less freedom.[7]
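
A quick Monte Carlo check of this claim under one convenient assumption (IID lognormal task hardnesses; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def expected_best(n_tasks: int, n_trials: int = 10_000) -> float:
    """Estimate E[H(D)] = E[min h(t)] for a discipline that gets to
    explore n_tasks IID tasks."""
    draws = rng.lognormal(mean=0.0, sigma=1.0, size=(n_trials, n_tasks))
    return float(draws.min(axis=1).mean())

# A discipline facing rigid demands explores few tasks; a freer
# discipline explores many, and its best success looks more impressive.
print(expected_best(n_tasks=10))     # constrained discipline
print(expected_best(n_tasks=1_000))  # freer discipline: noticeably lower expected hardness
```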

Real-World Example: Economic Pre-Commitments

Economics must predict inflation, recessions, and employment—these questions are non-negotiable because policymakers need answers to them. Economists can’t say, “Actually, we’ll study something more tractable instead.”

When entire institutions like central banks, governments, and financial markets depend on a yes/no answer to a specific question about interest rates or inflation, the discipline can’t pivot to easier or more general tasks. The stakes and external demands lock economists into particular prediction challenges, regardless of their tractability.

In contrast, physicists were not committed to discovering the periodic table, fields, or quantum wave functions. Many of the great successes of physics are answers to questions no one would have thought to ask just decades before they were discovered. The hard sciences were formed when frontiers of highly tractable and promising theorizing opened up.[8]

The rigid demands placed on many human sciences limit a powerful strategy for scientific progress: changing the question to something more tractable.

Hypothesis 2: Fruit in the Hand

Core Intuition

Some prediction tasks are already solved for us by evolution and enculturation. We carry these solutions around without recognizing them as scientific achievements.

Vignette: Unimpressive Predictions

Imagine the following prediction task:

“If someone in a room full of quietly working philosophers suddenly screams and throws a glass of water at the wall, shattering it, what will happen?”

You can predict with extremely high accuracy that people will stop writing, look up startled, and show surprise or alarm. There will be a pause while they figure out what is going on. If they think that no real threat is present, they will return to something like their previous activities. This prediction is more accurate than many “scientific” predictions—yet we don’t consider it a scientific achievement because it seems obvious.

Why? Because evolution has already given us sophisticated cognitive machinery for predicting basic social reactions. The “low-hanging fruit” of human behavior prediction has already been picked by natural selection.

In the natural sciences, we have no evolutionary advantage in predicting quantum behavior or chemical reactions. The “easy wins” in these domains remain available for science to claim as achievements.

Formal Expression

We can express this hypothesis in two ways:

1. Removing Already-Solved Tasks

If we imagine removing the easiest prediction tasks (which evolution has already solved) from the human sciences:

Proposition 2: Let $D$ be a discipline with IID tasks, let $t^*$ be the easiest task in $D$, and let $t$ be a random task in $D$. Then $H(D \setminus \{t^*\}) \geq H(D)$ and $\mathbb{E}[H(D \setminus \{t\})] \geq \mathbb{E}[H(D)]$.

In plain language: Removing the easiest tasks from a discipline makes it appear harder.
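
The same point in a minimal simulation (again assuming IID lognormal hardnesses):

```python
import numpy as np

rng = np.random.default_rng(3)

n_trials, n_tasks = 10_000, 50
draws = rng.lognormal(mean=0.0, sigma=1.0, size=(n_trials, n_tasks))
sorted_draws = np.sort(draws, axis=1)

best_full = sorted_draws[:, 0]             # H(D): hardness of the discipline's easiest task
best_without_easiest = sorted_draws[:, 1]  # H(D minus t*): best task once the easiest is removed

# Stripping out the "low-hanging fruit" that evolution already picked
# leaves a discipline that looks harder on average.
print(best_full.mean(), best_without_easiest.mean())
```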

2. Accounting for Impressiveness

Alternatively, we can define a function $w$ that measures how impressed we are when a prediction task is solved.

If we solve an “easy” social prediction task, we’re not impressed (low $w$ value) because our evolutionary intuitions already solved it. This makes the human sciences seem less impressive even when they achieve certain accurate predictions.[9]
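
One simple way to cash this out (our own illustrative gloss, not necessarily the form used in the paper) is to let impressiveness reward only the improvement over the folk baseline, where $b(t)$ denotes the hardness level our evolved intuitions already achieve on task $t$:

$$w(t) = \max\{\, b(t) - h(t),\; 0 \,\}$$

On this gloss, solving a task our folk psychology already handles ($b(t) \approx h(t)$) earns little credit, while matching that same accuracy in a domain with no evolved baseline can be very impressive.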

Illustration: Celestial Orbits vs Facial Expressions

As philosopher Jerry Fodor noted, psychology as a science must surpass “folk psychology” to be impressive. We already have intuitive theories about how minds work—we make predictions about others’ beliefs, desires, and behaviors constantly.

In physics, deriving a neat formula for planetary motion from first principles is mind-blowing because we had zero built-in intuition for elliptical orbits. Meanwhile, in daily life, we routinely predict sophisticated and nuanced human emotions, intentions, and behavior with remarkable precision—folk psychology handles that, so a formal study confirming the same generates little excitement. The “wow factor” is vastly different because one domain leverages evolved intuitions while the other operates entirely outside them.

To begin to develop a quantitative sense of this, consider that while the “impressive” task of predicting the trajectories of celestial bodies is characteristically addressed, quite satisfactorily, by a relatively small set of 2nd-order ODEs with $7N$ parameters for $N$ bodies[10], the comparatively “unimpressive” task of facial recognition has only recently been solved by neural networks requiring many millions of parameters.[11]
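
For reference, the governing equations here are just Newtonian gravitation: for $N$ bodies with positions $\mathbf{r}_i$ and masses $m_i$,

$$\ddot{\mathbf{r}}_i = \sum_{j \neq i} \frac{G m_j (\mathbf{r}_j - \mathbf{r}_i)}{\lVert \mathbf{r}_j - \mathbf{r}_i \rVert^3}, \qquad i = 1, \dots, N,$$

a system fully specified by the $N$ masses plus the $6N$ initial positions and velocities.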

Bringing It Together: Why These Hypotheses Matter

The big takeaway: You can’t simply observe that physics makes more accurate predictions than social science and conclude that social phenomena are inherently harder to predict.

Our model shows that:

  1. If a discipline has relatively less flexibility in changing its research questions to more tractable ones (Rigid Demands), it will appear less successful even if its domain isn’t inherently more complex.

  2. If the easy prediction tasks in a domain have already been solved by our evolved cognition (Fruit in the Hand), we’ll only notice the difficult remaining tasks, making the domain seem harder still.

Limitations and Extensions of Our Hypotheses

Our hypotheses open new perspectives on scientific difficulty, but it’s important to clarify their scope and implications.

Complementing Rather Than Contradicting Traditional Explanations

We recognize that human behavior may indeed be more complex than physical systems. Our hypotheses don’t contradict these complexity-based explanations. Rather, we suggest that even if the underlying tasks in human sciences were equally difficult (which they may not be), the structure of scientific inquiry would still make them appear harder due to the mechanisms we’ve described.

Scope and Magnitude of Our Explanation

To be clear about the explanatory power we’re claiming: these hypotheses explain only part of the judgment that human sciences are difficult. We expect that the distributions of task difficulties do genuinely vary across disciplines, likely in ways that others have proposed.

To put some tentative numbers on this: under a suitable precise formulation, we might place ~50% credence on our hypotheses explaining more than 1% of the variance (e.g., in $R^2$ values) between disciplines, but substantially less than 1% credence on them explaining more than 50% of such variance. Our primary contribution is demonstrating how structural features of inquiry can create or amplify apparent differences in difficulty, not claiming they explain all or even most of the difference.

Clarifying Folk Psychology’s Role

The existence of sophisticated folk psychology doesn’t mean that predicting human behavior is inherently easier. Quite the opposite: it suggests that the prediction tasks we assign to formal social science are precisely those where our evolved intuitions fail. The easy tasks have already been “picked” by evolution and excluded from what we consider science, leaving mainly the difficult ones.

Testing These Hypotheses Empirically

These hypotheses generate testable predictions. We could compare prediction tasks across disciplines, measuring both their intrinsic difficulty and how constrained researchers are in defining them. Surveys of scientists and laypeople could examine whether the best predictions in newer domains (where we lack evolutionary intuitions) appear more impressive relative to their actual difficulty. Such investigations would help determine the extent to which our proposed mechanisms contribute to the perceived difficulty gap between sciences.

Takeaways

  • Reasoning about the relative hardness of sciences is itself hard, and making this reasoning formal reveals substantive choices that have to be made in order to specify the content of claims regarding “hardness.”

  • The social sciences may appear harder than natural sciences partly due to structural features of inquiry, not just inherent complexity.

  • The Rigid Demands Hypothesis suggests that the social sciences may, on average, have less freedom to pursue more tractable questions than natural sciences, and this should impact our expectation of their success.

  • The Fruit in the Hand Hypothesis suggests that evolution has already given us solutions to many easy human behavior prediction tasks, leaving more difficult ones, on average, for the social sciences.

  • Our formal model shows how these factors can create the appearance of different levels of difficulty across disciplines, even if (hypothetically) the underlying distribution of task difficulties were precisely the same.


  1. ^
  2. ^
  3. ^
  4. ^

    Our favorite entry in this genre is undoubtedly Meehl’s (1978) “Theoretical Risks and Tabular Asterisks,” in which he wryly remarks (p. 807): “since (in 10 minutes of superficial thought) I easily came up with 20 features that make human psychology hard to scientize, I invite you to pick your own favorites.”

  5. ^
  6. ^

    One could use the coefficient of determination, $R^2$, as an operationalization of our hardness metric, $h$. With it, we can capture the typical difference in $R^2$ values across scientific domains. Ozili et al. (2022, p. 2) succinctly illustrate this variation:

    Typically, statisticians and scientists in the pure sciences will dismiss a model as “weak”, “unreliable” and “lacking a predictive power” if the reported R-square of the model is below 0.6.

    By contrast, they note that in the social sciences:

    …a low R-square of at least 0.1 is acceptable on the condition that some or most of the predictors or explanatory variables are statistically significant.

    An $R^2$ of .5 is a physicist’s embarrassment—but a social scientist’s triumph.
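
    For reference, the coefficient of determination is $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, the fraction of observed variance captured by a model’s predictions; since lower $h$ means better performance in our setup, the natural operationalization would be something like $h = 1 - R^2$.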

  7. ^

    Actually, freedom to redefine one’s problem is just one way to increase the cardinality of the set of tasks. Other ways to realize this include having a larger domain of possible tasks or having spent longer exploring tasks. Increasing the cardinality of the set of tasks undertaken increases the expected value of one’s greatest successes.

  8. ^

    To wit, a colleague once memorably quipped, “Physics is what we would have called whichever theory had been most successful.”

  9. ^

    Contra all the hate it gets, game theory represents a significant achievement precisely because it can outperform folk psychology in predicting strategic behavior. Its empirically validated predictions—like the chain‑store paradox, the winner’s curse, the traveler’s dilemma, or the volunteer’s dilemma—sometimes go beyond our intuitive notions of rational behavior.

  10. ^

    Where $6N$ comes from the initial conditions (positions + velocities) and $N$ from the masses.

  11. ^

    In particular, VGGNet and FaceNet—the first CNNs that became known for accuracy in facial recognition tasks—had just over 100 million parameters.