FWIW I think we’ve found one crucial angle on moral progress, but that this isn’t as surprising/coincidental as it may seem because there are several other angles on moral progress that are comparably important, including:
Political activism (e.g. free speech activism, various whistleblowers) that maintains societies in which moral progress can be made.
(The good parts of) neuroscience/psychology, which are making progress towards empirically-grounded theories of cognition, and thereby have taught (and will continue to teach) us a lot about moral cognition.
Various approaches to introspection + emotional health (including Buddhism, some therapy modalities, some psychiatry). These produce the internal clarity that is crucial for embodying + instantiating moral progress.
Some right-wing philosophers who I think are grappling with important aspects of moral progress that are too controversial for LessWrong (I don’t want to elaborate here because it’ll inevitably take over the thread, but am planning to write at more length about this soonish).
None of these seem as crucial as careful philosophical reasoning, because moral progress is currently not bottlenecked on any of them (except possibly the last item, which I do not know the contents of). To explain more, I think the strongest conclusion from careful philosophical reasoning so far is that we are still very far from knowing what normativity (decision theory and values, or more generally rationality and morality) consists of, and therefore the most important thing right now is to accumulate and preserve normative option value (the ability to eventually do the best thing with the most resources).
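To gesture at what I mean a bit more formally (this is just a toy gloss of my own, not something I’d defend in detail): suppose T_1, …, T_n are candidate normative theories with credences p_1, …, p_n, V_i(a) is how well an action a would score once T_i could actually be implemented, and A(s) is the set of actions still available (given our knowledge, resources, and freedom of action) in the situation s that a policy leads to. Then an option-value-preserving policy roughly solves:

```latex
% Toy formalization of "preserve normative option value" (my own hedged gloss):
% choose the policy that maximizes the expected value of the best action that
% will still be available later, averaged over which normative theory turns
% out to be correct.
\pi^{*} = \arg\max_{\pi} \; \sum_{i=1}^{n} p_i \,
          \mathbb{E}_{s \sim \pi}\!\Big[ \max_{a \in A(s)} V_i(a) \Big]
```

The max sitting inside the expectation is the whole point: we are optimizing for the ability to eventually do the best thing under whichever T_i turns out to be correct, rather than committing now to any particular V_i.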
What is blocking this “interim morality” from being more broadly accepted? I don’t think it’s lack of either political activism (plenty of people in free societies also don’t care about preserving normative option value), neuroscience/psychology (how would it help at this point?), or introspection + emotional health (same question, how would it help?), but just that the vast majority of people do not care about trying to figure out normativity via careful philosophical reasoning, and instead are playing status games with other focal points.
<details>
<summary>Here’s a longer, more complete version of my argument, written by Gemini 2.5 Pro after some back and forth. Please feel free to read or ignore (if my own writing above seems clear enough).</summary>
Goal: The ultimate aim is moral progress, which requires understanding and implementing correct normativity (how to decide, what to value).
Primary Tool: The most fundamental tool we have for figuring out normativity at its roots is careful, skeptical philosophical reasoning. Empirical methods (like neuroscience) can inform this, but the core questions (what should be, what constitutes a good reason) are philosophical.
Current Philosophical State: The most robust conclusion from applying this tool carefully so far is that we are deeply uncertain about the content of correct normativity. We haven’t converged on a satisfactory theory of value or decision theory. Many plausible-seeming avenues have deep problems.
Rational Response to Uncertainty & Its Urgent Implication:
Principle: In the face of such profound, foundational uncertainty, the most rational interim strategy isn’t to commit prematurely to a specific, detailed value system (which is likely wrong), but to preserve and enhance optionality. This means acting in ways that maximize the chances that whatever the correct normative theory turns out to be, we (or our successors) will be in the best possible position (knowledge, resources, freedom of action) to understand and implement it. This is the “preserve normative option value” principle.
Urgent Application: Critically, the most significant threats to preserving this option value today are existential risks (e.g., from unaligned AI, pandemics, nuclear war) which could permanently foreclose any desirable future. Therefore, a major, urgent practical consequence of accepting the principle of normative option value is the prioritization of mitigating these existential risks.
The Current Bottleneck: Moral progress on the most critical front is primarily stalled because this philosophical conclusion (deep uncertainty) and its strategic implication (preserve option value)—especially its urgent consequence demanding the prioritization of x-risk mitigation—are not widely recognized, accepted, or acted upon with sufficient seriousness or resources.
Why Other Factors Aren’t the Primary Strategic Bottleneck Now:
Politics: Free societies exist where discussion could happen, yet this conclusion isn’t widely adopted within them. The bottleneck isn’t solely the lack of freedom, but the lack of focus on this specific line of reasoning and its implications.
Neuroscience/Psychology: While useful eventually, understanding the brain’s mechanisms doesn’t currently resolve the core philosophical uncertainty or directly compel the strategic focus on option value / x-risk. The relevant insight is primarily conceptual/philosophical at this stage.
Introspection/Emotional Health: While helpful, the lack of focus on option value / x-risk isn’t plausibly primarily caused by a global deficit in emotional health preventing people from grasping the concept. It’s more likely due to lack of engagement with the specific philosophical arguments, different priorities, and incentive structures.
Directness: Furthermore, addressing the conceptual bottleneck around option value and its link to x-risk seems like a more direct path to potentially shifting priorities towards mitigating the most pressing dangers quickly, compared to the slower, more systemic improvements involved in fixing politics, cognition, or widespread emotional health.
</details>
Edit: Hmm, <details> doesn’t seem to work in Markdown and I don’t know how else to write collapsible sections in Markdown, and I can’t copy/paste the AI content correctly in Docs mode. Guess I’ll leave it like this for now until the LW team fixes things.
In general I disagree pretty broadly with your view. Not quite sure how best to surface that disagreement, but will give it a quick shot:
I think it’s important to be capable of (at least) two types of reasoning:
Precise reasoning about desired outcomes and strategies to get there.
Broad reasoning about heuristics that seem robustly good.
We see this in the domain of morality, for example: utilitarianism is more like the former, deontology is more like the latter. High-level ideological goals tend to go pretty badly if people stop paying attention to robust deontological heuristics (like “don’t kill people”). As Eliezer has said somewhere, one of the key reasons to be deontological is that we’re running on corrupted hardware. But more generally, we’re running on logically uncertain hardware: we can’t model all the flow-through effects of our actions on other reasonably intelligent people (hell, we can’t even model all the flow-through effects of our actions on, say, animals—who can often “read” us in ways we’re not tracking). And so we often should be adopting robust-seeming heuristics even when we don’t know exactly why they work.
If you take your interim strategy seriously (but set aside x-risk) then I think you actually end up with something pretty similar to the main priorities of classic liberals: prevent global lock-in (by opposing expansionist powers like the Nazis), prevent domestic political lock-in (via upholding democracy), prevent ideological lock-in (via supporting free speech), give our descendants more optionality (via economic and technological growth). I don’t think this is a coincidence—it just often turns out that there are a bunch of heuristics that are really robustly good, and you can converge on them from many different directions.
This is part of why I’m less sold on “careful philosophical reasoning” as the key thing. Indeed, wanting to “commit prematurely to a specific, detailed value system” is historically very correlated with intellectualism (e.g. elites tend to be the rabid believers in communism, libertarianism, religion, etc—a lot of more “normal” people don’t take it that seriously even when they’re nominally on board). And so it’s very plausible that the thing we want is less philosophy, because (like, say, asteroid redirection technology) the risks outweigh the benefits.
Then we get to x-risk. That’s a domain where many broad heuristics break down (though still fewer than people think, as I’ll write about soon). And you might say: well, without careful philosophical reasoning, we wouldn’t have identified AI x-risk as a priority. Yes, but also: it’s very plausible to me that the net effect of LessWrong-inspired thinking on AI x-risk has been and continues to be negative. I describe some mechanisms halfway through this talk, but here are a couple that directly relate to the factors I mentioned in my last comment:
First, when people on LessWrong spread the word about AI risk, extreme psychological outliers like Sam Altman and Elon Musk then jump to do AI-related things in a way which often turns out to be destructive because of their trust issues and psychological neuroses.
Second, US governmental responses to AI risk are very much bottlenecked on being a functional government in general, which is bottlenecked by political advocacy (broadly construed) slash political power games.
Third, even within the AI safety community you have a bunch of people contributing to expectations of conflict with China (e.g. Leopold Aschenbrenner and Dan Hendrycks) and acceleration in general (e.g. by working on capabilities at Anthropic, or RSI evals) in a way which I hypothesize would be much better for the world if they had better introspection capabilities (I know this is a strong claim, I have an essay coming out on it soon).
And so even here it seems like a bunch of heuristics (such as “it’s better when people are mentally healthier” and “it’s better when politics is more functional”) actually were strong bottlenecks on the application of philosophical reasoning to do good. And I don’t think this is a coincidence.
tl;dr: careful philosophical reasoning is just one direction in which you can converge on a robustly good strategy for the future, and indeed is one of the more risky avenues by which to do so.
> This is part of why I’m less sold on “careful philosophical reasoning” as the key thing. Indeed, wanting to “commit prematurely to a specific, detailed value system” is historically very correlated with intellectualism (e.g. elites tend to be the rabid believers in communism, libertarianism, religion, etc—a lot of more “normal” people don’t take it that seriously even when they’re nominally on board). And so it’s very plausible that the thing we want is less philosophy, because (like, say, asteroid redirection technology) the risks outweigh the benefits.
Here, you seem to conflate “careful philosophical reasoning” with intellectualism and philosophy in general. But in an earlier comment, I tried to draw a distinction between careful philosophical reasoning and the kind of hand-wavy thinking that has been called “philosophy” in most times and places. You didn’t respond to it in that thread… did you perhaps miss it?
More substantively, Eliezer talked about the Valley of Bad Rationality, and I think there’s probably something like that for philosophical thinking as well, which I admit definitely complicates the problem. I’m not going around and trying to push random people “into philosophy”, for example.
> If you take your interim strategy seriously (but set aside x-risk) then I think you actually end up with something pretty similar to the main priorities of classic liberals: prevent global lock-in (by opposing expansionist powers like the Nazis), prevent domestic political lock-in (via upholding democracy), prevent ideological lock-in (via supporting free speech), give our descendants more optionality (via economic and technological growth). I don’t think this is a coincidence—it just often turns out that there are a bunch of heuristics that are really robustly good, and you can converge on them from many different directions.
Sure, there’s some overlap on things like free speech and preventing lock-in. But calling it convergence feels like a stretch. One of my top priorities is encouraging more people to base their moral evolution on careful philosophical reasoning instead of random status games. That’s pretty different from standard classical liberalism. Doesn’t this big difference suggest the other overlaps might just be coincidence? Have you explained your reasons anywhere for thinking it’s not a coincidence and that these heuristics are robust enough on their own, without grounding in some explicit principle like “normative option value” that could be used to flexibly adjust the heuristics according to the specific circumstances?
> Yes, but also: it’s very plausible to me that the net effect of LessWrong-inspired thinking on AI x-risk has been and continues to be negative.
I think this is plausible too, but want to attribute it mostly to insufficiently careful thinking and to playing other status games. I feel like with careful enough thinking and not being distracted/influenced by competing motivations, a lot of the negative effects could have been foreseen and prevented. For example, did you know that Eliezer/MIRI for years pursued a plan of racing to build the first AGI and making it aligned (Friendly)? I think that plan inspired/contributed (via the founding of DeepMind) to the current crop of AI labs and their AI race, and I had warned him at the time (in a LW post or comment) that it was very unlikely to succeed and would probably backfire in this way.
Also, I would attribute Sam and Elon’s behavior not to mental health issues, but to (successfully) playing their own power/status game, with “not trusting Google / each other” just a cover for wanting to be the hero that saves the world, which in turn is just a cover for grabbing power and status. This seems perfectly reasonable and parsimonious from an evolutionary psychology perspective, and I don’t see why we need to hypothesize mental health issues to explain what they did.
EDIT: upon reflection the first thing I should do is probably to ask you for a bunch of the best examples of the thing you’re talking about throughout history. I.e. insofar as the world is better than it could be (or worse than it could be) at what points did careful philosophical reasoning (or the lack of it) make the biggest difference?
Original comment:
The term “careful thinking” here seems to be doing a lot of work, and I’m worried that there’s a kind of motte and bailey going on. In your earlier comment you describe it as “analytical philosophy, or more broadly careful/skeptical philosophy”. But I think we agree that most academic analytic philosophy is bad, and often worse than laypeople’s intuitive priors (in part due to strong selection effects on who enters the field—most philosophers of religion believe in god, most philosophers of aesthetics believe in the objectivity of aesthetics, etc).
So then we can fall back on LessWrong as an example of careful thinking. But as we discussed above, even the leading figure on LessWrong was insufficiently careful even about the main focus of his work for it to be robustly valuable.
So I basically get the sense that the role of careful thinking in your worldview is something like “the thing that I, Wei Dai, ascribe my success to”. And I do agree that you’ve been very successful in a bunch of intellectual endeavours. But I expect that your “secret sauce” is a confluence of a bunch of factors (including IQ, emotional temperament, background knowledge, etc) only one of which was “being in a community that prioritized careful thinking”. And then I also think you’re missing a bunch of other secret sauces that would make your impact on the world better (like more ability to export your ideas to other people).
In other words, the bailey seems to be “careful thinking is the thing we should prioritize in order to make the world better”, and the motte is “I, Wei Dai, seem to be doing something good, even if basically everyone else is falling into the valley of bad rationality”.
One reason I’m personally pushing back on this, btw, is that my own self-narrative for why I’m able to be intellectually productive in significant part relies on me being less intellectually careful than other people—so that I’m willing to throw out a bunch of ideas that are half-formed and non-rigorous, iterate, and eventually get to the better ones. Similarly, a lot of the value that the wider blogosphere has created comes from people being less careful than existing academic norms (including Eliezer and Scott Alexander, whose best works are often quite polemical).
In short: I totally think we want more people coming up with good ideas, and that this is a big bottleneck. But there are many different directions in which we should tug people in order to make them more intellectually productive. Many academics should be less careful. Many people on LessWrong should be more careful. Some scientists should be less empirical, others should be more empirical; some less mathematically rigorous, others more mathematically rigorous. Others should try to live in countries that are less repressive of new potentially-crazy ideas (hence politics being important). And then, of course, others should be figuring out how to actually get good ideas implemented.
Meanwhile, Eliezer and Sam and Elon should have had less of a burning desire to found an AGI lab. I agree that this can be described by “wanting to be the hero who saves the world”, but this seems to function as a curiosity stopper for you. When I talk about emotional health a lot of what I mean is finding ways to become less status-oriented (or, in your own words, “not being distracted/influenced by competing motivations”). I think of extremely strong motivations to change the world (as these outlier figures have) as typically driven by some kind of core emotional dysregulation. And specifically I think of fear-based motivation as the underlying phenomenon which implements status-seeking and many other behaviors which are harmful when taken too far. (This is not an attempt to replace evo-psych, btw—it’s an account of the implementation mechanisms that evolution used to get us to do the things it wanted, which now are sometimes maladapted to our current environment.) I write about a bunch of these models in my Replacing Fear sequence.
> When I talk about emotional health a lot of what I mean is finding ways to become less status-oriented (or, in your own words, “not being distracted/influenced by competing motivations”).
To clarify this as well, when I said (or implied) that Eliezer was “distracted/influenced by competing motivations” I didn’t mean that he was too status-oriented (I think I’m probably just as status-oriented as him), but rather that he wasn’t just playing the status game which rewards careful philosophical reasoning, but also a game that rewards being heroic and saving (or appearing/attempting to save) the world.
I’ve now read/skimmed your Replacing Fear sequence, but I’m pretty skeptical that becoming less status-oriented is both possible and a good idea. It seems like the only example you gave in the sequence is yourself, and you didn’t really talk about whether/how you became less status-oriented? (E.g., can this be observed externally?) And making a lot of people care less about status could have unintended negative consequences, as people being concerned about status seems to be a major pillar of how human morality currently works and how our society is held together.
> upon reflection the first thing I should do is probably to ask you for a bunch of the best examples of the thing you’re talking about throughout history. I.e. insofar as the world is better than it could be (or worse than it could be) at what points did careful philosophical reasoning (or the lack of it) make the biggest difference?
World worse than it could be:
Social Darwinism
various revolutions driven by flawed ideologies, e.g., Sun Yat-sen’s attempt to switch China from a monarchy to a democratic republic overnight with virtually no cultural/educational foundation or preparation, leading to governance failures and later communist takeover (see below for a more detailed explanation of this)
AI labs trying to save the world by racing with each other
World better than it could be:
invention/propagation of the concept of the naturalistic fallacy, tempering a lot of bad moral philosophies
moral/normative uncertainty and complexity of value being fairly well known, including among AI researchers, such that we rarely see proposals to imbue AI with the one true morality nowadays
<details>
<summary>The Enlightenment’s Flawed Reasoning and its Negative Consequences (written by Gemini 2.5 Pro under my direction)</summary>
While often lauded, the Enlightenment shouldn’t automatically be classified as a triumph of “careful philosophical reasoning,” particularly concerning its foundational concept of “natural rights.” The core argument against its “carefulness” rests on several points:
Philosophically “Hand-Wavy” Concept of Natural Rights: The idea that rights are “natural,” “self-evident,” or inherent in a “state of nature” lacks rigorous philosophical grounding. Attempts to justify them relied on vague appeals to God, an ill-defined “Nature,” or intuition, rather than robust, universally compelling reasoning. It avoids the hard work of justifying why certain entitlements should exist and be protected, famously leading critics like Bentham to dismiss them as “nonsense upon stilts.”
Superficial Understanding Leading to Flawed Implementation: This lack of careful philosophical grounding wasn’t just an academic issue. It fostered a potentially superficial understanding of what rights are and what is required to make them real. Instead of seeing rights as complex, practical social and political achievements that require deep institutional infrastructure (rule of law, independent courts, enforcement mechanisms) and specific cultural norms (tolerance, civic virtue, respect for process), the “natural rights” framing could suggest they merely need to be declared or recognized to exist.
Case Study: China’s Premature Turn to Democracy: The negative consequences of this superficial understanding can be illustrated by the attempt to rapidly transition China from monarchy to a democratic republic in the early 20th century.
Influenced by Enlightenment ideals, reformers and revolutionaries like Sun Yat-sen adopted the forms of Western republicanism and rights-based governance.
However, the prevailing ideology, arguably built on this less-than-careful philosophy, underestimated the immense practical difficulty and the necessary prerequisites for such a system to function, especially in China’s context.
If Chinese intellectuals and leaders had instead operated from a more careful, practical philosophical understanding – viewing rights not as “natural” but as outcomes needing to be carefully constructed and secured through institutions and cultural development – they might have pursued different strategies.
Specifically, they might have favored gradualism, supporting constitutional reforms under the weakening Qing dynasty or working with reform-minded officials and strongmen like Yuan Shikai to build the necessary political and cultural infrastructure over time. This could have involved strengthening proto-parliamentary bodies, legal systems, and civic education incrementally.
Instead, the revolutionary fervor, fueled in part by the appealing but ultimately less “careful” ideology of inherent rights and immediate republicanism, pushed for a radical break. This premature adoption of democratic forms without the functional substance contributed significantly to the collapse of central authority, the chaos of the Warlord Era, and ultimately created conditions ripe for the rise of the Communist Party, leading the country down a very different and tragic path.
In Conclusion: This perspective argues that the Enlightenment, despite its positive contributions, contained significant philosophical weaknesses, particularly in its conception of rights. This lack of “carefulness” wasn’t benign; it fostered an incomplete understanding that, when adopted by influential actors facing complex political realities like those in early 20th-century China, contributed to disastrous strategic choices and ultimately made the world worse than it might have been had a more pragmatically grounded philosophy prevailed. It underscores how the quality and depth of philosophical reasoning can have profound real-world consequences.
</details>
> So I basically get the sense that the role of careful thinking in your worldview is something like “the thing that I, Wei Dai, ascribe my success to”. And I do agree that you’ve been very successful in a bunch of intellectual endeavours. But I expect that your “secret sauce” is a confluence of a bunch of factors (including IQ, emotional temperament, background knowledge, etc) only one of which was “being in a community that prioritized careful thinking”.
This seems fair, and I guess from this perspective my response is that I’m not sure how to intervene on the other factors (aside from enhancing human IQ, which I do support). It seems like your view is that emotional temperament is also a good place to intervene? If so, perhaps I should read your posts with this in mind. (I previously didn’t see how the Replacing Fear sequence was relevant to my concerns, and mostly skipped it.)
> And then I also think you’re missing a bunch of other secret sauces that would make your impact on the world better (like more ability to export your ideas to other people).
I’m actually reluctant to export my ideas to more people, especially those who don’t care as much about careful reasoning (which unfortunately is almost everyone), as I don’t want to be responsible for people misusing my ideas, e.g., overconfidently putting them into practice or extending them in wrong directions.
However I’m trying to practice some skills related to exporting ideas (such as talking to people in real time and participating on X) in case it does seem to be a good idea one day. Would be interested to hear more about what other secret sauces related to this I might be missing. (I guess public speaking is another one, but the cost of practicing that one is too high for me.)
> One reason I’m personally pushing back on this, btw, is that my own self-narrative for why I’m able to be intellectually productive in significant part relies on me being less intellectually careful than other people—so that I’m willing to throw out a bunch of ideas that are half-formed and non-rigorous, iterate, and eventually get to the better ones.
To be clear, I think this is totally fine, as long as you take care to not be or appear too confident about these half-formed ideas, and take precautions against other people taking your ideas more seriously than they should (such as by monitoring subsequent discussions and weighing in against other people’s over-enthusiasm). I think “careful thinking” can and should be a social activity, which would necessitate communicating half-formed ideas during the collaborative process. I’ve done this myself plenty of times, such as in my initial UDT post, which was very informal and failed to anticipate many subsequently discovered problems, so I’m rather surprised that you think I would be against this.
We don’t yet have collapsible sections in Markdown, but will have them in the next deploy. The syntax will be: