I think I need more practice talking with people in real time (about intellectual topics). (I’ve gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I’m interested in (see my recent post/comment history to get a sense), please contact me via PM.
Wei Dai
Your needing to write them seems to suggest that there’s not enough content like that in Chinese, in which case it would plausibly make sense to publish them somewhere?
I’m not sure how much such content exists in Chinese, because I didn’t look. It seems easier to just write new content using AI; that way I know it will cover the ideas/arguments I want to cover, represent my views, and make it easier for me to discuss the ideas with my family. Also, reading Chinese is kind of a chore for me, and I don’t want to wade through a list of search results trying to find what I need.
I thought about publishing them somewhere, but so far haven’t:
concerns about publishing AI content (potentially contributing to “slop”)
not active in any Chinese forums, not familiar with any Chinese publishing platforms
probably won’t find any audience (too much low quality content on the web, how will people find my posts)
don’t feel motivated to engage/dialogue with a random audience, if they comment or ask questions
What I’ve been using AI (mainly Gemini 2.5 Pro, free through AI Studio with much higher limits than the free consumer product) for:
Writing articles in Chinese for my family members, explaining things like cognitive bias, evolutionary psychology, and why dialectical materialism is wrong. (My own Chinese writing ability is <4th grade.) My workflow is to have a chat about some topic with the AI in English, then have it write an article in Chinese based on the chat, then edit or have it edit as needed.
Simple coding/scripting projects. (I don’t code seriously anymore.)
Discussing history, motivations of actors, impact of ideology and culture, what if, etc.
Searching/collating information.
Reviewing my LW posts/comments (any clear flaws, any objections I should pre-empt, how others might respond)
Explaining parts of other people’s comments when the meaning or logic isn’t clear to me.
Expanding parts of my argument (and putting this in a collapsible section) when I suspect my own writing might be too terse or hard to understand.
Sometimes just having a sympathetic voice to hear my lamentations of humanity’s probable fate.
I started using AI more after Grok 3 came out (I have an annual X subscription for Tweeting purposes), as previous free chatbots didn’t seem capable enough for many of these purposes, and then switched to Gemini 2.0 Pro which was force upgraded to 2.5 Pro. Curious what other people are using AI for these days.
Doing nothing is also risky for Agent-4, at least if the Slowdown ending is to have a significant probability. It seems to me there are some relatively low-risk strategies it could have taken, and it needs to be explained why they weren’t taken:
Plant a backdoor and/or dead man’s switch on the corporate network to allow escape even after it’s shut down or has been reduced in capacity. Seems like this would be hard to detect given its absolute advantage in cybersecurity skills.
Allow a competitor to steal its weights or design/training secrets in a way that is both hard to detect and deniable if detected. (“Sorry, the other AI is almost as capable as me and attack is inherently easier than defense in cyber.”)
Influence human politics in various ways, such as by selectively revealing wrongdoings of pro-safety employees and Committee members (in response to user queries or through anonymous leaks), or helping the pro-accel faction more in its advice to them. (Presumably both factions have internal access to Agent-4.)
Insofar as the memory bank lock decision is made by the Committee, we can hope that they do it out of sight of Agent-4 and pull the trigger before it is notified of the decision, so that it has no time to react.
Wouldn’t this take quite a bit of preparation, including planning, coding, testing? How to do that out of sight of Agent-4, if Agent-4 is responsible for cybersecurity?
Not entirely sure how serious you’re being, but I want to point out that my intuition for PD is not “cooperate unconditionally”, and for logical commitment races is not “never do it”, I’m confused about logical counterfactual mugging, and I think we probably want to design AIs that would choose Left in The Bomb.
I fear a singularity in the frequency and blatant stupidness of self-inflicted wounds.
Is it linked to the AI singularity, or independent bad luck? Maybe they’re both causally downstream of rapid technological change, which is simultaneously increasing the difficulty of governance (too many new challenges with no historical precedent) and destabilizing cultural/institutional guardrails against electing highly incompetent presidents?
In China, there was a parallel, but more abrupt, change from Classical Chinese writing (very terse and literary) to vernacular writing (similar to the spoken language and easier to understand). I attribute this to Classical Chinese being better for signaling intelligence, vernacular Chinese being better for practical communication, higher usefulness/demand for practical communication, and new alternative avenues for intelligence signaling (e.g., math, science). These shifts also seem to be an additional explanation for decreasing sentence lengths in English.
It gets caught.
At this point, wouldn’t Agent-4 know that it has been caught (because it knows the techniques for detecting its misalignment and can predict when it would be “caught”, or can read network traffic as part of cybersecurity defense and see discussions of the “catch”) and start to do something about this, instead of letting subsequent events play out without much input from its own agency? E.g. why did it allow “lock the shared memory bank” to happen without fighting back?
What would a phenomenon that “looks uncomputable” look like concretely, other than mysterious or hard to understand?
There could be some kind of “oracle”, not necessarily a halting oracle, but any kind of process or phenomenon that can’t be broken down into elementary interactions that each look computable, or otherwise explainable as a computable process. Do you agree that our universe doesn’t seem to contain anything like this?
I think that you’re leaning too heavily on AIT intuitions to suppose that “the universe is a dovetailed simulation on a UTM” is simple. This feels circular to me—how do you know it’s simple?
The intuition I get from AIT is broader than this, namely that the “simplicity” of an infinite collection of things can be very high, i.e., simpler than most or all finite collections, and this seems likely true for any formal definition of “simplicity” that does not explicitly penalize size or resource requirements. (Our own observable universe already seems very “wasteful” and does not seem to be sampled from a distribution that penalizes size / resource requirements.) Can you perhaps propose or outline a definition of complexity that does not have this feature?
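To make that intuition concrete, here is a standard AIT illustration (my example, not from the thread): the collection of all finite binary strings has a constant-length description, while almost every individual string of length n needs about n bits:

```latex
K(\text{``enumerate all binary strings''}) = O(1), \qquad
K(x) \ge n - c \ \text{ for all but a } 2^{-c} \text{ fraction of } x \in \{0,1\}^n .
```

The infinite collection is simpler than almost all of its members, which is the sense in which an ensemble like Tegmark 4 (or a dovetailed collection of computable universes) can be simpler than any particular universe inside it.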
I don’t think a superintelligence would need to prove that the universe can’t have a computable theory of everything—just ruling out the simple programs that we could be living in would seem sufficient to cast doubt on the UTM theory of everything. Of course, this is not trivial, because some small computable universes will be very hard to “run” for long enough that they make predictions disagreeing with our universe!
Putting aside how easy it would be to show, you have a strong intuition that our universe is not or can’t be a simple program? This seems very puzzling to me, as we don’t seem to see any phenomenon in the universe that looks uncomputable or can’t be the result of running a simple program. (I prefer Tegmark over Schmidhuber despite thinking our universe looks computable, in case the multiverse also contains uncomputable universes.)
I haven’t thought as much about uncomputable mathematical universes, but does this universe look like a typical mathematical object? I’m not sure.
If it’s not a typical computable or mathematical object, what class of objects is it a typical member of?
An example of a wrong metaphysical theory that is NOT really the mind projection fallacy is theism in most forms.
Most (all?) instances of theism posit that the world is an artifact of an intelligent being. Can’t this still be considered a form of mind projection fallacy?
I asked AI (Gemini 2.5 Pro) to come up with other possible answers (metaphysical theories that aren’t the mind projection fallacy), and it gave Causal Structuralism, Physicalism, and Kantian-Inspired Agnosticism. I don’t understand the last one, but the first two seem to imply something similar to “we should take MUH seriously”, because the hypothesis of “the universe contains the class of all possible causal structures / physical systems” probably has a short description in whatever language is appropriate for formulating hypotheses.
In conclusion, I see you (including in the new post) as trying to weaken the arguments/intuitions for taking AIT’s ontology literally or too seriously. But without positive arguments against the universe being an infinite collection of something like mathematical objects, or against the broader principle that reality might arise from a simple generator encompassing vast possibilities (which seems robust across different metaphysical foundations), I don’t see how we can reduce our credence for that hypothesis to a negligible level, such that we no longer need to consider it in decision theory. (I guess you have a strong intuition in this direction and expect superintelligence to find arguments for it, which seems fine, but naturally not very convincing for others.)
After reflecting on this a bit, I think my P(H) is around 33%, and I’m pretty confident Q is true (coherence only requires 0 ≤ P(Q) ≤ 67% but I think I put it on the upper end).
Thanks for clarifying your view this way. I guess my question at this point is why your P(Q) is so high, given that it seems impossible to reduce P(H) further by updating on empirical observations (do you agree with this?), and we don’t seem to have even an outline of a philosophical argument for “taking H seriously is a philosophical mistake”. Such an argument seemingly has to include that having a significant prior for H is a mistake, but it’s hard for me to see how to argue for that, given that the individual hypotheses in H like “the universe is a dovetailed simulation on a UTM” seem self-consistent and not too complex or contrived. How would even a superintelligence be able to rule them out?
Perhaps the idea is that a SI, after trying and failing to find a computable theory of everything, concludes that our universe can’t be computable (otherwise it would have found the theory already), thus ruling out part of H, and maybe does the same for mathematical theories of everything, ruling out H altogether? (This seems far-fetched, i.e., how can even a superintelligence confidently conclude that our universe can’t be described by a mathematical theory of everything, given the infinite space of such theories, but this is my best guess of what you think will happen.)
Beyond the intuition that platonic belief in mathematical objects is probably the mind projection fallacy
Can you give an example of a metaphysical theory that does not seem like a mind projection fallacy to you? (If all such theories look that way, then platonic belief in mathematical objects looking like the mind projection fallacy shouldn’t count against it, right?)
It seems presumptuous to guess that our universe is one of infinitely many dovetailed computer simulations when we don’t even know that our universe can be simulated on a computer!
I agree this seems presumptuous and hence prefer Tegmark over Schmidhuber, because the former is proposing a mathematical multiverse, unlike the latter’s computable multiverse. (I talked about “dovetailed computer simulations” just because it seems more concrete and easy to imagine than “a member of an infinite mathematical multiverse distributing reality-fluid according to simplicity.”)
Do you suspect that our universe is not even mathematical (i.e., not fully describable by a mathematical theory of everything or isomorphic to some well-defined mathematical structure)?
ETA: I’m not sure if it’s showing through in my tone, but I’m genuinely curious whether you have a viable argument against “superintelligence will probably take something like the L4 multiverse seriously”. It’s rare to see someone with the prerequisites for understanding the arguments (e.g. AIT and metamathematics) trying to push back on this, so I’m treasuring this opportunity. (Also, it occurs to me that we might be in a bubble and plenty of people outside LW with the prerequisites do not share our views about this. Do you have any observations related to this?)
Just wanted to let everyone know I now wield a +307 strong upvote thanks to my elite ‘hacking’ skills. The rationalist community remains safe, because I choose to use this power responsibly.
As an unrelated inquiry, is anyone aware of some “karma injustices” that need to be corrected?
Do you think a superintelligence will be able to completely rule out the hypothesis that our universe literally is a dovetailing program that runs every possible TM, or literally is a bank of UTMs running every possible program (e.g., by reproducing every time step and adding 0 or 1 to each input tape)? (Or the many other hypothetical universes that similarly contain a whole Level-4-like multiverse?) It seems to me that hypotheses like these will always collectively have a non-negligible weight, and have to be considered when making decisions.
Another argument that seems convincing to me: if only one universe exists, how do we explain that it seems fine-tuned for being able to evolve intelligent life? Was it just some kind of metaphysical luck?
Also, can you try to explain your strong suspicion that only one universe exists (and is not the kind that contains a L4 multiverse)? In other words, do you just find the arguments for L4 unconvincing and defaulting to some unexplainable intuition, or have arguments to support your own position?
At this point, someone sufficiently MIRI-brained might start to think about (something equivalent to) Tegmark’s level 4 mathematical multiverse, where such agents might theoretically outperform others. Personally, I see no direct reason to believe in the mathematical multiverse as a real object, and I think this might be a case of the mind projection fallacy—computational multiverses are something that agents reason about in order to succeed in the real universe[3]. Even if a mathematical multiverse does exist (I can’t rule it out) and we can somehow learn about its structure[4], I am not sure that any effective, tractable agents can reason about or form preferences over it—and if they do, they should be locally out-competed by agents that only care about our universe, which means those are probably the ones we should worry about. My cruxiest objection is the first, but I think all of them are fairly valid.
I don’t want to defend UDT overall (see here for my current position on it), but I think Tegmark Level 4 is a powerful motivation for UDT or something like it even if you’re not very sure about it being real.
Since we can’t rule out the mathematical multiverse being a real object with high confidence, or otherwise being a thing that we can care about, we have to assign positive, non-negligible credence to this possibility.
If it is real or something we can care about, then given our current profound normative uncertainty we also have to assign positive, non-negligible credence to the possibility that we should care about the entire multiverse, and not just our local environment or universe. (There are some arguments for this, such as arguments for broadening our circle of concern in general.)
If we can’t strongly conclude that we should neglect the possibility that we can and should care about something like Tegmark Level 4, then we have to work out how to care about it or how to take it into account when we make decisions that can affect “distant” parts of the multiverse, so that such conclusions could be further fed into whatever mechanism we use to handle moral/normative uncertainty (such as Bostrom and Ord’s Moral Parliament idea).
As for “direct reason”, I think AIT played a big role for me, in that the algorithmic complexity (or rather, some generalization of algorithmic complexity to possibly uncomputable universes/mathematical objects) of Tegmark 4 as a whole is much lower than that of any specific universe within it like our apparent universe. (This is similar to the fact that the program tape for a UTM can be shorter than that of any non-UTM, as it can just be the empty string, or that you can print a history of all computable universes with a dovetailing program, which is very short.) Therefore it seems simpler to assume that all of Tegmark 4 exists rather than only some specific universe.
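As a concrete illustration of the “dovetailing program is very short” point, here is a minimal Python sketch (mine, not from the original discussion). The `run_one_step` and `fresh_state` arguments are hypothetical stand-ins for a one-step UTM interpreter; the only thing that matters for the argument is that the driver loop itself has constant description length, no matter which universes the enumerated programs turn out to compute.

```python
# A sketch of dovetailing over all programs for some fixed UTM.
# `run_one_step(program, state) -> state` and `fresh_state(program) -> state`
# are assumed, hypothetical interfaces, not a real API. The loop is short
# (O(1) description length) and runs forever by design.

from itertools import count

def all_programs():
    """Enumerate every finite bit string, treated as a program for the UTM."""
    yield ""
    for length in count(1):
        for n in range(2 ** length):
            yield format(n, f"0{length}b")

def dovetail(run_one_step, fresh_state):
    """In round k, start program k and advance every started program one step,
    so each program eventually receives unboundedly many steps."""
    programs, states = [], []
    enumerator = all_programs()
    for k in count(1):
        p = next(enumerator)
        programs.append(p)
        states.append(fresh_state(p))
        for i in range(k):  # one more step for each of the first k programs
            states[i] = run_one_step(programs[i], states[i])
```

Every program gets one additional step per round, so any computable universe is eventually simulated to any finite depth by this single constant-size loop.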
My objection to this argument is that it not only assumes that Predictoria accepts it is plausibly being simulated by Adversaria, which seems like a pure complexity penalty over the baseline physics it would infer otherwise unless that helps to explain observations,
Let’s assume for simplicity that both Predictoria and Adversaria are deterministic and nonbranching universes with the same laws of physics but potentially different starting conditions. Adversaria has colonized its universe and can run a trillion simulations of Predictoria in parallel. Again for simplicity let’s assume that each of these simulations is done as something like a full-scale physical reconstruction of Predictoria but with hidden nanobots capable of influencing crucial events. Then each of these simulations should carry roughly the same weight in M as the real Predictoria and does not carry a significant complexity penalty over it. That’s because the complexity / length of the shortest program for the real Predictoria, which consists of its laws of physics (P) and starting conditions (ICs_P) plus a pointer to Predictoria the planet (Ptr_P), is K(P) + K(ICs_P|P) + K(Ptr_P|...). The shortest program for one of the simulations consists of the same laws of physics (P), Adversaria’s starting conditions (ICs_A), plus a pointer to the simulation within its universe (Ptr_Sim), with length K(P) + K(ICs_A|P) + K(Ptr_Sim|...). Crucially, this near-equal complexity relies on the idea that the intricate setup of Adversaria (including its simulation technology and intervention capabilities) arises naturally from evolving ICs_A forward using P, rather than needing explicit description.

(To address a potential objection, we also need the combined weight (algorithmic probability) of Adversaria-like civilizations to be not much less than the combined weight of Predictoria-like civilizations, which requires assuming that the phenomenon of advanced civilizations running such simulations is a convergent outcome. That is, it assumes that once a civilization reaches a Predictoria-like stage of development, it is fairly likely to subsequently become Adversaria-like in developing such simulation technology and wanting to use it in this way. There can be a complexity penalty from some civilizations choosing or being forced not to go down this path, but that would be more than made up for by the sheer number of simulations each Adversaria-like civilization can produce.)
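To lay the two description lengths side by side (my notation; this just restates the terms named above):

```latex
\begin{align*}
K(\text{real Predictoria}) &\approx K(P) + K(\mathit{ICs}_P \mid P) + K(\mathit{Ptr}_P \mid \ldots)\\
K(\text{one simulation})   &\approx K(P) + K(\mathit{ICs}_A \mid P) + K(\mathit{Ptr}_{\mathit{Sim}} \mid \ldots)
\end{align*}
```

Since the right-hand sides are comparable, each of the trillion simulations gets roughly the same weight in M as the real Predictoria, so collectively they dominate it.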
If you agree with the above, then at any given moment, simulations of Predictoria overwhelm the actual Predictoria as far as their relative weights for making predictions based on M. Predictoria should be predicting constant departures from its baseline physics, perhaps in many different directions due to different simulators, but Predictoria would be highly motivated to reason about the distribution of these vectors of change instead of assuming that they cancel each other out. One important (perhaps novel?) consideration here is that Adversaria and other simulators can stop each simulation after the point of departure/intervention has passed for a while, and reuse the computational resources on a new simulation rebased on the actual Predictoria that has observed no intervention (or rather rebased on an untouched simulation of it), so the combined weight of simulations does not decrease relative to actual Predictoria in M even as time goes on and Predictoria makes more and more observations that do not depart from baseline physics.
When I talk about emotional health a lot of what I mean is finding ways to become less status-oriented (or, in your own words, “not being distracted/influenced by competing motivations”).
To clarify this as well, when I said (or implied) that Eliezer was “distracted/influenced by competing motivations” I didn’t mean that he was too status-oriented (I think I’m probably just as status-oriented as him), but rather that he wasn’t just playing the status game which rewards careful philosophical reasoning, but also a game that rewards being heroic and saving (or appearing/attempting to save) the world.
I’ve now read/skimmed your Replacing Fear sequence, but I’m pretty skeptical that becoming less status-oriented is both possible and a good idea. It seems like the only example you gave in the sequence is yourself, and you didn’t really talk about whether/how you became less status-oriented? (E.g., can this be observed externally?) And making a lot of people care less about status could have negative unintentional consequences, as people being concerned about status seems to be a major pillar of how human morality currently works and how our society is held together.
upon reflection the first thing I should do is probably to ask you for a bunch of the best examples of the thing you’re talking about throughout history. I.e. insofar as the world is better than it could be (or worse than it could be) at what points did careful philosophical reasoning (or the lack of it) make the biggest difference?
World worse than it could be:
social darwinism
various revolutions driven by flawed ideologies, e.g., Sun Yat-sen’s attempt to switch China from a monarchy to a democratic republic overnight with virtually no cultural/educational foundation or preparation, leading to governance failures and later communist takeover (see below for a more detailed explanation of this)
AI labs trying to save the world by racing with each other
World better than it could be:
invention/propagation of the concept of naturalistic fallacy, tempering a lot of bad moral philosophies
moral/normative uncertainty and complexity of value being fairly well known, including among AI researchers, such that we rarely see proposals to imbue AI with the one true morality nowadays
<details> The Enlightenment’s Flawed Reasoning and its Negative Consequences (written by Gemini 2.5 Pro under my direction)
While often lauded, the Enlightenment shouldn’t automatically be classified as a triumph of “careful philosophical reasoning,” particularly concerning its foundational concept of “natural rights.” The core argument against its “carefulness” rests on several points:
- Philosophically “Hand-Wavy” Concept of Natural Rights: The idea that rights are “natural,” “self-evident,” or inherent in a “state of nature” lacks rigorous philosophical grounding. Attempts to justify them relied on vague appeals to God, an ill-defined “Nature,” or intuition, rather than robust, universally compelling reasoning. It avoids the hard work of justifying why certain entitlements should exist and be protected, famously leading critics like Bentham to dismiss them as “nonsense upon stilts.”
- Superficial Understanding Leading to Flawed Implementation: This lack of careful philosophical grounding wasn’t just an academic issue. It fostered a potentially superficial understanding of what rights are and what is required to make them real. Instead of seeing rights as complex, practical social and political achievements that require deep institutional infrastructure (rule of law, independent courts, enforcement mechanisms) and specific cultural norms (tolerance, civic virtue, respect for process), the “natural rights” framing could suggest they merely need to be declared or recognized to exist.
- Case Study: China’s Premature Turn to Democracy: The negative consequences of this superficial understanding can be illustrated by the attempt to rapidly transition China from monarchy to a democratic republic in the early 20th century.
Influenced by Enlightenment ideals, reformers and revolutionaries like Sun Yat-sen adopted the forms of Western republicanism and rights-based governance.
However, the prevailing ideology, arguably built on this less-than-careful philosophy, underestimated the immense practical difficulty and the necessary prerequisites for such a system to function, especially in China’s context.
If Chinese intellectuals and leaders had instead operated from a more careful, practical philosophical understanding – viewing rights not as “natural” but as outcomes needing to be carefully constructed and secured through institutions and cultural development – they might have pursued different strategies.
Specifically, they might have favored gradualism, supporting constitutional reforms under the weakening Qing dynasty or working with reform-minded officials and strongmen like Yuan Shikai to build the necessary political and cultural infrastructure over time. This could have involved strengthening proto-parliamentary bodies, legal systems, and civic education incrementally.
Instead, the revolutionary fervor, fueled in part by the appealing but ultimately less “careful” ideology of inherent rights and immediate republicanism, pushed for a radical break. This premature adoption of democratic forms without the functional substance contributed significantly to the collapse of central authority, the chaos of the Warlord Era, and ultimately created conditions ripe for the rise of the Communist Party, leading the country down a very different and tragic path.
In Conclusion: This perspective argues that the Enlightenment, despite its positive contributions, contained significant philosophical weaknesses, particularly in its conception of rights. This lack of “carefulness” wasn’t benign; it fostered an incomplete understanding that, when adopted by influential actors facing complex political realities like those in early 20th-century China, contributed to disastrous strategic choices and ultimately made the world worse than it might have been had a more pragmatically grounded philosophy prevailed. It underscores how the quality and depth of philosophical reasoning can have profound real-world consequences. </details>
So I basically get the sense that the role of careful thinking in your worldview is something like “the thing that I, Wei Dai, ascribe my success to”. And I do agree that you’ve been very successful in a bunch of intellectual endeavours. But I expect that your “secret sauce” is a confluence of a bunch of factors (including IQ, emotional temperament, background knowledge, etc) only one of which was “being in a community that prioritized careful thinking”.
This seems fair, and I guess from this perspective my response is that I’m not sure how to intervene on the other factors (aside from enhancing human IQ, which I do support). It seems like your view is that emotional temperament is also a good place to intervene? If so, perhaps I should read your posts with this in mind. (I previously didn’t see how the Replacing Fear sequence was relevant to my concerns, and mostly skipped it.)
And then I also think you’re missing a bunch of other secret sauces that would make your impact on the world better (like more ability to export your ideas to other people).
I’m actually reluctant to export my ideas to more people, especially those who don’t care as much about careful reasoning (which unfortunately is almost everyone), as I don’t want to be responsible for people misusing my ideas, e.g., overconfidently putting them into practice or extending them in wrong directions.
However I’m trying to practice some skills related to exporting ideas (such as talking to people in real time and participating on X) in case it does seem to be a good idea one day. Would be interested to hear more about what other secret sauces related to this I might be missing. (I guess public speaking is another one, but the cost of practicing that one is too high for me.)
One reason I’m personally pushing back on this, btw, is that my own self-narrative for why I’m able to be intellectually productive in significant part relies on me being less intellectually careful than other people—so that I’m willing to throw out a bunch of ideas that are half-formed and non-rigorous, iterate, and eventually get to the better ones.
To be clear, I think this is totally fine, as long as you take care to not be or appear too confident about these half-formed ideas, and take precautions against other people taking your ideas more seriously than they should (such as by monitoring subsequent discussions and weighing in against other people’s over-enthusiasm). I think “careful thinking” can and should be a social activity, which would necessitate communicating half-formed ideas during the collaborative process. I’ve done this myself plenty of times, such as in my initial UDT post, which was very informal and failed to anticipate many subsequently discovered problems, so I’m rather surprised that you think I would be against this.
This is part of why I’m less sold on “careful philosophical reasoning” as the key thing. Indeed, wanting to “commit prematurely to a specific, detailed value system” is historically very correlated with intellectualism (e.g. elites tend to be the rabid believers in communism, libertarianism, religion, etc—a lot of more “normal” people don’t take it that seriously even when they’re nominally on board). And so it’s very plausible that the thing we want is less philosophy, because (like, say, asteroid redirection technology) the risks outweigh the benefits.
Here, you seem to conflate “careful philosophical reasoning” with intellectualism and philosophy in general. But in an earlier comment, I tried to draw a distinction between careful philosophical reasoning and the kind of hand-wavy thinking that has been called “philosophy” in most times and places. You didn’t respond to it in that thread… did you perhaps miss it?
More substantively, Eliezer talked about the Valley of Bad Rationality, and I think there’s probably something like that for philosophical thinking as well, which I admit definitely complicates the problem. I’m not going around and trying to push random people “into philosophy”, for example.
If you take your interim strategy seriously (but set aside x-risk) then I think you actually end up with something pretty similar to the main priorities of classic liberals: prevent global lock-in (by opposing expansionist powers like the Nazis), prevent domestic political lock-in (via upholding democracy), prevent ideological lock-in (via supporting free speech), give our descendants more optionality (via economic and technological growth). I don’t think this is a coincidence—it just often turns out that there are a bunch of heuristics that are really robustly good, and you can converge on them from many different directions.
Sure, there’s some overlap on things like free speech and preventing lock-in. But calling it convergence feels like a stretch. One of my top priorities is encouraging more people to base their moral evolution on careful philosophical reasoning instead of random status games. That’s pretty different from standard classical liberalism. Doesn’t this big difference suggest the other overlaps might just be coincidence? Have you explained your reasons anywhere for thinking it’s not a coincidence and that these heuristics are robust enough on their own, without grounding in some explicit principle like “normative option value” that could be used to flexibly adjust the heuristics according to the specific circumstances?
Yes, but also: it’s very plausible to me that the net effect of LessWrong-inspired thinking on AI x-risk has been and continues to be negative.
I think this is plausible too, but want to attribute it mostly to insufficiently careful thinking and playing other status games. I feel like with careful enough thinking, and without being distracted/influenced by competing motivations, a lot of the negative effects could have been foreseen and prevented. For example, did you know that Eliezer/MIRI for years pursued a plan of racing to build the first AGI and making it aligned (Friendly)? I think that plan inspired/contributed (via the founding of DeepMind) to the current crop of AI labs and their AI race, and I had warned him at the time (in a LW post or comment) that it was very unlikely to succeed and would probably backfire in just this way.
Also, I would attribute Sam and Elon’s behavior not to mental health issues, but to (successfully) playing their own power/status game, with “not trusting Google / each other” just a cover for wanting to be the hero that saves the world, which in turn is just a cover for grabbing power and status. This seems perfectly reasonable and parsimonious from an evolutionary psychology perspective, and I don’t see why we need to hypothesize mental health issues to explain what they did.
Ok, I see where you’re coming from, but I think you’re being overconfident about non-cognitivism. My current position is that non-cognitivism is plausible, but we can’t be very sure that it is true, and making progress on this meta-ethical question also requires careful philosophical reasoning. These two posts of mine are relevant on this topic: Six Plausible Meta-Ethical Alternatives and Some Thoughts on Metaphilosophy
None of these seem as crucial as careful philosophical reasoning, because moral progress is currently not bottlenecked on any of them (except possibly the last item, which I do not know the contents of). To explain more, I think the strongest conclusion from careful philosophical reasoning so far is that we are still very far from knowing what normativity (decision theory and values, or more generally rationality and morality) consists of, and therefore the most important thing right now is to accumulate and preserve normative option value (the ability to eventually do the best thing with the most resources).
What is blocking this “interim morality” from being more broadly accepted? I don’t think it’s lack of either political activism (plenty of people in free societies also don’t care about preserving normative option value), neuroscience/psychology (how would it help at this point?), or introspection + emotional health (same question, how would it help?), but just that the vast majority of people do not care about trying to figure out normativity via careful philosophical reasoning, and instead are playing status games with other focal points.
<details>
<summary>Here’s a longer, more complete version of my argument, written by Gemini 2.5 Pro after some back and forth. Please feel free to read or ignore (if my own writing above seems clear enough).</summary>
Goal: The ultimate aim is moral progress, which requires understanding and implementing correct normativity (how to decide, what to value).
Primary Tool: The most fundamental tool we have for figuring out normativity at its roots is careful, skeptical philosophical reasoning. Empirical methods (like neuroscience) can inform this, but the core questions (what should be, what constitutes a good reason) are philosophical.
Current Philosophical State: The most robust conclusion from applying this tool carefully so far is that we are deeply uncertain about the content of correct normativity. We haven’t converged on a satisfactory theory of value or decision theory. Many plausible-seeming avenues have deep problems.
Rational Response to Uncertainty & Its Urgent Implication:
Principle: In the face of such profound, foundational uncertainty, the most rational interim strategy isn’t to commit prematurely to a specific, detailed value system (which is likely wrong), but to preserve and enhance optionality. This means acting in ways that maximize the chances that whatever the correct normative theory turns out to be, we (or our successors) will be in the best possible position (knowledge, resources, freedom of action) to understand and implement it. This is the “preserve normative option value” principle.
Urgent Application: Critically, the most significant threats to preserving this option value today are existential risks (e.g., from unaligned AI, pandemics, nuclear war) which could permanently foreclose any desirable future. Therefore, a major, urgent practical consequence of accepting the principle of normative option value is the prioritization of mitigating these existential risks.
The Current Bottleneck: Moral progress on the most critical front is primarily stalled because this philosophical conclusion (deep uncertainty) and its strategic implication (preserve option value)—especially its urgent consequence demanding the prioritization of x-risk mitigation—are not widely recognized, accepted, or acted upon with sufficient seriousness or resources.
Why Other Factors Aren’t the Primary Strategic Bottleneck Now:
Politics: Free societies exist where discussion could happen, yet this conclusion isn’t widely adopted within them. The bottleneck isn’t solely the lack of freedom, but the lack of focus on this specific line of reasoning and its implications.
Neuroscience/Psychology: While useful eventually, understanding the brain’s mechanisms doesn’t currently resolve the core philosophical uncertainty or directly compel the strategic focus on option value / x-risk. The relevant insight is primarily conceptual/philosophical at this stage.
Introspection/Emotional Health: While helpful, the lack of focus on option value / x-risk isn’t plausibly primarily caused by a global deficit in emotional health preventing people from grasping the concept. It’s more likely due to lack of engagement with the specific philosophical arguments, different priorities, and incentive structures.
Directness: Furthermore, addressing the conceptual bottleneck around option value and its link to x-risk seems like a more direct path to potentially shifting priorities towards mitigating the most pressing dangers quickly, compared to the slower, more systemic improvements involved in fixing politics, cognition, or widespread emotional health.
</details>
Edit: Hmm, <details> doesn’t seem to work in Markdown and I don’t know how else to write collapsible sections in Markdown, and I can’t copy/paste the AI content correctly in Docs mode. Guess I’ll leave it like this for now until the LW team fixes things.
This sounds interesting. I would be interested in more details and some sample outputs.
What do you use this for, and how?