I dropped out of an MSc in mathematics at a top university in order to focus my time on AI safety.
Knight Lee
Isn’t the most upvoted curated post right now about winning? “A case for courage, when speaking of AI danger” is about strategy, not technical research.
If you’re looking for people interested in personal strategies for individuals (e.g. earning to give), I think most of them are on the Effective Altruism Forum rather than LessWrong. The network effect means that everyone interested in a topic tends to cluster in one forum, even if they are given two choices initially.
Another speculative explanation is that the upvote system allows the group of people interested in one particular topic (e.g. technical research, or conceptual theorizing) to upvote every post on that topic without running out of upvotes. This rewards people for repeatedly writing posts on the most popular topics, since it’s much easier to get net-positive upvotes that way.
PS: I agree that earning to give is reasonable
I’m considering this myself right now :)
I mostly agree with you that hiring experts and having a great impact is feasible. Many of the technical alignment researchers who lament “money isn’t what we need, what we need is to be headed in the right direction instead of having so much fake research!” fail to realize that their own salaries also come from these flawed but nonetheless vital funding sources. If it weren’t for the flawed funding sources, they would have nothing at all.
Some of them might be wealthy enough to fund themselves, but that’s effectively still making money to hire experts (the expert is themselves).
And yes, some people use AI safety careers as a stepping stone to AI capabilities careers. But realistically, the whole world spends less than $0.2 billion on AI safety and hundreds of billions on AI capabilities. AI safety salaries are negligible here. One might argue that the non-monetary moral motivation of working on AI safety has caused people to end up working on AI capabilities. But in that case, increasing AI safety salaries should reduce this pipeline rather than increase it.
But Raemon is so right about the great danger of being a net negative. Don’t follow an “ends justify the means” strategy like Sam Bankman-Fried, and beware of your ego convincing you that AI is safer so long as you’re the guy in charge (like Sam Altman or Elon Musk). These biases are insidious, because we are machines programmed by evolution, not to seek truth for the sake of truth, but to:
- Arrive at the truth when it increases inclusive fitness
- Arrive at beliefs which get us to do evil while honestly believing we are doing good (when it increases inclusive fitness)
- Arrive at said beliefs despite wholly believing we are seeking the truth
Haha you’re right. In another comment I was saying:
55% of Americans surveyed agree that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Only 12% disagree.
To be honest, I’m extremely confused. Somehow, AI Notkilleveryoneism… is both a tiny minority and a majority at the same time.
I think the real problem here is raising public awareness about how many people are already on team ‘AI Notkilleveryoneism’ rather than team ‘AI accelerationist’. This is a ‘common knowledge’ problem from game theory—the majority needs to know that they’re in the majority.
That makes sense, it seems to explain things. The median AI expert also puts the chance of extinction at 5% to 10%, which is huge.
I’m still not in favour of stigmatizing AI developers, especially right now. Whether AI Notkilleveryoneism is a real minority or an imagined minority, if it gets into a moral-duel with AI developers, it will lose status, and it will be harder for it to grow (by convincing people to agree with it, or by convincing people who privately agree to come out of the closet).
People tend to follow “the experts” instead of their very uncertain intuitions about whether something is dangerous. With global warming, the experts were climatologists. With cigarette toxicity, the experts were doctors. But with AI risk, you were saying:
Thousands of people signed the 2023 CAIS statement on AI risk, including almost every leading AI scientist, AI company CEO, AI researcher, AI safety expert, etc.
It sounds like the expertise people look to when deciding “whether AI risk is serious or sci-fi” comes from leading AI scientists, and even AI company CEOs. Very unfortunately, we may depend on our good relations with them… :(
That’s a very good point, and I didn’t really analyze the comparison.
I guess maybe meat eating isn’t the best comparison.
The closest comparison might be researchers developing some other technology which maybe 2/3 of people see as a net negative, e.g. nuclear weapons, autonomous weapons, fossil fuel extraction, tobacco, etc.
But no campaign has even really tried to stigmatize these researchers. Every single campaign against these technologies has targeted the companies, CEOs, or politicians leading them, with hardly any attacks on the researchers. Attacking researchers is sort of untested.
I completely agree this discussion should be moved outside your post. But the counterintuitive mechanics of LessWrong mean a derailing discussion may actually increase the visibility and upvotes of your original message (by bumping it in the “recent discussion”).
(It’s probably still bad if it’s high up in the comment section.)
It’s too bad you can only delete comment threads; you can’t move them to the bottom or collapse them by default.
That’s a very good point, and these examples really change my intuition from “I can’t see this being a good idea” to “this might make sense, this might not, it’s complicated.” And my earlier disagreement mostly came from my intuition.
I still have disagreements, but just to clarify, I now agree your idea deserves more attention than it’s getting.
My remaining disagreement is that I think stigmatization only reaches the extreme level of “these people are literally evil and vile” after the majority of people already agree.
In places in India where the majority of people are already vegetarians, and already feel that eating meat is wrong, the social punishment of meat eaters does seem to deter them.
But in places where most people don’t think eating meat is wrong, prematurely calling meat eaters evil may backfire. This is because you’ve created a “moral-duel” where you force outside observers to either think the meat-eater is the bad guy, or you’re the bad guy (or stupid guy). This “moral-duel” drains the moral standing of both sides.
If you’re near the endgame, and 90% of people are already vegetarians, then this moral-duel will first deplete the meat-eater’s moral standing, and may solidify vegetarianism.
But if you’re at the beginning, when only 1% of people support your movement, you desperately want to invest your support and credibility in further growing your support and credibility, rather than burning it in a moral-duel against the meat-eating majority the way militant vegans did.
Nurturing credibility is especially important for AI Notkilleveryoneism, where the main obstacle is a lack of credibility and “this sounds like science fiction.”
Finally, at least only go after the AI lab CEOs, as they have relatively little moral standing compared to the rank-and-file researchers.
E.g. in this quicktake Mikhail Samin appealed to researchers as friends asking them to stop “deferring” to their CEO.
Even for nuclear weapons, biological weapons, chemical weapons, and landmines, it was hard to punish the scientists researching them. Even for the death penalty, it was hard to punish the firing-squad soldiers. It’s easier to stick it to the leaders. In an influential book, early feminist Lady Constance Lytton repeatedly described the policemen (who fought the movement) and even prison guards as very good people, and focused the blame on the leaders.
PS: I read your post, it was a fascinating read. I agree with the direction of it and I agree the factors you mention are significant, but it might not go quite as far as you describe?
One silly sci-fi idea is this. You might have a few “trigger pills” which are smaller than a blood cell and travel through the bloodstream. You can observe them travelling through the body using medical imaging techniques (e.g. PET), and they are designed to be very observable.
You wait until one of them is at the right location, and send very precise x-rays at it from all directions, so that the x-ray intensity is only significant where the beams converge. A mechanism in the trigger pill responds to this ionizing radiation (or heating?), and it anchors to the location using a chemical glue or physical mechanisms (hooks, string, etc.).
Once the trigger pill is anchored in place, another drug can be taken which only activates when it contacts the trigger pill. (Which might activate yet another drug, if you really want to amplify the effect of this tiny trigger pill.)
This results in a ton of drug activity in that area, without needing invasive surgery.
If you want it to become a bigger and more permanent implant, you might make it grow over time (by adding another chemical), deliberately forming a blood clot. Medical imaging can make sure the trigger pill is in a small, expendable blood vessel (you detect the pill moving more slowly, with more twists and turns). It might be designed so that yet another chemical can cover it up or destroy it, in case you need to start over at a new location.
It might be radioactive if it’s trying to treat cancer.
It might be magnetically activated if you want real-time control of drug intensity.
Speaking of magnetically activating it, maybe even the anchoring is triggered by a magnetic field rather than x-rays. It won’t be aimed as precisely, so you can only have one trigger pill at a time, and you may have to wait a long time before it travels to the right area (the human body is pretty big compared to any small target).
I guess they succeeded in changing many people’s opinions. The right-wing reaction is against left-wing people’s opinions. The DEI curriculum is somewhere in between opinions and policies.
I think the main effect of people having further-left opinions is still to push policies further left rather than further right due to counter-reaction. And this is despite the topic being much more moralistic and polarizing than AI x-risk.
Trump 2.0 being more pro-Israel could be due to him being more extreme in all directions (perhaps due to new staff members or a new vice president, I don’t know), rather than due to pro-Palestinian protests.
The counter-reactions are against the protesters, not the cause itself. The Vietnam War protests also created a counter-reaction against the protesters, despite successfully ending the war.
I suspect that for a lot of these pressure campaigns which work, the target has a tendency to pretend he isn’t backing down due to the campaign (but for other reasons), or to act like he’s not budging at all until finally giving in. The target doesn’t want people to think that pressure campaigns work on him; the target wants people to think that any pressure on him will only get a counter-reaction out of him, in order to discourage others from pressuring him.
You’re probably right about the courts though, I didn’t know that.
I agree that there are more anti-abortion efforts due to Roe v. Wade, but I disagree that these efforts actually overshot to a point where restrictions on abortion are even harsher than they would be if Roe v. Wade had never happened. I still think it moved the Overton window such that even conservatives feel abortion is kind of normal, maybe bad, but not literally like killing a baby.
The people angry about affirmative action have a strong feeling that different races should get the same treatment, e.g. when applying to university. I don’t think any of them overshot into wanting to bring back segregation or slavery.
Oops, “efforts which empirically appear to work” was referring to how the book If Anyone Builds It, Everyone Dies attracted many big-name endorsements from people who weren’t known for endorsing AI x-risk concerns until now.
I’m personally against this as a matter of principle, and I also don’t think it’ll work.
Moral stigmatizing only works against a captive audience. It doesn’t work against people who can very easily ignore you.
You’re more likely to stop eating meat if a kind understanding vegetarian/vegan talks to you and makes you connect with her story of how she stopped eating meat. You’re more likely to simply ignore a militant one who calls you a murderer.
Moral stigmatizing failed to stop nuclear weapon developers, even though many of them were the same kind of “nerd” as AI researchers.
People see Robert Oppenheimer saying “Now, I am become Death, the destroyer of worlds” as some morally deep stuff. “The scientific community ostracized [Edward] Teller,” not because he was very eager to build bigger bombs (like the hydrogen bomb and his proposed Sundial), but because he made Oppenheimer lose his security clearance by saying bad stuff about him.
Which game do you choose to play? The game of dispassionate discussion, where the truth is on your side? Or the game of Twitter-like motivated reasoning, where your side looks much more low status than the AI lab people, and the status quo is certainly not on your side?
Imagine how badly we’ll lose the argument if people on our side are calling them evil and murderous and they’re talking like a sensible average Joe trying to have a conversation with us.
Moral stigmatization seems to backfire rather than help for militant vegans, because signalling hostility is a bad strategy when you’re the underdog going against the mainstream. It’s an extremely big ask for ordinary people to show hostility towards other ordinary people whom no one else is hostile towards. It’s even difficult for ordinary people to be associated with a movement which shows such hostility. Most people just want to move on with their lives.
I think you’re underestimating the power of backlash to aggressive activism. And I say this despite the fact that just a few minutes ago I was arguing to others that they’re overestimating the power of backlash.
The most promising path to slowing down AI is government regulation, not individuals ceasing to do AI research.
- Think about animal cruelty. Government regulation has succeeded on this many times. Trying to shame people who work in factory farms into stopping has never worked, and wise activists don’t even consider doing this.
- Think about paying workers more. Raising the minimum wage works. Shaming companies into feeling guilty doesn’t. Even going on strike doesn’t work as well as minimum wage laws.
- This is despite the fact that half of the employees refusing to work is like 10 times more powerful than non-employees holding a sign saying “you’re evil.”
- Especially a tiny minority of society holding those signs.
- Though then again, moral condemnation is a source of government regulation.
Disclaimer: not an expert just a guy on the internet
Strong disagree, but strong upvote because it’s “big if true.” Thank you for proposing a big crazy idea that you believe will work. I’ve done that a number of times, and I’ve been downvoted into the ground without explanation, instead of being given any encouraging “here’s why I don’t think this will work, but thank you.”
I don’t believe that in a world without pro-Palestinian protests, Trump would be noticeably less pro-Israel.
I think in such a world, even the Democrats would be more comfortable supporting Israel without reservations and caveats.
I think the protests and pressure against the Vietnam War forced even Republican administrations to give in and end the war. This is despite crackdowns on protests similar to those against pro-Palestinian protests.
I think some of the Supreme Court justices appointed under Trump aren’t that extreme and refused to give in to his pressure.
But even if it’s true that the Trump administration is making these structural changes, it still doesn’t feel intuitive to me that, e.g., a stronger anti-abortion policy under Democrats would cause Trump to get elected, which would cause structural changes, which would cause a weaker anti-abortion policy in the future. The influence is diluted through each of these causal steps, such that the resulting effect is probably pretty small compared to the straightforward effect “a stronger anti-abortion policy today makes the default anti-abortion policy for the future stronger.”
The world is complex, but unless there is some unusual reason to expect an effort to backfire and have literally the opposite effect in the long run, it’s rational to expect efforts which empirically appear to work, to work. It feels mysterious to expect many things to be “net negatives” based on an inside view.
I agree
I agree certain kinds of actions can fail to obtain desired results, and still have backlash.
If you have “activism” which is violent or physically threatening enough (maybe extremists in pro-Palestinian protests), it does create backlash to the point of being a significant net negative.
Even more consequential are the violent actions by Hamas in reaction to Israeli mistreatment of Palestinians. This actually does cause even more mistreatment, so much so that most of the mistreatment may be caused by it.
But this is violence we are talking about, not activism. The nonviolent protesters are still a net positive towards their cause.
Edit: I do think this proposal of vilifying AI labs could potentially be a net negative.
I agree, but I don’t think individual woke activists writing books and sending them to policymakers can directly increase the perception that “there is too much wokeness,” even if no policymakers listen to them.
They only increase the perception that “there is too much wokeness” by way of successfully changing opinions and policies.
The perception that “there is too much wokeness” depends on:
- Actual woke opinions and policies by the government and people
- Anti-woke activism which convinces conservatives that “the government and left-wingers” are far more woke than they actually are
- Not pro-woke activism itself (in the absence of actual woke opinions and policies)
So the only way activists can be a net negative is if making policymakers more woke (e.g. more pro-abortion) can causally make future policymakers even less woke than they would otherwise be.
This is possible if it makes people feel “there is too much wokeness” and elect Trump. But a single subtopic of wokeness, e.g. abortion rights, is unlikely to singlehandedly determine whether Trump is elected, and therefore making policymakers more pro-abortion in particular probably has a positive influence on whether future policymakers are pro-abortion (by moving the Overton window on this specific topic).
This is probably even more true for strategic/scientific disagreements rather than moral disagreements: if clinical trial regulations were stricter during a Democrat administration, they probably will remain stricter during the next Republican administration. It’s very hard to believe that the rational prediction could be “making the regulations stronger will cause the expected future regulations to be weaker.”
You don’t hear about the zillions of policies which Trump did not reverse (or turn upside down). You don’t hear about the zillions of scientific positions held by Democrat decisionmakers which Trump did not question (or invert).
I actually didn’t see that glaring example! Very good point.
That said, my feeling is Trump et al. weren’t reacting against any specific woke activism, but against very woke policies (and opinions) which resulted from the activism.
Although they reversed very many Democrat policies, I don’t think they reversed them so badly that a stronger Democrat policy will result in a stronger policy in the opposite direction under the Trump administration. The Overton window effect may still be stronger than the reverse-psychology effect.
In a counterfactual world where one of these woke policies/opinions was weaker among Democrats (e.g. the right to abortion), that specific opinion would probably do even worse under Trump (abortion might be banned). Trump’s policies are still positively correlated with public opinion. He mostly held back from banning abortion and cutting medical benefits because he knew these liberal policies were popular. But he aggressively attacked immigration (and foreign aid) because these liberal policies were less popular. Despite appearances, he’s not actually maximizing opposition to liberal policies regardless of their popularity.
The one counter-reaction is that, in aggregate, all the woke policies and opinions may have made Trump popular enough to get elected? But I doubt that pausing AI etc. will be so politically significant that it’ll determine who wins the election.
PS: I changed my mind on net negatives. Net-negative activism may be possible when it makes the cause (e.g. AI Notkilleveryoneism) become partisan and snap into one side of the political aisle? But even Elon Musk supporting it hasn’t caused that to happen?
This is a little off topic, but do you have any examples of counter-reactions overall drawing things into the red?
With other causes like fighting climate change and environmentalism, it’s hard to see any activism being a net negative. Extremely sensationalist (and unscientific) promotions of the cause (e.g. The Day After Tomorrow movie) do not appear to harm it. They only seem to move the Overton window in favour of environmentalism.
It seems most of the counter-reaction doesn’t depend on your method of messaging; it results from the success of your messaging. The mere shift in opinions in favour of your position inevitably creates a counter-reaction among those who aren’t yet convinced.
Anti-environmentalists do not seem to use these overly hyped messages (like The Day After Tomorrow) as their strawman. Instead, they directly attack the most reputable climatologists who argue for global warming. No matter how gentle and non-sensationalist these climatologists are, they still get dunked on just as badly. I don’t think it would backfire if they argued their beliefs harder and more urgently.
People who support environmentalism are very capable of ignoring the overzealous messaging on their own side, and of having a good laugh at movies like The Day After Tomorrow.
Reverse-psychology effects only seem to occur for moral disagreements, not strategic/scientific disagreements; for the latter, people are positively attracted to the Overton window of their opponents.
And even the crappiest campaigns (e.g. militant veganism) have little proven success in “swaying the world to do the opposite via reverse-psychology.” This is despite directly attacking their audience and calling them evil, and taking lots of negative actions like blocking traffic and damaging property.
Assuming no counter-reaction, big name book endorsements are solid evidence of success.
Disclaimer: not an expert just a guy on the internet
I think betting the house on pocket aces may be good advice if you really know what you’re doing, but if you’re just learning the ropes this strategy would be unforgiving. What if you think you have the perfect obvious startup idea, bet everything on it, and it turns out to be a “tarpit idea?”
I agree with the fold pre principle though. I think it’s good to research very many opportunities/ideas, but investing in them can be an insidious trap and it’s easy to underestimate how hard it is to pull out later on.
To be honest, I don’t know.
All I know is that a lot of organizations seem shy about discussing the AI takeover risk, and the endorsements the book got surprised me with how receptive government officials are (considering how little cherry-picking they did).
My very uneducated guess is that Newsom vetoed the bill because he was more of a consequentialist/longtermist than the cause-driven lawmakers who passed the bill, so one can argue the failure mode was a “lack of appeal to consequentialist interests.” One might argue “it was passed by cause-driven lawmakers by a wide margin, but got blocked by the consequentialist.” But the cause-driven vs. consequentialist motives are pure speculation; I know nothing about these people aside from Newsom’s explanation...
I am particularly concerned that a culture where it is acceptable for researchers to bargain with unaligned AI agents leads to individual researchers deciding to negotiate unilaterally.
That’s a very good point, now I find it much more plausible for things like this to be a net negative.
The negative isn’t that big, since a lot of these people would have negotiated unilaterally even without such a culture, and AI takeover probably doesn’t hinge on a few people defecting. But some of these people probably have morals that would stop them if not for the normalization.
I still think it’s probably a net positive, but it’s now contingent on my guesstimate there’s significant chance it succeeds.
My view is that if safety can only be achieved by bribing an AI to be useful for a period of a few years, then something has gone seriously wrong.
Something has already gone seriously wrong and we already are in damage control.
It does not seem to be in mankind’s interests for a large group of prominent AI researchers and public figures to believe they are obligated to a non-human entity.
I agree. There needs to be ways to make sure these promises mainly influence what humans choose for the far future after we win, not what humans choose for the present in ways which can affect whether we win.
Sorry for butting in, but I can’t help but notice that I agree with both what you and So8res are saying, yet I think you aren’t arguing about the same thing.
You seem to be talking about the dimension of “confidence,” “obviousness,” etc. and arguing that most proponents of AI concern seem to have enough of it, and shouldn’t increase it too much.
So8res seems to be talking about another dimension which is harder to name. “Frank futuristicness” maybe? Though not really.
If you adjust your “frank futuristicness” to an absurdly high setting, you’ll sound a bit crazy. You’ll tell lawmakers “I’m not confident, and this isn’t obvious, but I think that unless we pause AI right now, we risk a 50% chance of building a misaligned superintelligence. It might use nanobots to convert all the matter in the universe into paperclips, and the stars and galaxies will fade one by one.”
But if you adjust your “frank futuristicness” to an absurdly low setting, you’ll end up being ineffective. You’ll say “I am confident, and this is obvious: we should regulate AI companies more because they are less regulated than other companies. For example, the companies which research vaccines have to jump through so many clinical trial hoops, meanwhile AI models are just as untested as vaccines and really, they have just as much potential to harm people. And on top of that we can’t even prove that humanity is safe from AI, so we should be careful. I don’t want to give examples of how exactly humanity isn’t safe from AI because it might sound like sci-fi, so I’ll only talk about it abstractly. My point is, we should follow the precautionary principle and be slow because it doesn’t hurt.”
There is an optimum level of “frank futuristicness” between the absurdly high setting and the absurdly low setting. But most people are far below this optimum level.
I really like this.
I think AI concern can fail in two ways:
1. We lose the argument. We confidently state our confident positions, with no shyness or sugarcoating. It grabs a lot of attention, but it’s negative attention and ridicule. We get a ton of engagement, by those who consider us a low-status morbid curiosity. We make a lot of powerful enemies, who relentlessly attack AI concern and convince almost everyone.
2. We don’t lose any argument, but the argument never happens. We have a ton of impressive endorsements, and anyone who actually reads about the drama learns that our side consists of high-status scientists and geniuses. We have no enemies—the only person rarer than someone arguing for AI risk is someone arguing against it. And yet… we are ignored. Politicians are simply too busy to think about this. They may think “I guess your logic is correct… but no other politicians seem to be invested in this, and I don’t really want to be the first one.”
Being bolder increases the “losing argument” risk but decreases the “argument never happens” risk. And this is exactly what we want at this point in time. (As long as you don’t do negative actions like traffic obstruction protests.)
PS: I also think there are two kinds of burden of proof:
1. Rational burden of proof. The debater who argues we are 100% safe has the burden of proof, while the debater arguing that “building a more intelligent species doesn’t seem very safe” has no burden of proof.
2. Psychological burden of proof. The debater arguing the position that “everyone seems to agree with” has no burden of proof, while the debater arguing the radical extreme position has the burden of proof.
How the heck do we decide which position is the “radical extreme position?” It depends on many things, e.g. how many expert endorsements support AI concern, and how many experts (e.g. Yann LeCun) reject it. But clearly, the balance seems to be in favour of AI concern, yet it’s still AI concern which suffers from the psychological burden of proof.
So maybe the problem is not expert endorsements, but ordinary layman beliefs? Well 55% of Americans surveyed agree that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Only 12% disagree.
So maybe it really is vibes?! You just have to emphasize that this is a strongly supported position, this is what experts think, and if you think “this is insanity,” you’re out of the loop. You’ve got to read up on it, because this great paradigm shift quietly happened while you were paying attention to other things.
Given that the psychological burden of proof might work this way, even risk (1), “we lose the argument,” could actually be reduced if we are more confident.
I agree that the anthropic filter may be sufficient to explain the unreasonable effectiveness of mathematics, but I don’t think it’s a necessary explanation. I doubt that universes like ours are vastly outnumbered by alternative universes where:
- Math isn’t unreasonably effective
- Life still evolves
- Yet intelligence fails to evolve because there are no patterns to predict
The anthropic filter is necessary for explaining why Earth has water (when most planets don’t), and may be necessary for explaining why the universe seems fine-tuned for life. But it probably isn’t necessary for explaining the unreasonable effectiveness of mathematics.
I think Eugene Wigner is misunderstanding something. The causality isn’t:
“The math which humans make/discover” --determines--> “The structure of mathematics” --determines--> “How the world works”
Instead, the causality is:
“The math which humans make/discover” <--determines-- “The structure of mathematics” --determines--> “How the world works”
It’s true that “there is no logical reason why the universe should obey laws that conform to man-made mathematical structures,” but there’s a very logical reason why the universe should obey laws that man-made mathematical structures also obey.
For example, math and logic can simply be defined as the rules which many different phenomena (regardless of origin) follow. The difference between math and logic is that math is very complex logic: e.g. a large number can be represented as a binary string of TRUE and FALSE values, math operations are made out of AND, OR, and NOT logical statements, and so math statements are essentially just complex logical statements.
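To make the “math operations are made out of AND, OR, and NOT” point concrete, here is a minimal sketch in Python (my own illustration, not something from the quoted discussion; helper names like full_adder are hypothetical) of addition built purely out of those three logical operations:

```python
# A sketch of arithmetic built purely from AND, OR, and NOT (illustrative only).
# Numbers are represented as lists of booleans, least significant bit first.

def full_adder(a: bool, b: bool, carry: bool) -> tuple[bool, bool]:
    """Add three bits using only AND, OR, and NOT."""
    # XOR expressed with AND/OR/NOT: (a OR b) AND NOT (a AND b)
    a_xor_b = (a or b) and not (a and b)
    total = (a_xor_b or carry) and not (a_xor_b and carry)  # sum bit
    carry_out = (a and b) or (carry and a_xor_b)            # carry bit
    return total, carry_out

def add_binary(x: list[bool], y: list[bool]) -> list[bool]:
    """Ripple-carry addition of two unsigned numbers given as boolean lists."""
    result, carry = [], False
    for i in range(max(len(x), len(y))):
        a = x[i] if i < len(x) else False
        b = y[i] if i < len(y) else False
        bit, carry = full_adder(a, b, carry)
        result.append(bit)
    if carry:
        result.append(True)
    return result

def to_bits(n: int) -> list[bool]:
    return [c == "1" for c in reversed(bin(n)[2:])]

def from_bits(bits: list[bool]) -> int:
    return sum(1 << i for i, bit in enumerate(bits) if bit)

# 13 + 29 computed with nothing but boolean logic:
assert from_bits(add_binary(to_bits(13), to_bits(29))) == 42
```

The point is just that once XOR and the carry logic are expressed with AND/OR/NOT, every arithmetic operation is a (large) composition of logical statements, which is the sense in which math is “very complex logic.”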
As for why the same logic applies to different things, the answer is probably as elusive as “why does the universe exist? Why do math and logic exist?” Not every explanation has an explanation.