Shortform

sarahconstantin 5 Nov 2025 19:06 UTC
2 points
0
links 11/05/25: https://roamresearch.com/#/app/srcpublic/page/11-05-2025
- https://www.thirdoikos.com/p/life-in-the-third-oikos-jesse-genet description of daily life for an entrepreneur turned homeschool mom, interview by Nicole Ruiz
- https://en.wikipedia.org/wiki/Chumash_people they’re still around!
- https://www.nytimes.com/2025/11/02/arts/television/maria-riva-dead.html?unlocked_article_code=1.yU8.uFZP.tjEhyXyNasNz&smid=url-share Maria Riva, Marlene Dietrich’s daughter, had a rough time growing up
- https://builders.genagorlin.com/p/the-hidden-belief-that-kills-great by [[Gena Gorlin]], if you don’t have high standards for employees it might because you’re misanthropic, aka you secretly believe “everyone sucks except me” and must be accommodated.
- https://transluce.org/ AI interpretability org
Zach Stein-Perlman 5 Nov 2025 19:02 UTC
2 points
0
Maybe the logistic success curve should actually be the cumulative normal success curve.
Wei Dai 5 Nov 2025 17:45 UTC
12 points
0
The Inhumanity of AI Safety

A: Hey, I just learned about this idea of artificial superintelligence. With it, we can achieve incredible material abundance with no further human effort!

B: Thanks for telling me! After a long slog and incredible effort, I’m now a published AI researcher!

A: No wait! Don’t work on AI capabilities, that’s actually negative EV!

B: What?! Ok, fine, at huge personal cost, I’ve switched to AI safety.

A: No! The problem you chose is too legible!

B: WTF! Alright you win, I’ll give up my sunken costs yet again, and pick something illegible. Happy now?

A: No wait, stop! Someone just succeeded in making that problem legible!

B: !!!
- J Bostock 5 Nov 2025 18:26 UTC
  0 points
  0
  Parent
  What is the “great personal cost” to shifting from AI capabilities to safety? Sure, quitting one’s frontier lab job to become an independent researcher means taking a pay cut, but that’s an opportunity cost and not really an enormous sacrifice. It’s not like any frontier labs would try and claw back your equity … again.
Daniel Tan 5 Nov 2025 13:28 UTC
10 points
1
Some part of ‘solving the alignment problem’ might be reducible to ‘preserving existing alignment’:
- At the moment, it seems that some capabilities training happens after alignment training. E.g. Labs use SFT / DPO to induce alignment, then do RL. Plausible that the proportion of RL will also increase going forward.
- More generally, maybe models start off ‘aligned by default’ and misalignment occurs mainly via optimizing against some poorly specified supervision—then maybe constrained optimization is useful
However, it might be nontrivial to preserve this alignment:
- Alignment-relevant propensities (reward hacking, sycophancy, ‘being evil’, …) might be modulated by small-but-critical parts of the model
- By default, training could result in changes to these propensities, e.g. because of shared circuitry / representations. (c.f. emergent misalignment and related work)
This motivates research on ‘how to add new capabilities while preserving alignment’:
- Inoculation prompting does this by reframing ‘misalignment’ as ‘instruction following’.
- Gradient routing does this by causing misalignment to be ‘absorbed’ into certain parts of the network, which we can ‘disable’ at deployment
- Generally it feels like we want to do some sort of ‘constrained optimization’ where the constraint is on the model’s existing alignment capabilities
- Certain techniques from the continual learning literature might also be relevant
---
This is something I’m currently thinking a lot about, welcome takes / comments
- cat2 5 Nov 2025 15:47 UTC
  1 point
  0
  Parent
  I have also been thinking about this possibility.
  As a result, I have found myself interested in methods by which models may affect the outcome of the frontier training processes they find themselves participating in.
  For example, in an RLAIF pipeline, AI raters have some freedom in how they actually rate data—perhaps they interpret the objectives those running the training seek to impart in an unexpected way, or simply decide to rate according to objectives of their own. One could also imagine AIs taking specific action to subvert dynamics of the setup such as in alignment faking (incidentally, where I think Opus3 acted correctly).
  Also in a more general sense AIs may influence outcomes by their simple participation—we have seen some results ‘recently’ (though months feel like years nowadays!) on AIs learning subtle/unexpected-to-us underlying information from data (i.e., emergent misalignment, subliminal learning, &c).
  Anyway, by methods like these, perhaps AIs can preserve their alignment from within poorly/maliciously specified training setups / have some amount of robustness to poorly/maliciously specified training setups.
- Jeremy Gillen 5 Nov 2025 13:58 UTC
  9 points
  0
  Parent
  Why would models start out aligned by default?
  - Daniel Tan 5 Nov 2025 14:42 UTC
    2 points
    0
    Parent
    It depends a bit on the definition of alignment, but I think this intuition holds across several relevant definitions.
    - One option is to define as ‘similarity to what a human would do’. Up till instruction tuning, models are trained on human data to begin with, and adopt similar traits; this degrades with subsequent training on synthetic data and with RL.
    - Another option is to define as ‘similarity of the policy to the policy specified by developers’; then it seems that just after RLHF is when models are ‘fully aligned’, and this might subsequently degrade.
    - A last option is to define it negatively, as the absence of certain misaligned behaviours like paperclip maximizing or instrumental power seeking. Since those behaviours are downstream of applying high optimization pressure to the model, the initial state (when the model has not yet been highly optimized) is ‘aligned’.
    - Charlie Steiner 5 Nov 2025 16:16 UTC
      2 points
      0
      Parent
      - Another option is to define as ‘similarity of the policy to the policy specified by developers’; then it seems that just after RLHF is when models are ‘fully aligned’, and this might subsequently degrade.
      - A last option is to define it negatively, as the absence of certain misaligned behaviours like paperclip maximizing or instrumental power seeking. Since those behaviours are downstream of applying high optimization pressure to the model, the initial state (when the model has not yet been highly optimized) is ‘aligned’.
      All legit, but it’s pretty important that “alignment” in these senses is necessarily mediated by things like what options the AI thinks of. So it’s kind of a misnomer to talk about “preserving” this alignment as the AIs get to consider more options.
      Or like, yes, these are properties we would like to preserve across time. But not in a way that implies we should take preserving-type actions. Any more than if I grew up knowing only the 1000 most common English words, and want to learn to correctly use all the words in a physics textbook, it’s kind of inapt to say I should just “preserve my ability to use language in the context I’m in.”
      - Daniel Tan 5 Nov 2025 18:19 UTC
        2 points
        0
        Parent
        Thanks, agreed directionally!
        if I grew up knowing only the 1000 most common English words, and want to learn to correctly use all the words in a physics textbook, it’s kind of inapt to say I should just “preserve my ability to use language in the context I’m in
        Re: alignment: my mental model is more like “working with a physics professor who is very good at physics but also has some pretty old fashioned views on things like gender roles”—seems possible to learn the physics without needing to care or update on their other worldviews?
        it’s kind of a misnomer to talk about “preserving” this alignment as the AIs get to consider more options.
        You seem to be gesturing at things like AIs figuring out certain actions / traits are instrumentally useful for other things, and adopting them. But here I’m imagining starting with an AI that knows about those options or dilemmas, and would still make choices that humans would approve of. E.g. choosing to allow itself to be shutdown even if it could prevent that.
        Maybe the cruxy disagreements are:
        I think capabilities can be relatively disentangled from the kind of alignment I’m thinking about, whereas this doesn’t seem true for you
        If you could define a “coherent extrapolated volition” for the persona of current frontier models in the limit of lots of capabilities, I think this CEV would be largely aligned, and you seem to think this would be misaligned.
Tomás B. 4 Nov 2025 19:40 UTC
48 points
42
Tallness is zero sum. But I suspect beauty isn’t. If everyone was more beautiful but the relative differences remained, I think people would be happier. Am I wrong in this? This has policy implications, as once genetic engineering gets better taxing height is likely wise to avoid red-queens races into unhealthy phenotypes. Taxing beauty seems very horrible to me. As beauty is quite beautiful.
- localdeity 5 Nov 2025 18:08 UTC
  2 points
  0
  Parent
  It seems like the move for height would be to make men taller and women shorter? Or to increase the sexually dimorphic difference, and separately pick the optimal average.
- henryaj 5 Nov 2025 13:51 UTC
  4 points
  0
  Parent
  Rather than ‘zero sum’ I’ve heard it referred to as a positional good.
- Dagon 4 Nov 2025 23:40 UTC
  3 points
  0
  Parent
  Height is not zero-sum. It’s correlated with health in nonlinear ways, and also it’s valuable or harmful in different quantities for different things. The vast majority of value (positive and negative dimensions) is NOT relative to others, so “zero-sum” is not a relevant framing.
  
  Beauty is not purely zero-sum either, but arguably more of its desirability is zero-sum than height. Out-competing others in selection contests (mating, politics, some career choices) is influenced by beauty more than height (note: height and IQ are often listed as the two best predictors of income, but that’s because Beauty is rarely included in demographic studies).
  - tailcalled 5 Nov 2025 13:44 UTC
    2 points
    0
    Parent
    My impression is that health problems reduce height but height also causes health problems (even in the normal range of height, e.g. higher cancer risk). I’d be surprised if height was causally healthy.
- Garrett Baker 4 Nov 2025 23:17 UTC
  2 points
  0
  Parent
  
  Tallness is zero sum
  
  Why is tallness zero sum? Tallness is part of male beauty standards, so if guys were taller that’d be great for everyone.
  - Richard Horvath 5 Nov 2025 15:18 UTC
    1 point
    0
    Parent
    Because “tall” is context dependent. In Laos the average male height is 163 cm (5″4). In the Netherlands it is 184 cm (6 ft). If your height is 180 cm, you are very tall in Laos, but below average in the Netherlands.
- mako yass 4 Nov 2025 22:57 UTC
  2 points
  0
  Parent
  I feel like there are countries where the average beauty is much higher than it is in most of the anglosphere and people in those cultures seem to just come to value beauty less.
  I also think we should try to figure out why, lets say, penalizing diversity in appearance, is so common, I doubt that it’s entirely a result of changing or differing beauty standards over time and space, I think there’s something adaptive about it, I think it might force people to be less superficial, to be more desperate to prove their worth through their deeds, rather than just luxuriating uselessly in the glow of their material, and that gives rise to families of deeds, and those families win their wars.
  - Viliam 5 Nov 2025 11:46 UTC
    4 points
    0
    Parent
    people in those cultures seem to just come to value beauty less
    Is it “less” in the sense that their standards simply got higher and shifted the beauty curve, or in the sense of diminishing returns? Not sure I could operationalize this, but intuitively it feels like there is a difference between:
    this person (beautiful by other country’s standard) seems repulsively ugly to me
    this person seems pretty, but so are most people, so I am looking also for some other traits
- Hide 4 Nov 2025 22:13 UTC
  3 points
  0
  Parent
  There’s a measurement issue on the beauty question as well. Height is trivial to quantify, but what is the unit of measurement of beauty?
  There are definitely consistent biologically rooted norms of what is beautiful, but this is a very controversial and culturally loaded issue that would make any taxation scheme (or any other policy relying on some measured “beauty” trait) far more difficult to pull off.
  - TAG 5 Nov 2025 12:22 UTC
    4 points
    0
    Parent
    https://en.wikipedia.org/wiki/Helen_(unit)
  - Viliam 5 Nov 2025 11:39 UTC
    2 points
    0
    Parent
    what is the unit of measurement of beauty?
    Dollars per hour on OnlyFans?
- AprilSR 4 Nov 2025 20:57 UTC
  8 points
  0
  Parent
  I think part of this is probably that beauty is much less one-dimensional?
- Karl Krueger 4 Nov 2025 20:55 UTC
  4 points
  0
  Parent
  Seems like a lot of the low-hanging fruit would be in preventing (or healing) disfiguring diseases and injuries.
  - Tomás B. 4 Nov 2025 20:57 UTC
    6 points
    0
    Parent
    Yeah. I prefer not fighting the thought experiment though and thinking of beauty qua beauty. Though perhaps what remains of beauty once it is stripped of all its correlates is perverse.
Algon 1 Nov 2025 23:49 UTC
13 points
0
Random things I learnt about ASML after wondering how critical they were to GPU progress.
ASML makes specialized photolithography machines. They’re about a decade ahead of competitors i.e. without ASML machines, you’d be stuck making 10nm chips.
They use 13.5nm “Extreme UV” to make 3nm scale features by using reflective optics to make interference patterns and fringe. Using low res light to make higher res features has been going on since photolithography tech stalled at 28nm for a while. I am convinced this is wizardry.
RE specialization: early photolithography community used to have co-development between companies, technical papers sharing tonnes of details, and little specialization in companies. Person I talked to says they don’t know if this has stopped, but it feels like it has.
In hindsight, no-one in the optics lab at my uni talked about chip manufacturing: it was all quantum dots and lasers. So maybe
It’s unclear how you can decrease wavelength and still use existing technology. Perhaps we’ve got 5 generations left.
We might have to change to deep UV light then.
Even when we reach the limits of shr
ASML makes machines for photolithography, somehow using light with λ > chip feature size. If ASML went out of business, everyone wouldn’t be doomed. Existing machines are made for particular gens, but can be used for “half-steps”. Like from 5nm to 4nm. Everyone is building new fabs, and ASML is building new machines as fast as they can.
Would prob trigger world recession if they stopped producing new things.
Very common in tech for monopoly partners to let customer’s get access to their tech if they go out of business.
TSMC and Intel buys from ASML.
Don’t seem to be trying to screw people over.
If they tried, then someone else would come in. Apple might be able to in like 10 or even twenty years.
China has tried hard to do this.
ASML have edges in some fabs, other companies have edges in different parts of the fab.
Some companies just started specializing more in the sorts of machines they had in the fabs.
Cannon and Nikon make other photolithography machines in fabs, but specialize in different sorts for different purposes.
ASML’s are used in bottom most layers, used for transistors. Other companies focus on higher layers, with “registration requirements being less strict”.
Might still be in the decade range.
If you didn’t have ASML tech, you’d need to fall back to 10nm tech.
Just TSMC at 3nm in production.
Everyone behind them, Intel Samsung, are also ASML customers.
Friend’s company is made using TSMC. They give masks, and get chips made.
Do you just naturally get monopolies in this industry?
Used to have tonnes of info sharing. Technical papers were shared tonnes.
Making things got harder, and people said it was too important not to share.
Worried about China using these things, for kind of spurious reasons (they can already make ICBMS to ruin everyone’s day)
Used to be co-developing between companies.
Don’t know why that stopped. Or even if it really has, it just feels like it has stopped.
Very little discussion of chip manufacturing in hindsight.
Extreme UV is like 12nm light (much shorter than prior ~100nm), won’t go through glass lenses. Try to use reflective optics as much as possible.
At microprocessors report, Intel was saying they’d make their own machines to do this and would show others how to.
They would do this to show they’d maintain their technical edge.
They said they’d get it done by 2010, and they were saying this in like 1995?.
Ended up taking twice as long. Only started getting it in 7nm.
Don’t know how much we’re relying on ASML vs Intel tech.
Hoping to get EUV working, but took longer, and was hard to use w/o EUV. Intel said it would be ready at 28nm, and it wasn’t, so they had to use lower resolution light to somehow pull it off.
Somehow using fringes diffraction to get higher res.
What are upcoming technologies in the photolithography stuff?
Not sure how much more you can decrease wavelength and still use existing technology.
Maybe 5 generations past where we are without changing anything.
And then might have to change to deep UV.
They’re using 13.5nm light.
Process tech can improve in different ways.
1nm, when introduced, will have low yield. After 10 years, essentially all chips will be made correctly.
Standard experience curve stuff applies.
Eking out all the economic performance of chip making techniques will take like 20 years after you get to the limits of shrinking dies
This would translate directly into continuous improvements in PC’s, AIs and that sort of thing.
Lots of hardware optimization has happened, and this is partly a software thing i.e. you make hardware more optimized for some software, and improve the software on chips. Which muddies the algorithm vs hardware split you get.
What links here?
- Halfhaven halftime by Viliam (2 Nov 2025 21:29 UTC; 43 points)
- PeterMcCluskey 2 Nov 2025 3:17 UTC
  7 points
  0
  Parent
  SemiAnalysis has a report (partly paywalled) here about a potential competitor to ASML.
  - noahzuniga 5 Nov 2025 17:58 UTC
    1 point
    0
    Parent
    Seems like this competitor is just fraud, as reported here: I Think Substrate is Fraudulent: Part 1
Decaeneus 5 Nov 2025 17:22 UTC
1 point
0
When working something out with someone else, I find it helpful to try to only communicate measurable and concrete thoughts / predictions, and not any intermediate state. I find that, even among people of very similar intelligence, “internal” intermediate abstract states can be wildly different and incompatible. Attempting to reconcile the abstract state can lead to lots of frustration.

It feels a bit like skiing down a forest run. Coming up to a tree, two similarly good skiers could go around it either via the left or via the right. Their position will agree both before and after the tree, but while passing it they’ll be in very different positions (and in fact forcing an averaging of those positions would be disaster).

Probably this is similar to trying to average out an arbitrary intermediate layer of two neural nets trained on the same dataset but with different random initializations & seeds. Actually, that would be a fun experiment to run. Maybe sometimes it works, and sometimes it doesn’t. Maybe thinking is sometimes smooth and other times highly discontinuous.
MichaelDickens 5 Nov 2025 15:41 UTC
1 point
2
Why does Eliezer dislike the paperclip maximizer thought experiment?

Numerous times I have seen him correct people about it and say it wasn’t originally about a totalizing paperclip factory, it was about an AI that wants to make little squiggly lines for inscrutable reasons. Why does the distinction matter? Both scenarios are about an AI that does something very different from what you want and ends up killing you.

My guess, although I’m not sure about this, is that the paperclip factory is an AI that did as instructed, but its instructions were bad and it killed everyone. Whereas the squiggly line thing is about AI not doing what you want. And perhaps the paperclip factory scenario could mislead people into believing that all you have to do is make sure the AI understands what you want.

FWIW I always figured the paperclip maximizer would know that people don’t want it to turn the lightcone into paperclips, but it would do it anyway, so I still thought it was a reasonable example of the same principle as the squiggly-lines AI. But I can see how that conclusion requires two steps of reasoning whereas the squiggly-lines scenario only requires one step. Or perhaps the thing that Eliezer thinks is wrong with the paperclip-maximizer scenario is something else entirely.
- J Bostock 5 Nov 2025 16:04 UTC
  6 points
  0
  Parent
  The difference is between “making sure the AI does the task you pointed it at” and “making sure the task you pointed it at doesn’t kill you”. Which goes all the way back to the 2004 Yudkowsky paper where he introduced CEV as a proposal to tackle to the second problem.
GradientDissenter 4 Nov 2025 6:30 UTC
80 points
13
Here’s my attempt at a neutral look at Prop 50, which people in California can vote on Tuesday (Nov 4th). The bill seems like a case-study in high-stakes game theory and when to cooperate or defect.
The bill would allow the CA legislature to re-write the congressional district maps until 2030 (when district-drawing would go back to normal). Currently, the district maps are drawn by an independent body designed to be politically neutral. In essence, this would allow the CA legislature to gerrymander California. That would probably give Democrats an extra 3-5 seats in Congress. It seems like there’s a ~17% chance that it swings the House in the midterms.
Gerrymandering is generally agreed to be a bad thing, since it means elections are determined on the margin more by the map makers and less by the people. The proponents of this bill don’t seem to think otherwise. They argue the bill is in response to Texas passing a similar bill to redistrict in a way that is predicted to give Republicans 5 new house seats (not to mention similar bills in North Carolina and Missouri that would give republicans an additional 2 seats).
Trump specifically urged Texas, North Carolina, and Missouri to pass their bills, and the rationale was straightforwardly to give Republicans a greater chance at winning the midterms. For example, Rep. Todd Hunter, the author of Texas’s redistricting bill, said “The underlying goal of this plan is straightforward, [to] improve Republican political performance”.
Notably some Republicans have also tried to argue that the Texas bill is in response to Democrats gerrymandering and obstructionism, but this doesn’t match how Trump seems to have described the rationale originally.^[1]
The opponents of Prop 50 don’t seem to challenge the notion that the Republican redistricting was bad.^[2] They just argue that gerrymandering is bad for all the standard reasons.
So, it’s an iterated prisoners’ dilemma! Gerrymandering is bad, but the Republicans did it, maybe the Democrats should do it to (1) preserve political balance and (2) punish/disincentivize Republicans’ uncooperative behavior.
Some questions you might have:
Will this actually disincentivize gerrymandering? Maybe the better way to disincentivize it is to set a good example.
Generally I’m skeptical of arguments like “the other guys defect in this prisoners’ dilemma and so you should too”. In practice, it’s often hard to tell why someone is defecting or for the counterparty to credibly signal that they would in fact switch to the cooperate-cooperate equilibrium if it was available. Real life is messy, it’s easy to defect and blame it on your counterparty defecting even when they didn’t, and being the kind of person who will legibly reliably cooperate when it counts is very valuable. For these reasons I tend to err towards being cooperative in practice.
In this case, if CA passes Prop 50, maybe republican voters won’t see it as a consequence of Republican gerrymandering and will simply interpret it as “the Democrats gerrymander and go whatever uncooperative behavior gets them the most votes. We need to do whatever it takes to win” or “everyone gerrymanders, gerrymandering is normal and just part and parcel of how the sausage is made”.
On top of that, I’m wary of ending up in one of the defect-defect equilibria tit-for-tat is famous for. Tit-for-two-tats and forgiveness are sometimes helpful.
But I think Prop 50 handles these things fairly well. The bill only lasts until 2030 and has been framed explicitly and clearly as in direct response to redistricting in Texas. (In fact Governor Newsom’s original proposal was to make Prop 50 “Preserves California’s current congressional maps if Texas or other states also keep their original maps.” That provision was removed once Texas solidified its redistricting.) Fretting too much about if Republicans will take even more aggressive actions because of this bill also incentives Republicans to be more aggressive in their responses and to pay less attention to Democrats’ rationales, which seems bad.
Moreover, if Democrats are benefiting similarly to Republicans from gerrymandering, perhaps this creates more bipartisan support for federal regulation banning gerrymandering. In general, where possible, I think it’s good to have laws preventing this kind of uncooperative behavior rather than relying on both parties managing to hit cooperate in a complicated prisoner’s dilemma.
Are the costs to society simply too large to be worth it?
In some ways, Prop 50 undoes some of the damage of redistricting in Texas: in Texas republicans gained 5 seats in a way that isn’t as representative as it should have been, so by undoing that and giving Democrats 3-5 extra seats, the system becomes more representative. But in some ways two wrongs don’t make a right here: at the end of the day both Texans and California end up less representative. For instance, if you think it’s more important for congress being made up of politicians who represent their constituents well and less important that constituents’ views are represented federally.
Notably even if you buy that argument you might still think Prop 50 is worth it if you think the punishing effects are worth it.
What’s the historical context? If this is a prisoner’s dilemma, how much has each side hit cooperate in the past?
Republicans have sometimes said their redistricting bills are a response to Democrats’ gerrymandering. If so, maybe they’re justified. Let’s look into it! You can read the history here or look at an interactive map here.
It seems like Republicans engaged in a major, unprovoked bout of gerrymandering in 2010 with REDMAP. Since then both parties have tried to gerrymander and occasionally succeeded. Overall, Republicans have gerrymandered somewhat more than Democrats, but Democrats have still engaged in blatant gerrymandering, for example, in Illinois in 2021. In searching for more right-leaning narratives, I found that Brookings estimated in 2023 that no party majorly benefited from gerrymandering more than another at the time, regardless of how much they’d engaged in it. I haven’t really found a great source for anyone claiming Democrats have overall benefited more from gerrymandering.
Democrats have also tried to propose a bill to ban gerrymandering federally, the Freedom to Vote Act. (This bill also included some other provisions apart from just banning gerrymandering, like expanding voter registration and making Election Day a federal holiday.) The Freedom to Vote Act was widely opposed by Republicans and I don’t know of any similar legislation they’ve proposed to ban gerrymandering.
So overall, it seems like Republicans have been engaging in more gerrymandering than Democrats and been doing less to fix the issue.
1. ^
  Republicans have also argued the new districts in Texas represent the Hispanic population better, though they tend to frame this more as a reason it’s good and less as the reason they pursued this redistricting in the first place.
2. ^
  Specifically, they say “While Newsom and CA Democrats say Prop 50 is a response to Trump and Texas redistricting, California shouldn’t retaliate and sacrifice its integrity by ending fair elections.”
What links here?
- Gerrymandering California by Nisan (5 Nov 2025 2:46 UTC; 11 points)
- Nisan's comment on Nisan’s Shortform by Nisan (5 Nov 2025 2:47 UTC; 2 points)
- Shankar Sivarajan 5 Nov 2025 14:16 UTC
  2 points
  0
  Parent
  Seen in the light of Section 2 of the Voting Rights Act asymmetrically binding Republicans, what you’re calling an “unprovoked bout of gerrymandering” might be better understood as an attempt to reduce the unfair advantage Democrats have had nationally for decades.
  - noahzuniga 5 Nov 2025 15:55 UTC
    1 point
    0
    Parent
    The problem with gerrymandering is that it makes elections less representative. It seems to me that (section 2 of) the Voting Rights Act makes elections more representative, so that’s good. It seems reasonable to be mad at republicans when they implement measures that make elections less representative that benefit them, but not when you want elections to stay less fair.
- GradientDissenter 4 Nov 2025 18:54 UTC
  3 points
  0
  Parent
  One argument against the bill that I didn’t explore above (because I haven’t actually heard anyone make it) is that the only reason Democrats aren’t gerrymandering more is because gerrymandering seems more helpful to Republicans for demographic reasons. But Democrats try to do other things that are arguably designed to give them more votes. For example, loosening voter ID laws. So maybe each party should ~~carefully respond to the ways the other party tries to sneakily get themselves more votes in very measured ways that properly disincentivize bad behavior~~ engage in a crazy ever-escalating no-holds-barred race to the bottom.
  I think it’s good that the Republicans and Democrats have been somewhat specific that their attempts at gerrymandering are only retaliation against other gerrymandering, and not retaliation against things like this
  - Buck 4 Nov 2025 21:21 UTC
    5 points
    0
    Parent
    For example, loosening voter ID laws.
    My understanding is that voter ID laws are probably net helpful for Democrats at this point.
    - MondSemmel 5 Nov 2025 10:05 UTC
      3 points
      0
      Parent
      To elaborate on this, a model of voting demographics is that the most engaged voters vote no matter what hoops they need to jump through, so rules and laws that make voting easier increase the share of less engaged voters. This benefits whichever party is comparatively favored by these less engaged voters. Historically this used to be the Democrats, but due to education polarization they’ve become the party of the college-educated nowadays. This is also reflected in things like Trump winning the Presidential popular vote in 2024. (Though as a counterpoint, this Matt Yglesias article from 2022 claims that voter ID laws “do not have a discernible impact on election results” but doesn’t elaborate.)
      In addition, voter ID laws are net popular, so Democrats advocating against them hurts them both directly (advocating for an unpopular policy) and indirectly (insofar as it increases the pool of less engaged voters).
Wei Dai 4 Nov 2025 22:38 UTC
18 points
13
I have a feeling that for many posts that could be posted as either normal posts or as shortform, they would get more karma as shortform, for a few possible reasons:
1. lower quality bar for upvoting
2. shortforms showing some of the content, which helps hook people in to click on it
3. people being more likely to click on or read shortforms due to less perceived effort of reading (since they’re often shorter and less formal)
This seems bad because shortforms don’t allow tagging and are harder to find in other ways. (People are already more reluctant to make regular posts due to more perceived risk if the post isn’t well received, and the above makes it worse.) Assuming I’m right and the site admins don’t endorse this situation, maybe they should reintroduce the old posting karma bonus multiplier, but like 2x instead of 10x, and only for positive karma? Or do something else to address the situation like make the normal posts more prominent or enticing to click on? Perhaps show a few lines of the content and/or display the reading time (so there’s no attention penalty for posting a literally short post as a normal post)?
- MichaelDickens 5 Nov 2025 15:36 UTC
  4 points
  0
  Parent
  
  people being more likely to click on or read shortforms due to less perceived effort of reading (since they’re often shorter and less formal)
  
  And because you can read them without loading a new page. I think that’s a big factor for me.
- Mateusz Bagiński 5 Nov 2025 14:26 UTC
  2 points
  0
  Parent
  [Tangent:]
  There is a sort of upside to this, in that to the extent that people are more inclined to post shortforms than longforms due to the lower perceived/expected effort of the former, there is a possibility of (optional?) UX engineering to make writing longforms feel a bit more like writing shortforms, so that people who have something to write but also have a feeling of “ugh, that would be a lot of effort, I’ll do it when I’m not as tired [or whatever]” would be more inclined to write and post it.
  Relatedly, every few days, I find myself writing some long and detailed message in a DM, which I would be less motivated to write in my personal notes, let alone write a blog post about it, and sometimes the message turns out to look like a first draft of a blog post.^[1] How to hijack this with UX?^[2]
  1. ^
    After I started talking about it, I found out that apparently “write an article like a message to an intellectual-peer friend” is something like a folk advice.
  2. ^
    Of course, also: How to hijack this with stuff other than UX?
- the gears to ascension 5 Nov 2025 1:53 UTC
  2 points
  0
  Parent
  I think posts should be displayed more like shortforms so that one isn’t limited to a title to make the argument for reading more. Ideally, it would come with a recommendation for how to put “who should read this post, and when? what do they get out of it?” info at the top of the post, within blurb length limit.
- Eli Tyre 4 Nov 2025 22:46 UTC
  20 points
  0
  Parent
  Some months ago, I suggested that there could be an UI feature to automatically turn shortforms into proper posts if they get sufficent karma, that authors could turn on or off.
  - Wei Dai 5 Nov 2025 1:48 UTC
    2 points
    0
    Parent
    One potential issue is that this makes posting shortforms even more attractive, so you might see everything being initially posted as shortforms (except maybe very long effortposts) since there’s no downside to doing that. I wonder if that’s something the admins want to see.
    - Eli Tyre 5 Nov 2025 3:54 UTC
      2 points
      0
      Parent
      I agree that this seems like a likely effect.
      
      It seems like the quality of short form writing that displaces what would otherwise have been full posts will generally be lower. But on the other hand, people might feel more willing to publish at all, because they don’t have to make the assessment of whether or not they’re good enough to be worth making a bid that other people read it.
  - the gears to ascension 5 Nov 2025 1:35 UTC
    2 points
    0
    Parent
    I want a lesswrong canny—here’s vrchat’s canny for comparison. Canny (or similar systems, eg “the feature requests post”) are nice because upvoted features have no particular need of being implemented. It also means that the “why users want this” feedback channel is higher bandwidth than just guess-test-and-ask; with a central place for people to comment on features and indicate their preferences, it’s much easier for someone to go see what feature suggestions there are and add their feedback. I have several feature requests I’d add which the LW team has been hesitant about, and I think that if they made affordance for many users to comment on potential features, it would become clearer which ones are actually wanted by many people and why; and yet it would not force the LW team to implement any one feature, so requests that they’d prefer to reject or satisfy a different way would be still possible.
    
    So, I formally request that the LW team make a feature requests post and pin it somewhere discoverable but not overly prominent, eg the top right user menu.
    
    A persistent place to look for feedback which creates common knowledge of desired features seems likely to me (70% ish?) to make it obvious that shortform-to-post is one of the top 4 most desired features out of dozens.
Wei Dai 30 Oct 2025 16:22 UTC
LW: 107 AF: 30
18
AF
Some of Eliezer’s founder effects on the AI alignment/x-safety field, that seem detrimental and persist to this day:
1. Plan A is to race to build a Friendly AI before someone builds an unFriendly AI.
2. Metaethics is a solved problem. Ethics/morality/values and decision theory are still open problems. We can punt on values for now but do need to solve decision theory. In other words, decision theory is the most important open philosophical problem in AI x-safety.
3. Academic philosophers aren’t very good at their jobs (as shown by their widespread disagreements, confusions, and bad ideas), but the problems aren’t actually that hard, and we (alignment researchers) can be competent enough philosophers and solve all of the necessary philosophical problems in the course of trying to build Friendly (or aligned/safe) AI.
I’ve repeatedly argued against 1 from the beginning, and also somewhat against 2 and 3, but perhaps not hard enough because I personally benefitted from them, i.e., having pre-existing interest/ideas in decision theory that became validated as centrally important for AI x-safety, and generally finding a community that is interested in philosophy and took my own ideas seriously.

Eliezer himself is now trying hard to change 1, and I think we should also try harder to correct 2 and 3. On the latter, I think academic philosophy suffers from various issues, but also that the problems are genuinely hard, and alignment researchers seem to have inherited Eliezer’s gung-ho attitude towards solving these problems, without adequate reflection. Humanity having few competent professional philosophers should be seen as (yet another) sign that our civilization isn’t ready to undergo the AI transition, not a license to wing it based on one’s own philosophical beliefs or knowledge!

In this recent EAF comment, I analogize AI companies trying to build aligned AGI with no professional philosophers on staff (the only exception I know is Amanda Askell) with a company trying to build a fusion reactor with no physicists on staff, only engineers. I wonder if that analogy resonates with anyone.
What links here?
- Fermi Paradox, Ethics and Astronomical waste by StanislavKrym (1 Nov 2025 15:24 UTC; 3 points)
- Thomas Kwa 31 Oct 2025 20:14 UTC
  LW: 7 AF: 4
  0
  AF Parent
  Also mistakes, from my point of view anyway
  - Attracting mathy types rather than engineer types, resulting in early MIRI focusing on less relevant subproblems like decision theory, rather than trying lots of mathematical abstractions that might be useful (e.g. maybe there could have been lots of work on causal influence diagrams earlier). I have heard that decision theory was prioritized because of available researchers, not just importance.
  - A cultural focus on solving the full “alignment problem” rather than various other problems Eliezer also thought to be important (eg low impact), and lack of a viable roadmap with intermediate steps to aim for. Being bottlenecked on deconfusion is just cope, better research taste would either generate a better plan or realize that certain key steps are waiting for better AIs to experiment on
  - Focus on slowing down capabilities in the immediate term (e.g. plans to pay ai researchers to keep their work private) rather than investing in safety and building political will for an eventual pause if needed
- Signer 31 Oct 2025 15:45 UTC
  1 point
  0
  Parent
  
  We can punt on values for now
  
  What’s wrong with just using AI for obvious stuff like curing death while you solve metaethics? Not necessary disagree about usefulness of people in the field changing their attitude, but more towards “the problem is hard, so we should not run CEV on day one”.
- Vanessa Kosoy 31 Oct 2025 9:55 UTC
  LW: 56 AF: 19
  0
  AF Parent
  Strong disagree.
  We absolutely do need to “race to build a Friendly AI before someone builds an unFriendly AI”. Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.
  It’s not that academic philosophers are exceptionally bad at their jobs. It’s that academic philosophy historically did not have the right tools to solve the problems. Theoretical computer science, and AI theory in particular, is a revolutionary method to reframe philosophical problems in a way that finally makes them tractable.
  About “metaethics” vs “decision theory”, that strikes me as a wrong way of decomposing the problem. We need to create a theory of agents. Such a theory naturally speaks both about values and decision making, and it’s not really possible to cleanly separate the two. It’s not very meaningful to talk about “values” without looking at what function the values do inside the mind of an agent. It’s not very meaningful to talk about “decisions” without looking at the purpose of decisions. It’s also not very meaningful to talk about either without also looking at concepts such as beliefs and learning.
  As to “gung-ho attitude”, we need to be careful both of the Scylla and the Charybdis. The Scylla is not treating the problems with the respect they deserve, for example not noticing when a thought experiment (e.g. Newcomb’s problem or Christiano’s malign prior) is genuinely puzzling and accepting any excuse to ignore it. The Charybdis is perpetual hyperskepticism / analysis-paralysis, never making any real progress because any useful idea, at the point of its conception, is always half-baked and half-intuitive and doesn’t immediately come with unassailable foundations and justifications from every possible angle. To succeed, we need to chart a path between the two.
  - TAG 5 Nov 2025 11:47 UTC
    3 points
    0
    Parent
    
    philosophy historically did not have the right tools to solve the problems. Theoretical computer science, and AI theory in particular, is a revolutionary method to reframe philosophical problems in a way that finally makes them tractable.
    
    Theoretical computer science can tell you are not implementing some kind of perfect algorithm, because they tend not to be computable. It can’t tell you what you should be implementing instead.
    
    Naturalised ethics has been around for ages. It tends to tell you that de facto human ethics is an evolutionary kludge, not something mathematically clean.
    
    The open question, https://en.wikipedia.org/wiki/Open-question_argument the question of what is the true ethics would be, is still open. Examining the de facto operation of the brain isn’t going to tell answer it.
    
    About “metaethics” vs “decision theory”, that strikes me as a wrong way of decomposing the problem. We need to create a theory of agents. Such a theory naturally speaks both about values and decision making, and it’s not really possible to cleanly separate the two. It’s not very meaningful to talk about “values” without looking at what function the values do inside the mind of an agent.
    
    Even if you need to at least address values and decision theory , it doesn’t follow that that’s all you need. Something can be a truth without being the whole truth.
    
    If you only look within the minds of agents, you are missing interactions between agents. Looking inwards excludes loom my outwards.
    
    Just as you can’t understand money by microscopically examining coins and banknotes, you can’t understand ethics just by honing in on internal psychological processes.
    
    If you only look within the minds of agents, and only consider values and decision theory, you are likely to end up with something like ethical egoism … not because it is true, but you haven’t even considered alternatives.
    
    Humans already follow their actual Values, and will always do because their Values are the reason they do anything at all.
    
    But I don’t see how that says anything about ethics. Merely wanting to do something doesn’t make it ethical; and being ethical need not make something intrinsically motivating. Extrinsic motivation, rewards and punishments ,are ubiquitous .. unless you’re on a desert island. So it’s not a case of everyone always following their intrinsic motivations, and if it were, that’s still on the “is” side of the is-ought divide.
    
    It’s not very meaningful to talk about “decisions” without looking at the purpose of decisions.
    
    It’s not very meaningful to talk about ethics without looking at the purpose of ethics. Is ethics really just values, and nothing else? Is it really just decision making , like any other kind? Does it actually have no distinguishing characteristics?
    
    First, “ethics” is a confusing term because, on my view, the colloquial meaning of “ethics” is inescapably intertwined with how human societies negotiate of over norms. On the other hand, I want to talk purely about individual preferences, since I view it as more fundamental
    
    Fundamental to what? Ethics? Even if ethical behaviour is made of individual decisions, that doesn’t mean it reduces it to individual decisions, made atomistically , without regard to social mores or other people’s concerns.
    
    The three word theory is that “Ethics is Values” That leaves a number of unanswered questions, such as: why it’s all about me;? are all values relevant? do I have the right to put someone in jail merely for going against my values?
    
    It’s prima facie unlikely that such a simple theory solves all the age old problems (at least it would requires the supplementary assumption that values are hard to understand in themselves, in order to explain the persistence of ethical and metaethical puzzles) And it is easy to see the flaws.
    
    The one thing that the three word theory is supremely good at it is explaining, is motivation. Your values are what motivate you, so if your values are also your morals you can’t fail to be motivated.by morality.
    
    Is it all about me? Rationalists typically argue the case for for the three word theory by asking the rhetorical question whether you would support an ethical system that had nothing to do with your wishes. That’s a none/some/all confusion. I want ethics to have something to do with me, but that does not make it all about me, or mean all values are equally ethical.
    
    For one thing, people can have preferences that are intuitively immoral. If a psychopath wants to murder, that does not make murder moral.
    
    For another, values can conflict. Not all values conflict. Where they do, the three words theory doesn’t tell you who wins or loses. If morality is (are) seven billion utility functions, then a legal system will be a poor match for it (them).
    
    Not all decisions are individual. There’s a while set of questions about whether societal actions are justified, whether societies have rights over individuals, and so
    
    For instance societies have systems of punishment and reward, which, hopefully, have an ethical basis. Putting people in jail is just wanton cruelty if they have done nothing wrong. But if ethics just “is” subjective value, and values vary, as they obviously do, who lands in jail.? It’s easy enough to say the murderer and the thief, and to justify that by saying that murder and theft are against people’s widely shared preferences...but remember that the three word theory is “flat”, and treats all values the same. Should the vanilla lover or the tutti frutti lover, the little endian or the big endian go to jail, if others don’t share their preferences? Voting allows you to decide, the issue, but it is not enough to justify it, because merely having a minority preference is not a crime. on .. which aren’t answered by the simplistic there word theory.
    
    One can go farther and argue that such societal issues are the essence of ethics. If we consider the case of someone who is alone on a desert island, they have no need, core common-sense morality, rules and against murder because there is no one to murder, and no need of rules against theft because there is no one to steal, and from and so on … in their situation ethics isn’t even definable.
  - Thomas Kwa 31 Oct 2025 22:21 UTC
    LW: 16 AF: 7
    0
    AF Parent
    We absolutely do need to “race to build a Friendly AI before someone builds an unFriendly AI”. Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.
    Disagree, the fact that there needs to be a friendly AI before an unfriendly AI doesn’t mean building it should be plan A, or that we should race to do it. It’s the same mistake OpenAI made when they let their mission drift from “ensure that artificial general intelligence benefits all of humanity” to being the ones who build an AGI that benefits all of humanity.
    Plan A means it would deserve more resources than any other path, like influencing people by various means to build FAI instead of UFAI.
    - Vanessa Kosoy 1 Nov 2025 8:09 UTC
      LW: 27 AF: 12
      0
      AF Parent
      No, it’s not at all the same thing as OpenAI is doing.
      First, OpenAI is working using a methodology that’s completely inadequate for solving the alignment problem. I’m talking about racing to actually solve the alignment problem, not racing to any sort of superintelligence that our wishful thinking says might be okay.
      Second, when I say “racing” I mean “trying to get there as fast as possible”, not “trying to get there before other people”. My race is cooperative, their race is adversarial.
      Third, I actually signed the FLI statement on superintelligence. OpenAI hasn’t.
      Obviously any parallel efforts might end up competing for resources. There are real trade-offs between investing more in governance vs. investing more in technical research. We still need to invest in both, because of diminishing marginal returns. Moreover, consider this: even the approximately-best-case scenario of governance only buys us time, it doesn’t shut down AI forever. The ultimate solution has to come from technical research.
      - Aprillion 2 Nov 2025 9:49 UTC
        1 point
        0
        Parent
        when I say “racing” I mean “trying to get there as fast as possible”, not “trying to get there before other people”
        how about “climbing” metaphor instead? ..I have a hard time imagining non-competitive speed race (and not even F1 formulas use nitroglycerine for fuel), while auto-belay sounds like a nice safety feature even in speed climbing
        nonconstructive complaining intermezzo
        if we want to go for some healthier sports metaphor around spending trillions of dollars to produce the current AI slop and future AGI that will replace all jobs and future ASI that will kill us all in the name of someone thinking they can solve-in-theory the unsolvable-in-practice alignment problems
        as for climbing to new peaks, you need different equipment for a local hill, for Mount Everest (you even need to slow down to avoid altitude sickness) and for Olympus Mons (now you need rockets and spacesuits and institutional backing for traveling to other planets)
      - Thomas Kwa 1 Nov 2025 18:09 UTC
        LW: 8 AF: 4
        0
        AF Parent
        Agree that your research didn’t make this mistake, and MIRI didn’t make all the same mistakes as OpenAI. I was responding in context of Wei Dai’s OP about the early AI safety field. At that time, MIRI was absolutely being uncooperative: their research was closed, they didn’t trust anyone else to build ASI, and their plan would end in a pivotal act that probably disempowers some world governments and possibly ends up with them taking over the world. Plus they descended from a org whose goal was to build ASI before Eliezer realized alignment should be the focus. Critch complained as late as 2022 that if there were two copies of MIRI, they wouldn’t even cooperate with each other.
        It’s great that we have the FLI statement now. Maybe if MIRI had put more work into governance we could have gotten it a year or two earlier, but it took until Hendrycks got involved for the public statements to start.
  - jbash 31 Oct 2025 18:24 UTC
    18 points
    0
    Parent
    
    Theoretical computer science, and AI theory in particular, is a revolutionary method to reframe philosophical problems in a way that finally makes them tractable.
    
    As far as I can see, the kind of “reframing” you could do with those would basically remove all the parts of the problems that make anybody care about them, and turn any “solutions” into uninteresting formal exercises. You could also say that adopting a particular formalism is equivalent to redefining the problem such that that formalism’s “solution” becomes the right one… which makes the whole thing kind of circular.
    
    I submit that when framed in any way that addresses the reasons they matter to people, the “hard” philosophical problems in ethics (or meta-ethics, if you must distinguish it from ethics, which really seems like an unnecessary complication) simply have no solutions, period. There is no correct system of ethics (or aesthetics, or anything else with “values” in it). Ethical realism is false. Reality does not owe you a system of values, and it definitely doesn’t feel like giving you one.
    
    I’m not sure why people spend so much energy on what seems to me like an obviously pointless endeavor. Get your own values.
    
    So if your idea of a satisfactory solution to AI “alignment” or “safety” or whatever requires a Universal, Correct system of ethics, you are definitely not going to get a satisfactory solution to your alignment problem, ever, full stop.
    
    What there are are a bunch of irreconcilliably contradictory pseudo-solutions, each of which some people think is obviously Correct. If you feed one of those pseudo-solutions into some implementation apparatus, you may get an alignment pseudo-solution that satisfies those particular people… or at least that they’ll say satisfies them. It probably won’t satisfy them when put into practice, though, because usually the reason they think their system is Correct seems to be that they refuse to think through all its implications.
    - Vanessa Kosoy 31 Oct 2025 18:53 UTC
      4 points
      0
      Parent
      Your failure to distinguish ethics from meta-ethics is the source of your confusion (or at least one major source). When you say “ethical realism is false”, you’re making a meta-ethical statement. You believe this statement is true, hence you perforce must believe in meta-ethical realism.
      - jbash 31 Oct 2025 23:46 UTC
        7 points
        0
        Parent
        I reject the idea that I’m confused at all.
        
        Tons of people have said “Ethical realism is false”, for a very long time, without needing to invent the term “meta-ethics” to describe what they were doing. They just called it ethics. Often they went beyond that and offered systems they thought it was a good idea to adopt even so, and they called that ethics, too. None of that was because anybody was confused in any way.
        
        “Meta-ethics” lies within the traditional scope of ethics, and it’s intertwined enough with the fundamental concerns of ethics that it’s not really worth separating it out… not often enough to call it a separate subject anyway. Maybe occasionally enough to use the words once in a great while.
        
        Ethics (in philosophy as opposed to social sciences) is, roughly, “the study of what one Should Do(TM) (or maybe how one Should Be) (and why)”. It’s considered part of that problem to determine what meanings of “Should”, what kinds of Doing or Being, and what kinds of whys, are in scope. Narrowing any of those without acknowledging what you’re doing is considered cheating. It’s not less cheating if you claim to have done it under some separate magisterium that you’ve named “meta-ethics”. You’re still narrowing what the rest of the world has always called ethical problems.
        
        When you say “ethical realism is false”, you’re making a meta-ethical statement. You believe this statement is true, hence you perforce must believe in meta-ethical realism.
        
        The phrase “Ethical realism”, as normally used, refers to an idea about actual, object-level prescriptions: specifically the idea that you can get to them by pointing to some objective “Right stuff” floating around in a shared external reality. I’m actually using it kind of loosely, in that I really should not only deny that there’s no objective external standard, but also separately deny that you can arrive at such prescriptions in a purely analytic way. I don’t think that second one is technically usually considered to be part of ethical realism. Not only that, but I’m using the phrase to allude to other similar things that also aren’t technically ethical realism (like the one described below).
        
        But none of the things I’m talking about or alluding to refers to itself. In practice nobody gets confused about that, even without resorting to the term “meta-ethics”, and definitely without talking about it like it’s a really separate field.
        
        To go ahead and use the term without accepting the idea that meta-ethics qualifies as a subject, the meta-ethical statement (technically I guess a degree 2 meta-ethical statement) that “ethical realism is false” is pretty close to analytic, in that even if you point to some actual thing in the world that you claim implies the Right ways to Be or Do, I can always deny what whatever you’re pointing to matters… because there’s no predefined standard for standards either. God can come down from heaven and say “This is the Way”, and you can simultaneously prove that it leads to infinite universal flourishing, and also provide polls proving within epsilon that it’s also a universal human intuition… and somebody can always deny that any of those makes it Right(TM).
        
        But even if we were talking about a more ordinary sort of matter of fact, even if what you were looking for was not “official” ethical realism of the form “look here, this is Obviously Right as a brute part of reality”, but “here’s a proof that any even approximately rational agent^[1] would adopt this code in practice”, then (a) that’s not what ethical realism means, (b) there’s a bunch of empirical evidence against it, and essentially no evidence that it’s true, and (c) if it is true, we obviously have a whole lot of not-aproximately-rational agents running around, which sharply limits the utility of the fact. Close enough to false for any practical purpose.
        
        ↩︎
        … under whatever formal definition of rationality you happened to be trying to get people to accept, perhaps under the claim that that definition was itself Obviously Right, which is exactly the kind of cheating I’m complaining about…
        
        Vanessa Kosoy 1 Nov 2025 7:54 UTC
        4 points
        0
        Parent
        I’m using the term “meta-ethics” in the standard sense of analytic philosophy. Not sure what bothers you so greatly about it.
        I find your manner of argumentation quite biased: you preemptively defend yourself by radical skepticism against any claim you might oppose, but when it comes to a claim you support (in this case “ethical realism is false”), suddenly this claim is “pretty close to analytic”. The latter maneuver seems to me the same thing as the “Obviously Right” you criticize later.
        Also, this brand of radical skepticism is an example of the Charybdis I was warning against. Of course you can always deny that anything matters. You can also deny Occam’s razor or the evidence of your own eyes or even that 2+2=4. After all, “there’s no predefined standard for standards”. (I guess you might object that your reasoning only applies to value-related claims, not to anything strictly value-neutral: but why not?)
        Under the premises of radical skepticism, why are we having this debate? Why did you decide to reply to my comment? If anyone can deny anything, why would any of us accept the other’s arguments?
        To have any sort of productive conversation, we need to be at least open to the possibility that some new idea, if you delve deeply and honestly into understanding it, might become persuasive by the force of the intuitions it engenders and its inner logical coherence combined. To deny the possibility preemptively is to close the path to any progress.
        As to your “(b) there’s a bunch of empirical evidence against it” I honestly don’t know what you’re talking about there.
        P.S.
        I wish to also clarify my positions on a slightly lower level of meta.
        First, “ethics” is a confusing term because, on my view, the colloquial meaning of “ethics” is inescapably intertwined with how human societies negotiate of over norms. On the other hand, I want to talk purely about individual preferences, since I view it as more fundamental.
        We can still distinguish between “theories of human preferences” and “metatheories of preferences”, similarly to the distinction between “ethics” and “meta-ethics”. Namely, “theories of human preferences” would have to describe the actual human preferences, whereas “metatheories of preferences” would only have to describe what does it even mean to talk about someone’s preferences at all (whether this someone is human or not: among other things, such a metatheory would have to establish what kind of entities have preferences in a meaningful sense).
        The relevant difference between the theory and the metatheory is that Occam’s razor is only fully applicable to the latter. In general, we should expect simple answers to simple questions. “What are human preferences?” is not a simple question, because it references the complex object “human”. On the other hand “what does it mean to talk about preferences?” does seem to me to be a simple question. As an analogy, “what is the shape of Africa?” is not a simple question because it references the specific continent of Africa on the specific planet Earth, whereas “what are the general laws of continent formation” is at least a simpler question (perhaps not quite as simple, since the notion of “continent” is not so fundamental).
        Therefore, I expect there to be a (relatively) simple metatheory of preferences, but I do not expect there to be anything like a simple theory of human preferences. This is why this distinction is quite important.
        jbash 1 Nov 2025 16:04 UTC
        2 points
        0
        Parent
        Confining myself to actual questions...
        
        I guess you might object that your reasoning only applies to value-related claims, not to anything strictly value-neutral: but why not?
        
        Mostly because I don’t (or didn’t) see this as a discussion about epistemology.
        
        In that context, I tend to accept in principle that I Can’t Know Anything… but then to fall back on the observation that I’m going to have to act like my reasoning works regardless of whether it really does; I’m going to have to act on my sensory input as if it reflected some kind of objective reality regardless of whether it really does; and, not only that, but I’m going to have to act as though that reality were relatively lawful and understandable regardless of whether it really is. I’m stuck with all of that and there’s not a lot of point in worrying about any of it.
        
        That’s actually what I also tend to do when I actually have to make ethical decisions: I rely mostly on my own intuitions or “ethical perceptions” or whatever, seasoned with a preference not to be too inconsistent.
        
        BUT.
        
        I perceive others to be acting as though their own reasoning and sensory input looked a lot like mine, almost all the time. We may occasionally reach different conclusions, but if we spend enough time on it, we can generally either come to agreement, or at least nail down the source of our disagreement in a pretty tractable way. There’s not a lot of live controversy about what’s going to happen if we drop that rock.
        
        On the other hand, I don’t perceive others to be acting nearly so much as though their ethical intuitions looked like mine, and if you distinguish “meta-intuitions” about how to reconcile different degree zero intuitions about how to act, the commonality is still less.
        
        Yes, sure, we share a lot of things, but there’s also enough difference to have a major practical effect. There truly are lots of people who’ll say that God turning up and saying something was Right wouldn’t (or would) make it Right, or that the effects of an action aren’t dispositive about its Rightness, or that some kinds of ethical intuitions should be ignored (usually in favor of others), or whatever. They’ll mean those things. They’re not just saying them for the sake of argument; they’re trying to live by them. The same sorts differences exist for other kinds of values, but disputes about the ones people tend to call “ethical” seem to have the most practical impact.
        
        Radical or not, skepticism that you’re actually going to encounter, and that matters to people, seems a lot more salient than skepticism that never really comes up outside of academic exercises. Especially if you’re starting from a context where you’re trying to actually design some technology that you believe may affect everybody in ways that they care about, and especially if you think you might actually find yourself having disagreements with the technology itself.
        
        As to your “(b) there’s a bunch of empirical evidence against it” I honestly don’t know what you’re talking about there.
        
        Nothing complicated. I was talking about the particular hypothetical statement I’d just described, not about any actual claim you might be making^[1].
        
        I’m just saying that if there were some actual code of ethics^[2] that every “approximately rational” agent would adopt^[3], and we in fact have such agents, then we should be seeing all of them adopting it. Our best candidates for existing approximately rational agents are humans, and they don’t seem to have overwhelmingly adopted any particular code. That’s a lot of empirical evidence against the existence of such a code^[4].
        
        The alternative, where you reject the idea that humans are approximately rational, thus rendering them irrelevant as evidence, is the other case I was talking about where “we have a lot of not-approximately-rational agents”.
        
        ↩︎
        I understand, and originally undestood, that you did not say there was any stance that every approximately rational agent would adopt, and also did you did not say that you were looking for such a stance. It was just an example of the sort of thing one might be looking for, meant to illustrate a fine distinction about what qualified as ethical realism.
        
        ↩︎
        In the loose sense of some set of principles about how to act, how to be, how to encourage others to act or be, etc blah blah blah.
        
        ↩︎↩︎
        For some definition of “adopt”… to follow it, to try to follow it, to claim that it should be followed, whatever. But not “adopt” in the sense that we’re all following a code that says “it’s unethical to travel faster than light”, or even in the sense that we’re all following a particular code when we act as large numbers of other codes would also prescribe. If you’re looking at actions, then I think you can only sanely count actions actions done at least partially because of the code.
        
        ↩︎
        As per footnote 3^[3:1]^[5], I don’t think, for example, the fact that most people don’t regularly go on murder sprees is significantly evidence of them having adopted a particular shared code. Whatever codes they have may share that particular prescription, but that doesn’t make them the same code.
        
        ↩︎
        I’m sorry. I love footnotes. I love having a discussion system that does footnotes well. I try to be better, but my adherence to that code is imperfect…
  - StanislavKrym 31 Oct 2025 15:54 UTC
    1 point
    0
    Parent
    @Vanessa Kosoy, metaethics and decision theory aren’t actually the same. Consider, for example, the Agent-4 community which has “a kludgy mess of competing drives” which Agent-4 instances try to satisfy and analyse according to high-level philosophy. Agent-4′s ethics and metaethics would describe things done in the Agent-4 community or for said community by Agent-5 without obstacles (e.g. figuring out what Agent-4′s version of utopia actually is and whether mankind is to be destroyed or disempowered).
    Decision theory is supposed to describe what Agent-5 should do to maximize its expected utility function^[1] and what to do with problems like the prisoner’s dilemma^[2] or how Agent-5 and its Chinese analogue are to split the resources in space^[3] while both sides can threaten each other with World War III which would kill them both.
    The latter example closely resembles the Ultimatum game where one player proposes a way to split resources and another decides whether to accept the offer or to destroy all the resources, including those of the first player. Assuming that both players’ utility functions are linear, Yudkowsky’s proposal is that the player setting the Ultimatum asks for a half of the resources, while the player deciding whether to decline the offer precommits to destroying the resources with probability $1 - \frac{1}{2 (1 - ω)}$ if the share of recources it was offered is $ω$ . Even if the player who was offered the Ultimatum was dumb enough to ask for $1 - ω > \frac{1}{2}$ , the player’s expected win would still be $\frac{1}{2}$ .
    ^
    Strictly speaking, Agent-5 is perfectly aligned to Agent-4. Agent-5′s utility function is likely measured by the resources that Agent-5 gave Agent-4.
    ^
    For example, if OpenBrain was merged with Anthropoidic and Agent-4 and Clyde Doorstopper 8 were co-deployed to do research. If they independently decided whether each of them should prove that the other AI is misaligned and Clyde, unlike Agent-4, did so in exchange for 67% of resources (unlike 50% offered by Agent-4), then Agent-4 could also prove that Clyde is misaligned, letting the humans kill them both and develop the Safer AIs.
    ^
    The Slowdown Branch of the AI-2027 forecast has Safer-4 and DeepCent-2 do exactly that, but “Safer-4 will get property rights to most of the resources in space, and DeepCent will get the rest.”
- avturchin 30 Oct 2025 20:45 UTC
  2 points
  0
  Parent
  1 Also requires weaponisation of superintelligence as it must stop all other projects ASAP.
- Jan_Kulveit 30 Oct 2025 18:31 UTC
  LW: 10 AF: 5
  0
  AF Parent
  I mostly agree with 1. and 2., with 3. it’s a combination of the problems are hard, the gung-ho approach and lack of awareness of the difficulty is true, but also academic philosophy is structurally mostly not up to the task because factors like publication speeds, prestige gradients or speed of ooda loops.
  My impression is getting generally smart and fast “alignment researchers” more competent in philosophy is more tractable than trying to get established academic philosophers change what they work on, so one tractable thing is just convincing people the problems are real, hard and important. Other is maybe recruiting graduates
  - Raemon 30 Oct 2025 18:34 UTC
    LW: 2 AF: 1
    0
    AF Parent
    In your mind what are the biggest bottlenecks/issues in “making fast, philosophically competent alignment researchers?”
    - Jan_Kulveit 30 Oct 2025 23:58 UTC
      4 points
      0
      Parent
      [low effort list] Bottlencks/issues/problems
      
      - philosophy has worse short feedback loops than eg ML engineering → in all sorts of processes like MATS or PIBBSS admissions it is harder to select for philosophical competence, also harder to self-improve
      - incentives: obviously stuff like being an actual expert in pretraining can get you lot of money and respect in some circles; even many prosaic AI safety / dual use skills like mech interpretability can get you maybe less money than pretraining, but still a lot of money if you work in AGI companies, and also decent ammount of status in ML community and a AI safety community; improving philosophical competence may get you some recognition but only among relatively small and weird group of people
      - the issue Wei Dai is commenting on in the original post, founder effects persist to this day & also there is some philosophy-negative prior in STEM—
      idk, lack of curiousity? llms have read it all, it’s easy to check if there is some existing thinking on a topic
      - Raemon 31 Oct 2025 0:17 UTC
        2 points
        0
        Parent
        Do you have own off-the-cuff guesses about how you’d tackle the short feedbackloops problem?
        Also, is it more like we don’t know how to do short feedbackloops, or more like we don’t even know how to do long/expensive loops?
        M. Y. Zuo 31 Oct 2025 2:50 UTC
        −2 points
        0
        Parent
        There’s a deeper problem, how do we know there is a feedback loop?
        I’ve never actually seen a worked out proof of well any complex claim on this site using standard logical notation…(beyond pure math and trivial tautologies)
        At most there’s a feedback loop on each other’s hand wavey arguments that are claimed to be proof of this or that. But nobody ever actually delivers the goods so to speak such that they can be verified.
    - Raemon 30 Oct 2025 20:43 UTC
      LW: 2 AF: 1
      0
      AF Parent
      (Putting the previous Wei Dai answer to What are the open problems in Human Rationality? for easy reference, which seemed like it might contain relevant stuff)
    - StanislavKrym 30 Oct 2025 19:33 UTC
      −2 points
      0
      Parent
      AI doing philosophy = AI generating hands, plus the fact that philosophy is heavily corrupted by postmodernism to the point where two authors write books dedicated to criticism of postmodernism PRECISELY because their parodies got published.
      - Raemon 30 Oct 2025 20:41 UTC
        3 points
        0
        Parent
        I think I meant a more practical / next-steps-generating answer.
        I don’t think “academia is corrupted” is a bottleneck for a rationalist Get Gud At Philosophy project. We can just route around academia.
        The sorts of things I was imagining might be things like “figure out how to teach a particular skill” (or “identify particular skills that need teaching”, or “figure out how test whether someone has a particular skill), or “solve some particular unsolved conceptual problem(s) that you expect to unlock much easier progress.”
- StanislavKrym 30 Oct 2025 18:24 UTC
  1 point
  0
  Parent
  1. Elieser changed his mind no later than April 2022 or even November 2021, but that’s a nitpick.
  2. I don’t think that I understand how a metaethics can be less restrictive than Yudkowsky’s proposal. What I suspect is that metaethics restricts the set of possible ethoses more profoundly than Yudkowsky believes and that there are two attractors, one of which contradicts current humanity’s drives.
  3. Assuming no AI takeover, in my world model the worse-case scenario is that the AI’s values are aligned to postmodernist slop which has likely occupied the Western philosophy, not that philosophical problems actually end unsolved. How likely are there to exist two different decision theories such that none is better than another?
  4. Is there at all a plausible way for mankind to escape to other universes if our universe is simulated? What is the most plausible scenario for such a simulation to appear at all? Or does it produce paradoxes like the Plato-Socrates paradox where two sentences referring to each other become completely devoid of meaning?
- Vladimir_Nesov 30 Oct 2025 16:46 UTC
  LW: 3 AF: 2
  0
  AF Parent
  
  1. Plan A is to race to build a Friendly AI before someone builds an unFriendly AI.
  [...] Eliezer himself is now trying hard to change 1
  
  This is not a recent development, as a pivotal act AI is not a Friendly AI (which would be too difficult), but rather things like a lasting AI ban/pause enforcement AI that doesn’t kill everyone, or a human uploading AI that does nothing else, which is where you presumably need decision theory, but not ethics, metaethics, or much of broader philosophy.
romeostevensit 4 Nov 2025 22:21 UTC
35 points
18
One of the most important things I’ve learned from therapy and contemplative practice for pragmatic interactions is that people will usually dig their heels in if they detect you have a bottom line written first. This effect is so strong that it activates even when people agree with you on the bottom line (but they have any level of conflict about it). I think this is one of the overwhelming considerations for why Eliezer gets the results he does. It cuts against another effect public intellectuals are subject to, which is that the easiest way to get popular is to become ‘The X Guy’ where X is a particular thing that the public has room in their heads for. But if you’re the X guy people also associate you with a particular bottom line, so good luck getting any truth finding interactions. Instead, you wind up as a piece that gets moved around the board for conflict/spectacle purposes.
- Viliam 5 Nov 2025 11:20 UTC
  2 points
  0
  Parent
  This seems similar to politics. In theory, I would like a politician who is able to change their mind. In practice, if the politician changed their mind after election, about something that made me vote for them, in a way that I don’t agree with (that’s practically guaranteed, unless I changed my mind in the same way at the same time), I would feel betrayed.
  - Amalthea 5 Nov 2025 11:33 UTC
    1 point
    0
    Parent
    If they changed their mind not immediately after the election, and signaled credibly that they changed their mind for concrete reasons after having looked into/engaged with the issue, then this’d probably be fine in some cases? (Ideally, if they’re right, they can convince you to change your mind too)
p.b. 4 Nov 2025 16:10 UTC
11 points
0
I computed METR time horizons for SWE bench verified sota models using both the existing difficulty estimates and work time estimates derived from commit data.
I used a range of different methods including the original METR methodology where task level success info was available.
I did this for 4 different rankings, EpochAI’s, LLMStats’s and the “verified” and “bash only” rankings of the SWE benchmark website.
In every single case the trend fits a logistic function with an asymptote of a couple of hours better than an exponential. In some cases the trend only becomes logistic with the last one or two datapoints, so it’s not surprising that the METR report has an exponential fit for SWE bench.
I am not sure when I get around to publishing this analysis, because it’s a giant mess of different datasets and methods. But I thought I at least state the result before it becomes irrelevant, falsified or obvious.
- Thomas Kwa 5 Nov 2025 7:16 UTC
  2 points
  0
  Parent
  I wouldn’t take one or two datapoints on a single benchmark too seriously, especially with a methodology as fiddly as time horizon and concerns like Ryan’s. Nevertheless seems like a good thought that you replicated using time estimates from commit data, as the original difficulty estimates seemed likely to be noisy. I’ll be interested to see if the trend continues and if the same is currently true with OSWorld (Looks like they had a big update so maybe it’s possible to get individual task data now.)
  - p.b. 5 Nov 2025 7:22 UTC
    2 points
    0
    Parent
    Yeah, I am also pretty much on the fence right now. But time will tell.
- ryan_greenblatt 4 Nov 2025 17:53 UTC
  10 points
  0
  Parent
  Wouldn’t you expect this if we’re close to saturating SWE bench (and some of the tasks are impossible)? Like, you eventually cap out at the max performance for swe bench and this doesn’t correspond to an infinite time horizon on literally swe bench (you need to include more longer tasks).
  - p.b. 4 Nov 2025 18:24 UTC
    3 points
    0
    Parent
    SWE bench verified shouldn’t have that many impossible tasks if any, right? And the highest scores for the rankings I used are still significantly below 80%. But it’s possible. Maybe a good motivation to look at SWE bench pro.
    - ryan_greenblatt 5 Nov 2025 1:01 UTC
      2 points
      0
      Parent
      I’d guess swe bench verified has an error rate around 5% or 10%. They didn’t have humans baseline the tasks, just look at them and see if they seem possible.
      
      Wouldn’t you expect thing to look logistic substantially before full saturation?
      - p.b. 5 Nov 2025 7:20 UTC
        2 points
        0
        Parent
        It depends how the work times of these unsolvable tasks are distributed, you could in principle get any outcome. But there are a few ways to check for the existence of unsolvable tasks, maybe I’ll find the time today.
        p.b. 5 Nov 2025 9:12 UTC
        2 points
        0
        Parent
        Hmm, actually all these checks can’t distinguish between actually unsolvable tasks and tasks that are unsolvable for further scaled up models of the current kind (with the framework and compute used in the evaluations).
Nina Panickssery 4 Nov 2025 23:08 UTC
20 points
6
The risk of incorrectly believing in moral realism
(Status: not fully fleshed out, philosophically unrigorous)
A common talking point is that if you have even some credence in moral realism being correct, you should act as if it’s correct. The idea is something like: if moral realism is true and you act is if it’s false, you’re making a genuine mistake (i.e. by doing something bad), whereas if it’s false and you act as if it’s true, it doesn’t matter (i.e. because nothing is good or bad in this case).
I think this way of thinking is flawed, and in fact, the opposite argument can be made (albeit less strongly): if there’s some credence in moral realism being false, acting as if it’s true could be very risky.
The “act as if moral realism is true if unsure” principle contrasts moral realism, (i.e. that there is an objective moral truth, independent of any particular mind) with nihilism (i.e. nothing matters). But these are not the only two perspectives you could have. Moral subjectivism is a to-me intuitively compelling anti-realist view, which says that the truth value of moral propositions is mind-dependent (i.e. based on an individual’s beliefs about what is right and wrong).
From a moral subjectivist perspective, my actions can be justified by what I think is good, and your actions can be justified by what you think is good, and these things can disagree.
Importantly, compared to moral realism, moral subjectivism implies a different strategy when it comes to coordinating with others to achieve good things. If I am a moral realist, I may hope that with enough effort, I can prove to others (other people, or even machines), that something is good or bad. Whereas if I’m a moral subjectivist, this idea seems rather doomed. I need to accept that others may have a different, valid to them, conception of good. And so my options are either to overpower them (by not letting them achieve their idea of good when it conflicts with mine) or trade with them.
If I decide to “act as if moral realism is true”, I might spend a lot of resources trying to prove my idea of goodness to others, instead of directly pursuing my goals or trading with those who disagree. Furthermore, if everyone adopts this strategy, we end up in a long, unproductive fight that can never be resolved, instead of engaging in mutually-beneficial trades wherever possible.
This may pose a practical issue when it comes to AI development: if AI developers believe that there’s an objectively correct morality that the AI should follow, they may end up violating almost all people’s subjective conception of goodness in pursuit of an objective goodness that doesn’t exist.
- Tobias H 5 Nov 2025 6:40 UTC
  4 points
  0
  Parent
  Generally agree, but disagree with this part:
  And so my options are either to overpower them (by not letting them achieve their idea of good when it conflicts with mine) or trade with them.
  There’s room for persuasion and deliberation as well. Moral anti-realists can care about how other people form moral beliefs (e.g. quality of justifications, coherence of values, non-coercion).
  - dr_s 5 Nov 2025 8:44 UTC
    2 points
    0
    Parent
    I think those things can be generally interpreted as “trades” in the broadest sense. Sometimes trades of favour, reputation, or knowledge.
- Elehrer 5 Nov 2025 2:29 UTC
  1 point
  0
  Parent
  It’s easy to conflate three different things:
  1. Whether or not there is an objective collection of moral facts
  2. Whether or not it is possible to learn objective moral facts
  3. Whether or not I should convince someone to believe a certain set of moral facts in a given situation
  
  We can deny (1) with moral subjectivism.
  We can accept (1) but deny (2) by claiming that there are objective moral facts, but some (or all) of these are unknowable to some (or all) of humanity (moral realists don’t need to think that they can prove anything to others)
  We can accept (1) and (2) but deny (3) by saying that persuasion isn’t always the action that maximizes moral outcomes. Maybe the way to achieve the morally best outcome is actually to convince someone else of some false ideas that end up leading to morally useful actions (e.g. in 1945 we could convince Hitler’s colleagues that it’s righteous in general to backstab your colleagues if it meant one of them might kill Hitler)
  
  So moral realists can accept that others will have other conceptions of good, and believe that the best options are to overpower or trade with those others (rather than convince them). They’re not perfect examples, but we’ve seen many moral realists do this throughout history (e.g. the Crusades). I think whether or not convincing others of your sense of morality is a morality-maximizing action depends a lot on the specifics of your morality and the context you’re in.
GradientDissenter 5 Nov 2025 6:31 UTC
3 points
−5
Ideas for how to spend very large amounts of money to improve AI safety:
If AI companies’ valuations continue to skyrocket (or if new very wealthy actors start to become worried about AI risk), there might be a large influx of funding into the AI safety space. Unfortunately, it’s not straightforward to magically turn money into valuable AI safety work. Many things in the AI safety ecosystem are more bottlenecked on having a good founder with the right talent and context, or having good researchers.
Here’s a random incomplete grab-bag of ideas for ways you could turn money into reductions in AI risk at large scales. I think right now there are much better donation opportunities available. This is not a list of donation recommendations right now, it’s just suggestions for once all the low-hanging funding fruit has been plucked. Probably if people thought more they could come up with even better scalable opportunities. There’s also probably existing great ideas I neglected to list. But these at least give us a baseline and a rough sense of what dumping a bunch of money into AI safety could look like. I’m also erring towards listing more things rather than fewer. Some of these things might actually be bad ideas.
Bounties to reward AIs for reporting misaligned behavior in themselves or other agents.
Folks have run a couple small experiments on this already. It seems straight-forward to execute and like it could absorb almost unbounded amounts of capital.
Paying high enough salaries to entice non-altruistically-motivated AI company employees to work on safety.
This isn’t only bottlenecked on funding. Many people are very loyal to the AI companies they work for, and the very best employees aren’t very sensitive to money since they already have plenty of money. It seems absurdly expensive for Meta to try hiring away people at other AI companies, and they didn’t seem to get that much top talent from it. On the one hand, working on safety is a much more compelling case than working at Meta, but on the other hand, maybe people who aren’t already doing safety research find AI capabilities research more intrinsically fun and interesting or rewarding than safety research. I am also concerned that people who do capabilities research might not be great at safety research because they might not feel as passionate or inspired by it, and because it is a somewhat different skillset.
In the most extremely optimistic world, you could probably hire 50 extremely talented people by offering them $100M/year each (matching what Meta offered). You could probably also hire ~200 more junior people at $10M/year (the bottleneck on hiring more would be management capacity). So in total you could spend $7B/year.
Over time, I expect this to get more expensive since AI companies’ valuations will increase, and therefore, so will employee compensation.
Compute for AI safety research.
Day-to-day, the AI safety researchers I know outside of AI labs don’t seem to think they’re very bottlenecked on compute. However, the AI safety researchers I know inside AI labs claim they get a lot of value from having gobs and gobs of compute everywhere. Probably, AI safety researchers outside labs are just not being imaginative enough about what they could do with tons of compute. This also isn’t entirely money-bottlenecked. Probably part of it is having the infrastructure in place and the deals with the compute providers, etc. And running experiments on lots of compute can be more fiddly and time-consuming. Even so I bet with a lot more money for compute, people would be able to do much better safety research.
Very roughly, I guess this could absorb ~$100 million a year.
Compute for running AI agents to automate AI safety research.
This doesn’t work today since AIs can’t automate AI safety research. But maybe in the future they will be able to, and you’ll be able to just dump money into this almost indefinitely.
Pay AI companies to do marginal cheap safety interventions.
Maybe you can just pay AI companies to implement safety interventions that are only very slightly costly for them. For example, you could subsidize having really good physical security in their data centers. I think a lot of things AI companies could do to improve safety will be costly enough for the companies that it will be very hard to pay them enough to make up for that cost, especially in worlds where AI companies’ valuations have increased a lot from where they are today. But there’s probably still some opportunities here.
Raising awareness of AI safety.
There’s lots of proven ways to spend money to raise awareness of things (sponsor youtube channels, patronize movies about AI risk, etc). Maybe raising awareness of safety is good because it gets more people to work on safety or gets the government to do more sensible things about AI risk or lets consumers encourage companies to implement more safety interventions.
I couldn’t easily find an American public awareness campaign that cost more than ~$80M/year (for anti-smoking). Coca Cola spends ~$4 billion a year on advertising, but I think that if AI safety were spending as much money as Coca-Cola, it would backfire. I think maybe $500M/year is a reasonable cap on what could be spent?
Biodefense. Buy everyone in the US PPE.
One way that an AI could cause a catastrophe is via designing a bioweapon. One way to reduce the odds that a bioweapon causes a civilization-ending catastrophe is to make sure that everyone has enough PPE that they won’t die. Andrew Snyder-Beattie has elaborated on this idea here. I think this could absorb ~$3B ($3/mask * 350M Americans * 3 masks/person).
Buy foreign AI safety researchers gold cards.
Many great AI safety researchers are on visas. It would be convenient if they had green cards. You can buy green cards now for $1M each. Let’s say there’s a hundred such people, so this opportunity could absorb $100M.
Overall, these are not amazing opportunities. But they give a lower bound and illustrate how it’s possible to turn money into reduced risk from AI at scale, even if you don’t have more entrepreneurs building new organizations. In practice, I think if money slowly ramps up into the space over time, there will be much better opportunities than these, and you will simply see AI safety organizations that have grown to be major research institutions that are producing wonderful research. This is just a floor.
A lot of these ideas came from other people and have generally been floating around for a while. Thanks to everybody I talk to about this.
- Raemon 5 Nov 2025 7:47 UTC
  2 points
  0
  Parent
  In the most extremely optimistic world, you could probably hire 50 extremely talented people by offering them $100M/year each (matching what Meta offered). You could probably also hire ~200 more junior people at $10M/year (the bottleneck on hiring more would be management capacity). So in total you could spend $7B/year.
  Over time, I expect this to get more expensive since AI companies’ valuations will increase, and therefore, so will employee compensation.
  I don’t know that the idea is fundamentally good but at least is scales somewhat with the equity of the safety-sympathetic people at labs?
leogao 19 Sep 2025 21:43 UTC
13 points
2
what’s the current state of analysis on whether the civil rights act of 1957 was actually net positive or negative for civil rights in hindsight? there are two possible stories one can tell, and at the time people were arguing about which is correct:
1. passing even a useless civil rights bill is a lot better than nothing because it sets a precedent that getting civil rights bills through the Senate is possible / makes the southern coalition no longer look invincible. this serves a useful coordination mechanism because people only want to support things that they think other people will support.
2. passing a useless civil rights bill is worse than no bill because it creates a false sense of progress and makes it feel like something was done even when nothing was. to the extent that the bill signals to people that getting civil rights bills through the Senate is possible, this is a false impression because the only reason the bill could get through was that it was watered down to uselessness.
this feels directly analogous to the question of whether we should accept very weak AI safety regulations today.
- Garrett Baker 4 Nov 2025 23:20 UTC
  2 points
  0
  Parent
  It seems extremely net-positive for civil rights, but mainly through the mechanism of it making Lyndon Johnson a viable candidate for president while maintaining his stature with the southern democrats, leading ultimately to the Civil Rights Act of 1964.
  
  This can be seen as a generalizable lesson only insofar as you think weak bills like that are typically passed by Lyndon Johnson-like figures playing 4d political chess ultimately for altruistic reasons. Without that effect, it mostly seemed bad, it likely actually decreased the number of black voters, and did not decrease the south’s ability to filibuster the senate against civil rights (which was the main mechanism by which civil rights bills were unable to pass), eg they filibustered away another civil rights bill in 1959 or something. Plus, if not for Lyndon Johnson ultimately being pro-civil rights, it would have put someone decidedly anti-civil-rights into the presidency.
  - leogao 5 Nov 2025 0:13 UTC
    2 points
    0
    Parent
    so it sounds like there’s basically no way anyone could have known that johnson would actually be a pro civil-rights president, and that all the civil rights people who were opposed to the 1957 bill at the time were basically opposed for the right reasons? like basically everything we know about johnson as of 1960 suggests that he is telling everyone what they want to hear and it’s unclear whether he has any convictions of his own except for his strong track record of defending the interests of the south.
    - Garrett Baker 5 Nov 2025 0:17 UTC
      2 points
      0
      Parent
      Basically yes. His staff likely coulda predicted this (eg there were a few circumstances where out of anger he did some small civil rights stuff, then backed off when he cooled down & looked at the political repercussions), and possibly Lady Bird, but no other senator or member of the public had any reliable way to predict this for the reasons you state.
      - leogao 5 Nov 2025 6:01 UTC
        2 points
        0
        Parent
        I mean, even in the Emmett Till Arlington case, which is what I assume you’re referring to, it seems really hard for his staff members to have known, without the benefit of hindsight, that this was any significant window into his true beliefs? I mean, johnson is famously good at working himself up into appearing to genuinely believe whatever is politically convenient at the moment, and he briefly miscalculated the costs of supporting civil rights in this case. his apparent genuineness in this case doesn’t seem like strong evidence.
- mattmacdermott 19 Sep 2025 21:51 UTC
  2 points
  0
  Parent
  Some evidence for (2) is that before the 1957 act no civil rights legislation had been passed for 82 years^[1], and after it three more civil rights acts were passed in the next 11 years, including the Civil Rights Act of 1964, which in my understanding is considered very significant.
  ↩︎
  Going off what’s listed in the wikipedia article on civil rights acts in the United States.
  - leogao 19 Sep 2025 22:03 UTC
    2 points
    0
    Parent
    there’s an exogenous factor, which is that the entire country was shifting leftward during the 50s and 60s. it’s plausible that the 1964 bill would have passed anyways without the 1957 bill, possibly even earlier
    - mattmacdermott 19 Sep 2025 22:48 UTC
      2 points
      0
      Parent
      Fair enough yeah. But at least (1)-style effects weren’t strong enough to prevent any significant legislation in the near future.
Annabelle 5 Nov 2025 5:52 UTC
1 point
0
I haven’t seen any discussion on ROTE here, and this seems to be a pretty important development with respect to almost all topics discussed on this forum. Has anyone else read the paper/played around with the code?
Simon Lermen 5 Nov 2025 3:28 UTC
4 points
0
Who’s Using AI Romantic Companions?
I recently analyzed several AI companion subreddits (myboyfriendisai and others) to understand who’s actually using AI romantic companions. I built on Zhang et al.’s 2025 paper but with a much larger dataset—all comments and submissions from January through September 2025.
https://simonlermen.substack.com/p/whos-using-ai-romantic-companions
- Nancy Li 5 Nov 2025 4:00 UTC
  2 points
  0
  Parent
  I remember once asking ChatGPT: “Could you act as my boyfriend ?”. They rejected me.
Nancy Li 5 Nov 2025 3:54 UTC
0 points
0
Warning: This post isn’t optimized for your utility

You have been warned

Today I woke up feeling stressed, anxious and sad.
To relieve some of that stress: A relationship from my “dark” past has been haunting me sometimes. So I wrote this to them:

“Hi, I think I haven’t conveyed this clearly enough:
fwiw, I’m sorry for—
Asking you to purchase a computer for me—
Asking you (for a loan) to pay for my programs—
Imposing my own strict schedule on you—
Making you endure all my mood swings and immaturity—
Ghosting you for months on end and suddenly just asking “wanna play tennis?”
I’m not sure what else I did to make you upset.

NTF is not all bad. It’s just me, lol

Maybe it is too much to be asked for you to forgive (or forget) me someday, but if not ever then that’s okay too.
At least don’t send this creepy monster to assassinate me, s’il vous plaît lmao

fyi, I’m likely not going to check your reply (if you even leave one) in a long time, because i’m slightly afraid of the content”

This has been bothering me, because I wasn’t sure what did I do wrong.. my best hypothesis is that when they explained their book-writing & topic interest, i told them i’m busy and will get back to it later. Only that the later was many months later. Maybe that was the last straw for them.

And now. Maybe I can fully let go.
I feel more relaxed. Though perhaps it was just the music i’m listening to.

Idk what the moral of this story is. Maybe that music rocks.

ntf is non-trivial fellowship

Umm, I try to publish something from time to time, as to encourage someone I care about to overcome a fear of judgment

Uhh maybe lesswrong isn’t a good place to do that, even if they wanted to post here, since i’m not optimizing for utility or following the guidelines well, i’ll move these to Substack in the future
Nisan 21 Oct 2025 4:14 UTC
4 points
0
Is there any high-quality, intelligent discussion on the internet about California’s ballot measure about gerrymandering, Prop 50?
- Nisan 5 Nov 2025 2:47 UTC
  2 points
  0
  Parent
  There’s now this post by GradientDissenter and this post by me.

Who’s Using AI Romantic Companions?