Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
I don’t currently believe this, and don’t think I said so. I do think the GV constraints are big, but my overall assessment is that the net effect of Open Phil’s actions is negative even if you control for GV, though the calculus gets a lot messier and I am much less confident. Some of that is because of the evidential update from how they handled the GV situation, but also IMO Open Phil has made many other quite grievous mistakes.
My guess is that an Open Phil still run by Holden would probably be good for the world. I have many disagreements with Holden, and it’s definitely still a high-variance situation, but I’ve historically been impressed with his judgement on many of the issues I’ve seen OP mess up in recent years.
Yeah, we gotta fix something about how we handle Substack-formatted content. It really looks ugly sometimes, though I haven’t yet chased down exactly when that happens.
No, I ended up getting sick that week and other deadlines then pushed the work later. It will still happen, but maybe only closer to LessOnline (i.e. in about a month).
I agree that there is some ontological mismatch here, but I think your position is still in pretty clear conflict with what Neel said, which is what I was objecting to:
My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely, what timelines are, etc.
“Not 100% motivated by it” IMO sounds like it implies that reducing x-risk makes up something like 30%-70% of the motivation. I don’t think that’s true, and I think various things Jaime has said make that relatively clear.
This is a great thread and I appreciate you both having it, and posting it here!
I am not saying Jaime in principle could not be motivated by existential risk from AI, but I do think the evidence strongly suggests to me that concerns about existential risk from AI are not among the primary motivations for his work on Epoch (which is what I understood Neel to be saying).
Maybe it is because he sees the risk as irreducible, maybe it is because the only ways of improving things would cause collateral damage to other things he cares about. I also think our dominant prior should be that someone is not motivated by reducing x-risk unless they directly claim they are.
(This aligns with what I intended. I feel like my comment is making a fine point, even despite having missed the specific section.)
My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely
I don’t think this is true. My sense is he views his current work as largely being good on non-x-risk grounds, and that even if it might slightly increase x-risk, he doesn’t think it would be worth it for him to stop working on it, since he thinks it’s unfair to force the current generation to accept a higher risk of not achieving longevity escape velocity and more material wealth in exchange for a small decrease in existential risk.
He says it so plainly that it is about as straightforward a rejection of AI x-risk concerns as I’ve heard:
I selfishly care about me, my friends and family benefitting from AI. For some of my older relatives, it might make a big difference to their health and wellbeing whether AI-fueled explosive growth happens in 10 vs 20 years.
[...]
I wont endanger the life of my family, myself and the current generation for a small decrease of the chances of AI going extremely badly in the long term. And I don’t think it’s fair of anyone to ask me to do that. Not that it should be my place to unilaterally make such a decision anyway.
It seems very clear that Jaime thinks AI x-risk is unimportant relative to almost any other issue, given his non-interest in trading off x-risk against those other issues.
It is true that Jaime might think AI x-risk could hypothetically be motivating to him, but my best interpretation of what is going on suggests that he de-facto does not consider it an important input into his current strategic choices, or the choices of Epoch.
A lot of new user submissions to LW these days are clearly from some poor person who was sycophantically encouraged by an AI to post their crazy theory of cognition or consciousness or recursion or social coordination on LessWrong after it told them their ideas are great. When we send them moderation messages we frequently get LLM-co-written responses, and sometimes they send us quotes from an AI that has evaluated their research as promising and high-quality, as proof that they are not a crackpot.
Ah, indeed! I think the “consistent” threw me off a bit there and so I misread it on first reading, but that’s good.
Sorry for missing it on first read. I do think that is approximately the kind of clause I was imagining (of course I would phrase things differently and would put an explicit emphasis on coordinating with other actors in ways beyond “articulation”, but your phrasing here is within my bounds of where objections feel more like nitpicking).
Each time we go through the core loop of catching a warning sign for misalignment, adjusting our training strategy to try to avoid it, and training again, we are applying a bit of selection pressure against our bumpers. If we go through many such loops and only then, finally, see a model that can make it through without hitting our bumpers, we should worry that it’s still dangerously misaligned and that we have inadvertently selected for a model that can evade the bumpers.
How severe of a problem this is depends on the quality and diversity of the bumpers. (It also depends, unfortunately, on your prior beliefs about how likely misalignment is, which renders quantitative estimates here pretty uncertain.) If you’ve built excellent implementations of all of the bumpers listed above, it’s plausible that you can run this loop thousands of times without meaningfully undermining their effectiveness.[8] If you’ve only implemented two or three, and you’re unlucky, even a handful of iterations could lead to failure.
This seems like the central problem of this whole approach, and indeed it seems very unlikely to me that we would end up with a system we feel comfortable scaling to superintelligence after 2-3 iterations on our training protocols. This plan really desperately needs a step that is something like “if the problem appears persistent, or we are seeing signs that the AI systems are modeling our training process in a way that suggests that upon further scaling they would end up looking aligned independently of their underlying alignment, we halt and advocate for much larger shifts in our training process, which likely requires some kind of coordinated pause or stop with other actors”.
Yeah, what would be my alternative true rejection? I don’t think the normalization effect is weak; indeed, I expect that even just within my social circle this whole situation will come up regularly as justification for threatening people with libel suits.
The FTX lawsuit was kind of reasonable IMO! Overall it increased my trust in the court system for settling things related to bankruptcy.
I think there are many other institutions that are better suited to helping people navigate this kind of stuff. Google can deprioritize them in its search rankings. LLMs can provide reasonable fact-checks. A community-notes-like system could apply to Google Search results, or people could over time switch towards platforms that provide them with community-notes-like systems.
Indeed, my sense is RationalWiki’s influence had already been decreasing very heavily, and the period in which people did not have antibodies against them was pretty short. And I think that period would have been even shorter if people had written up what they were doing earlier (my sense is Tracing Woodgrains’ post on some of the core people involved was pretty helpful here).
I think I am, all things considered, sad about this. I think libel suits are really bad tools for limiting speech, and I declined to be involved with them when some of the plaintiffs asked me to join on behalf of LW and Lightcone.
I do think the RationalWiki case is one of the better applications of the relevant law, but the law is so abuse-prone, and normalizing its use would cause so much more harm than RationalWiki ever caused, that I don’t think suing was the right choice by the plaintiffs. Not suing, despite the high likelihood of success and the high ongoing personal harm from RationalWiki’s actions, would have been a big personal sacrifice for the common good, so I have sympathy for the people who did sue, but I do think it’s pretty bad and they overall likely still made the world worse.
I think it would be extremely bad for most LW AI Alignment content if it were no longer colocated with the rest of LessWrong. Making an intellectual scene is extremely hard. The default outcome would be that it would become a bunch of fake ML research that has nothing to do with the problem. “AI Alignment” as a field does not actually have a shared methodological foundation that would make it sensible for all of it to be colocated in one space. LessWrong does have a shared methodology, and so it makes sense to have a forum of that kind.
I think it could make sense to have forums or subforums for specific subfields that do have enough shared perspective to make a coherent conversation possible, but I am confident that AI Alignment/AI Safety as a field does not coherently have such a thing.
Ah, I see. I did interpret the framing around “net positive” to be largely about normal companies. It’s IMO relatively easy to be net-positive, since all you need to do is avoid harm in expectation and help in any way whatsoever, which I’d guess almost any technology startup that doesn’t accelerate things, but has reasonable people at the helm, can achieve.
When we are instead talking about “how do you make a safety-focused company that is substantially positive for safety?”, that is a very different question in my mind.
Promoted to curated: I don’t think this post is earth-shattering, but it’s good, short, and answers an interesting question, and does so with a reasonable methodology and curiosity. And it’s not about AI, for once, which is a nice change of pace from our curation schedule these days.
I don’t think we should have norms or a culture that requires everything everyone does to be good specifically for AI Safety. Startups are mostly good because they produce economic value and solve problems for people. Sometimes they do so in a way that helps with AI Safety, sometimes not. I think Suno has helped with AI Safety because it has allowed us to make a dope album that made the rationalist culture better. Midjourney has helped us make LW better. But mostly, they just make the world better the same way most companies in history have made the world better.
I think almost all startups are really great! I think there really is a very small set of startups that end up harmful for the world, usually by specifically making a leveraged bet on trying to create very powerful AGI, or accelerating AI-driven R&D.
Because in some sense expecting a future with both of these technologies is what distinguishes our community from the rest of the world, leaning into exactly those technologies ends up being a surprisingly common thing for people to do if they end up optimizing for profit (as that’s where the alpha of our community relative to the rest of the world lies), which I do think is really bad.
As a concrete example, I don’t think Elicit is making the world much worse. I think its sign is not super obvious, but I don’t have the sense that they are accelerating timelines very much, or making takeoff sharper, or exhausting political coordination goodwill. Similarly, I think Midjourney is good for the world, as is Suno, as are basically all the AI art startups. I do think they might drive investment into the AI sector, and that might push them into the red, but inasmuch as we want to have norms, and I give people advice on what to do, working on those things feels like it could definitely be net good.
I would not characterize Dustin as straightforwardly “pushing back” in the relevant comment thread; it was more “expressing frustration with specific misinterpretations but confirming the broad strokes”. I do think he would likely take offense at some of this framing, but a lot of it is really quite close to what Dustin said himself (and my model is more that Dustin is uncomfortable owning all the implications of the things he said, though this kind of thing is hard).