I’ve now sent emails to all of the prize-winners.
Actually, on 1) I think that these consequentialist reasons are properly just covered by the later sections. The section in question is about reasons it’s maybe bad to make the One Ring, ~regardless of the later consequences, so it makes sense for it to emphasise the non-consequentialist reasons.
I think there could still be consequentialist analogues of those reasons, but they would be more esoteric: maybe something decision-theoretic, or an appeal to how we might want to be treated by future AI systems that gain ascendancy.
Yeah. As well as another consequentialist argument, which is just that it will be bad for other people to be dominated. The arguments feel less natively consequentialist, though, so it seems easier to hold them in these other frames and then translate them into a consequentialist ontology if that’s relevant; but it would also be very reasonable to mention them in the footnote.
My first reaction was that I do mention the downsides. But I realise that this was somewhat buried in the text, and I can see how that could be misleading about my overall view. I’ve now edited the second paragraph of the post to be more explicit about this. I appreciate the pushback.
Ha, thanks!
(It was part of the reason. Normally I’d have made the effort to import, but here I felt it was maybe slightly funny to post just the one-sided thing, which nudged towards linking rather than posting the full text; and I also thought I’d take the opportunity to see experimentally whether that led to less engagement. But those reasons weren’t overwhelming, and now that you’ve put the full text here I don’t find myself very tempted to remove it. :) )
The judging process should be complete in the next few days. I expect we’ll write to winners at the end of next week, although it’s possible that will be delayed. A public announcement of the winners is likely a few more weeks away.
I don’t see why (1) says you should be very early. Isn’t the decrease in measure for each individual observer precisely outweighed by their increasing multitudes?
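To spell out the cancellation I have in mind, here is a toy sketch (the notation $N_t$, $w_t$, $c$ is mine, not anything from the original argument): suppose epoch $t$ contains $N_t$ observers and each observer’s measure is $w_t = c/N_t$.

```latex
% Toy sketch (my notation): per-observer measure shrinks as 1/N_t while the
% number of observers grows as N_t, so the total measure carried by epoch t is
\[
  N_t \, w_t \;=\; N_t \cdot \frac{c}{N_t} \;=\; c .
\]
```

If that’s the right accounting, every epoch carries the same total measure, so there’s no net push toward expecting to be early.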
This kind of checks out to me. At least, I agree that it’s evidence against treating quantum computers as primitive that humans, despite living in a quantum world, find classical computers more natural.
I guess I feel more like I’m in a position of ignorance, though, and wouldn’t be shocked to find some argument that quantum mechanics has, in some other a priori sense, a deep naturalness which other niche physics theories lack.
You say that quantum computers are more complex to specify, but is this a function of using a classical computer in the speed prior? I’m wondering if it could somehow be quantum all the way down.
It’s not obvious that open source leads to faster progress. Having high-quality open source products reduces the incentives for private investment. I’m not sure in which regimes this plays out as overall accelerationist, but I sort of guess that it will be decelerationist during an intense AI race (where the investments needed to push the frontier out are enormous and significantly profit-motivated).
I like the framework.
Conceptual nit: why do you include inhibitions as a type of incentive? It seems to me more natural to group them with internal motivations than external incentives. (I understand that they sit in the same position in the argument as external incentives, but I guess I’m worried that lumping them together may somehow obscure things.)
I actually agree with quite a bit of this. (I nearly included a line about pursuing excellence in terms of time allocation, but it seemed possibly redundant with some of the other stuff on not making the perfect the enemy of the good, and I couldn’t quickly see how to fit it cleanly into the flow of the post, so I left it out and moved on…)
I think it’s important to draw the distinction between perfection and excellence. Broadly speaking, I think people often put too much emphasis on perfection, and often not enough on excellence.
Maybe I shouldn’t have led with the “Anything worth doing, is worth doing right” quote. I do see that it’s closer to perfectionist than excellence-seeking, and I don’t literally agree with it. Though one thing I like about the quote is the corollary: “anything not worth doing right isn’t worth doing” — again something I don’t literally agree with, but something I think captures an important vibe.
I do think people in academia can fail to find the corners they should be cutting. But I also think that they write a lot of papers that (to a first approximation) just don’t matter. I think that academia would be a healthier place if people invested more in asking “what’s the important thing here?” and focusing on that, and not trying to write a paper at all until they thought they could write one with the potential to be excellent.
No, multi-author submissions are welcome! (There’s space to disclose this on the entry form.)
Can you say more about why you believe this? At first glance, it seems to me like “fundamental instability” is much more tied to how AI development goes, so I would’ve expected it to be more tractable [among LW users].
Maybe “simpler” was the wrong choice of word. I didn’t really mean “more tractable”; I just meant “it’s kind of obvious what needs to happen (even if it’s very hard to get it to happen)”. Whereas with fundamental instability, it’s more that it’s unclear whether it’s actually a very overdetermined fundamental instability, or what exactly could nudge things to a part of scenario space with stable possibilities.
In a post-catastrophe world, it seems quite plausible to me that the rebounding civilizations would fear existential catastrophes and dangerous technologies and try hard to avoid technology-induced catastrophes.
I agree that it’s hard to reason about this stuff, so I’m not super confident in anything. However, my inside view is that this story seems plausible if the catastrophe seems like it was basically an accident, but less plausible for nuclear war. Somewhat more plausible is that rebounding civilizations would create a meaningful world government to avoid repeating history.
Just a prompt to say that if you’ve been kicking around an idea of possible relevance to the essay competition on the automation of wisdom and philosophy, now might be the moment to consider writing it up—entries are due in three weeks.
My take is that in most cases it’s probably good to discuss publicly (but I wouldn’t be shocked to become convinced otherwise).
The main plausible reason I see for it potentially being bad is if it were drawing attention to a destabilizing technology that otherwise might not be discovered. But I imagine most of the thinking will just be chasing through the implications of obvious ideas. And I think that in general, having the basic strategic situation be closer to common knowledge is likely to reduce the risk of war.
(You might think the discussion could also have impacts on the amount of energy going into racing, but that seems pretty unlikely to me?)
The way I understand it, this could work by giving democratic leaders with “democracy-aligned AI” more effective influence over nondemocratic figures (via fine-tuned persuasion, some kind of AI-designed political zugzwang, etc.), thus reducing totalitarian risks. Is my understanding correct?
Not what I’d meant—rather, that democracies could demand better oversight of their leaders, and so reduce the risk of democracies slipping into various traps (corruption, authoritarianism).
My mainline guess is that information about bad behaviour by Sam was disclosed to them by various individuals, and they owe a duty of confidence to those individuals (where revealing the information might identify the individuals, who might thereby become subject to some form of retaliation).
(“Legal reasons” also gets some of my probability mass.)
OK hmm I think I understand what you mean.
I would have thought about it like this:
- “our reference class” includes roughly the observations we make before observing that we’re very early in the universe
- This includes stuff like being a pre-singularity civilization
- The anthropics here suggest there won’t be lots of civs later arising and being in our reference class and then finding that they’re much later in universe histories
- It doesn’t speak to the existence or otherwise of future human-observer moments in a post-singularity civilization
… but as you say anthropics is confusing, so I might be getting this wrong.
I largely disagree (even now I think having tried to play the inside game at labs looks pretty good, although I have sometimes disagreed with particular decisions in that direction because of opportunity costs). I’d be happy to debate if you’d find it productive (although I’m not sure whether I’m disagreeable enough to be a good choice).
It’s been a long time since I read those books, but if I’m remembering roughly right: Asimov seems to describe a world where choice is in a finely balanced equilibrium with other forces (implausibly so, I’m inclined to think: if it could manage that level of control at great distances in time, one would think it could exert more effective control over things at somewhat less distance).