Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.
Richard_Ngo
Ty! You’re right about the Asimov deal, though I do have some leeway. But I think the opening of this story is a little slow, so I’m not very excited about that being the only thing people see by default.
Unrelatedly, my last story is the only one of my stories that was left as a personal blog post (aside from the one about parties). Change of policy or oversight?
Ah, glad to hear the effort was noticeable. I do think that as I get more practice at being descriptive, concreteness will become easier for me (my brain just doesn’t work that way by default). And anyone reading this comment is welcome to leave me feedback about places in my stories where I should have been more concrete.
But I’m also pivoting away from stories in general right now; there’s too much other stuff I want to spend time on. I have half a dozen other stories for which I’ve already finished first drafts, so I’ll probably gradually release those in a low-effort way (i.e. without going through as much trouble to polish them). And then after that I expect I’ll only write the stories which feel easiest/most exciting to me, which tend to be the most abstract ones. So yeah, this is probably an outlier.
I wrote most of it a little over a year ago. In general I don’t plot out stories, I just start writing them and see what happens. But since I was inspired by The Gentle Seduction I already had a broad idea of where it was going.
I then sent a draft to some friends for feedback. One friend left about 50 comments in places where I’d been too abstract or given a vague description, with each comment literally just saying “like what?”
This was extremely valuable feedback but almost broke my will to finish the story. It took me about a year to work through most of those comments and concretize the things she highlighted. Then around the end of that I sent it to a couple more people, including Xander at Asimov Press, who did another editing pass (mainly toning down some of the overwrought parts).
The Minority Faction
I’m not sure what the details would look like, but I’m pretty sure ASI would have enough new technologies to figure something out within 10,000 years.
I feel like this is the main load-bearing claim underlying the post, but it’s barely argued for.
In some sense the sun is already “eating itself” by doing a fusion reaction, which will last for billions more years. So you’re claiming that AI could eat the sun (at least) six orders of magnitude faster, which is not obvious to me.
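As a rough sanity check on the orders-of-magnitude comparison (the round numbers here are my own assumptions, not from the comment):

```python
import math

# Back-of-envelope check: how much faster than fusion would an ASI need
# to consume the sun to do it within 10,000 years? Assumes a total
# main-sequence lifetime of roughly 10 billion years (a round number).
fusion_lifetime_years = 1e10   # ~10 billion years of hydrogen burning
asi_timescale_years = 1e4      # the 10,000-year figure above

speedup = fusion_lifetime_years / asi_timescale_years
print(f"speedup: {speedup:.0e} (~{math.log10(speedup):.0f} orders of magnitude)")
```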
I don’t think my priors on that are very different from yours but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that.
FWIW twitter search is ridiculously bad, it’s often better to use google instead. In this case I had it as the second result when I googled “richardmcngo twitter safety fundamentals” (richardmcngo being my twitter handle).
Yepp, though note that this still feels in tension with the original post to me—I expect to find a clean, elegant replacement to VNM, not just a set of approximately-equally-compelling alternatives.
Why? Partly because of inside views which I can’t explain in brief. But mainly because that’s how conceptual progress works in general. There is basically always far more hidden beauty and order in the universe than people are able to conceive (because conceiving of it is nearly as hard as discovering it—like, before Darwin, people wouldn’t have been able to explain what type of theory could bring order to biology).
I read the OP (perhaps uncharitably) as coming from a perspective of historically taking VNM much too seriously, and in this post kinda floating the possibility “what if we took it less seriously?” (this is mostly not from things I know about Anna, but rather a read on how it’s written). And to that I’d say: yepp, take VNM less seriously, but not at the expense of taking the hidden order of the universe less seriously.
As a quick note: the auto-generated glossary for this story is pretty cool (though it predictably contains spoilers).
Because I might fund them or forward it to someone else who will.
In general people should feel free to DM me with pitches for this sort of thing.
I think this epistemic uncertainty is distinct from the type of “objective probabilities” I talk about in my post, and I don’t really know how to use language without referring to degrees of my epistemic uncertainty.
The part I was gesturing at wasn’t the “probably” but the “low measure” part.
Is your position that the problem is deeper than this, and there is no objective prior over worlds, it’s just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?
Yes, that’s a good summary of my position—except that I think that, like with ethics, there will be a bunch of highly-suggestive logical/mathematical facts which make it much more intuitive to choose some priors over others. So the choice of prior will be somewhat arbitrary but not totally arbitrary.
I don’t think this is a fully satisfactory position yet, it hasn’t really dissolved the confusion about why subjective anticipation feels so real, but it feels directionally correct.
Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. “probably very low measure”), which suggests that there’s some aspect of my response you don’t fully believe.
In particular, in order for your definition of “what beings are sufficiently similar to you” to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but rather about the universes that they’re in. But this is kinda what I mean by coalitional dynamics: a bunch of different copies of you become more central parts of the “coalition” of your identity based on e.g. the types of impact that they’re able to have on the world around them. I think describing this as a metric of similarity is going to be pretty confusing/misleading.
you can estimate who the beings are whose decisions correlate with this one, and what the impact of each of their decisions is, and calculate the sum of all that
You still need a prior over worlds to calculate impacts, which is the cursed part.
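A minimal sketch of the calculation being described, to show where the prior sneaks in (the worlds and numbers below are placeholder assumptions of mine):

```python
# Sketch: expected impact of a decision, summed over correlated copies
# of the agent across worlds. The prior probability assigned to each
# world is an unavoidable input -- that's the "cursed" part.
worlds = [
    # (prior probability of world, impact of the correlated decision there)
    (0.70, 1.0),   # ordinary embedded agent: normal causal impact
    (0.25, 0.0),   # Boltzmann-brain copy: no lasting impact
    (0.05, 10.0),  # rare high-leverage world
]

expected_impact = sum(p * impact for p, impact in worlds)
print(expected_impact)
```

Changing the prior column changes the answer arbitrarily, which is why the choice of prior does all the work.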
I don’t think this line of argument is a good one. If there’s a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.
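The prioritization arithmetic here can be made explicit (the probabilities are from the comment; the assumption that your marginal effort has comparable leverage in either scenario is mine, for illustration):

```python
# Toy comparison: probability-weighted value of optimizing each branch,
# assuming your effort improves outcomes by a comparable amount
# conditional on the scenario occurring (that comparability is an
# assumption of this sketch, not a claim from the comment).
p_xrisk = 0.05   # chance of existential catastrophe
p_chaos = 0.50   # chance of chaotic, high-stakes decades post-AGI

conditional_improvement = 1.0  # same marginal leverage in either branch

ev_xrisk_work = p_xrisk * conditional_improvement
ev_chaos_work = p_chaos * conditional_improvement
print(ev_chaos_work / ev_xrisk_work)  # ~10x in favor of the chaos branch
```

Of course, if the stakes conditional on catastrophe are much larger, the comparison shifts back; the sketch only shows why the raw probabilities alone favor the 50% branch.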
Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner.
In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.
Great post. One slightly nitpicky point, though: even in the section where you argue that probabilities are cursed, you are still talking in the language of probabilities (e.g. “my modal guess is that I’m in a solipsist simulation that is a fork of a bigger simulation”).
I think there’s probably a deeper ontological shift you can do to a mindset where there’s no actual ground truth about “where you are”. I think in order to do that you probably need to also go beyond “expected utilities are real”, because expected utilities need to be calculated by assigning credences to worlds and then multiplying them by expected impact in each world.
Instead the most “real” thing here I’d guess is something like “I am an agent in a superposition of being in many places in the multiverse. Each of my actions is a superposition of uncountable trillions of actions that will lead to nothing plus a few that will have lasting causal influence. The degree to which I care about one strand of causal influence over another is determined by the coalitional dynamics of my many subagents”.
FWIW I think this is roughly the perspective on the multiverse Yudkowsky lays out in Planecrash (especially in the bits near the end where Keltham and Carissa discuss anthropics). Except that the degrees of caring being determined by coalitional dynamics is more related to geometric rationality.
I also tweeted about something similar recently (inspired by your post).
Cool, ty for (characteristically) thoughtful engagement.
I am still intuitively skeptical about a bunch of your numbers but now it’s the sort of feeling which I would also have if you were just reasoning more clearly than me about this stuff (that is, people who reason more clearly tend to be able to notice ways that interventions could be surprisingly high-leverage in confusing domains).
Ty for the link but these seem like both clearly bad semantics (e.g. under either of these the second-best hypothesis under consideration might score arbitrarily badly).
Just changed the name to The Minority Coalition.
1. Yepp, seems reasonable. Though FYI I think of this less as some special meta argument, and more as the common-sense correction that almost everyone implicitly does when giving credences, and rationalists do less than most. (It’s a step towards applying outside view, though not fully “outside view”.)
2. Yepp, agreed, though I think the common-sense connotations of “if this became” or “this would have a big effect” are causal, especially in the context where we’re talking to the actors who are involved in making that change. (E.g. the non-causal interpretation of your claim feels somewhat analogous to if I said to you “I’ll be more optimistic about your health if you take these pills”, and so you take the pills, and then I say “well the pills do nothing but now I’m more optimistic, because you’re the sort of person who’s willing to listen to recommendations”. True, but it also undermines people’s willingness/incentive to listen to my claims about what would make the world better.)
3. Here are ten things that affect AI risk as much, one way or the other:
- The US government “waking up” a couple of years earlier or later (one operationalization: AISIs existing or not right now).
- The literal biggest names in the field of AI becoming focused on AI risk.
- The fact that Anthropic managed to become a leading lab (and, relatedly, the fact that Meta and other highly safety-skeptical players are still behind).
- Trump winning the election.
- Elon doing all his Elon stuff (like founding x.AI, getting involved with Trump, etc).
- The importance of transparency about frontier capabilities (I think of this one as more of a logical update that I know you’ve made).
- o1-style reasoning as the next big breakthrough.
- Takeoff speeds (whatever updates you’ve made in the last three years).
- China’s trajectory of AI capabilities (whatever updates you’ve made about that in the last three years).
- China’s probability of invading Taiwan (whatever updates you’ve made about that in the last three years).
And then I think in 3 years we’ll be able to publish a similar list of stuff that mostly we just hadn’t predicted or thought about before now.
I expect you’ll dispute a few of these; happy to concede the ones that are specifically about your updates if you disagree (unless you agree that you will probably update a bunch on them in the next 3 years).
But IMO the easiest way for safety cases to become the industry-standard thing is for AISI (or internal safety factions) to specifically demand it, and then the labs produce it, but kinda begrudgingly, and they don’t really take them seriously internally (or are literally not the sort of organizations that are capable of taking them seriously internally—e.g. due to too much bureaucracy). And that seems very much like the sort of change that’s comparable to or smaller than the things above.
I think I would be more sympathetic to your view if the claim were “if AI labs really reoriented themselves to take these AI safety cases as seriously as they take, say, being in the lead or making profit”. That would probably halve my P(doom), it’s just a very very strong criterion.
Oh huh, I had the opposite impression from when I published Tinker with you. Thanks for clarifying!