I have signed no contracts or agreements whose existence I cannot mention.
Room to explore intellectual ideas is indeed important, as is not succumbing to peer pressure. However, in my personal experience Epoch’s culture has felt more like a disgust reaction towards claims of short timelines than open, curious engagement and trying to find out whether the person they’re talking to has good arguments (probably because most people who believe in short timelines don’t actually have the predictive patterns, and Epoch mixed up “many people think x for bad reasons” with “x is wrong / no one believes it for good reasons”).
Intellectual diversity is a good sign, it’s true, but being closed to arguments from people who turned out to have better models than you is not virtuous.
From my vantage point, Epoch is wrong about critical things they are studying in ways that make them take actions which harm the future despite competence and positive intentions, while not effectively seeking out clarity which would let them update.
Long timelines are the main one, but also low p(doom), low probability on the more serious forms of RSI (which seem both likely and very dangerous), and relatedly not being focused on misalignment/power-seeking risks to the extent that seems correct, given how strong a filter that is on timelines with our current alignment technology. I’m sure not all Epoch people have these issues, and I hope that with the less careful ones leaving, the rest will have more reliably good effects on the future.
Link to the OpenAI scandal. Epoch has for some time felt like it was staffed by highly competent people who were tied to incorrect conclusions, but whose competence led them to some useful outputs alongside the mildly harmful ones. I hope that the remaining people take more care in future hires, and that grantmakers update off of having accidentally created another capabilities org.
ALLFED seems to be doing important and neglected work. Even if you just care about reducing AI x-risk, there’s a solid case to be made that preparing for alignment researchers to survive a nuclear war might be one of the remaining ways to thread the needle assuming high p(doom). ALLFED seems the group with the clearest comparative advantage to make this happen.
I vouch for Severin being highly skilled at mediating conflicts.
Also, smouldering conflicts create more drag on cohesion and execution than most people realize until they’re resolved. Try this out if you have even a slight suspicion it might help.
Accurate, and one of the main reasons why most current alignment efforts will fall apart with future systems. A generalized version of this combined with convergent power-seeking of learned patterns looks like the core mechanism of doom.
Podcast version:
The new Moore’s Law for AI Agents (aka More’s Law) accelerated around the time people in research roles started talking a lot more about getting value from AI coding assistants. AI accelerating AI research seems like the obvious interpretation, and if true, the new exponential is here to stay. This gets us to 8-hour AIs in ~March 2026, and 1-month AIs around mid 2027.[1]
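To make the arithmetic behind those dates explicit, here’s a minimal extrapolation sketch. The starting point (~1-hour task horizon in March 2025) and the ~4-month doubling time are illustrative assumptions for the example, not figures from this comment; change them and the dates shift accordingly.

```python
from datetime import date, timedelta
from math import log2

# Illustrative assumptions (not sourced from the comment above): a ~1-hour
# task horizon in March 2025, doubling every ~4 months after the speed-up.
START_DATE = date(2025, 3, 1)
START_HORIZON_HOURS = 1.0
DOUBLING_MONTHS = 4.0

def horizon_reached(target_hours: float) -> date:
    """Date at which the task horizon reaches target_hours, assuming a clean exponential."""
    doublings = log2(target_hours / START_HORIZON_HOURS)
    return START_DATE + timedelta(days=doublings * DOUBLING_MONTHS * 30.44)

print(horizon_reached(8))    # ~8-hour AIs: lands around March 2026
print(horizon_reached(167))  # ~1 work-month (167 h): lands around August 2027
```

Under these particular assumptions the one-work-month point lands a few months after mid 2027; a slightly shorter doubling time or a higher starting horizon pulls it in.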
I do not expect humanity to retain relevant steering power for long in a world with one-month AIs. If we haven’t solved alignment, either iteratively or once-and-for-all[2], it’s looking like game over unless civilization ends up tripping over its shoelaces and we’ve prepared.
- ^
An extra speed-up of the curve could well happen, for example with [obvious capability idea, nonetheless redacted to reduce speed of memetic spread].
- ^
From my bird’s eye view of the field, having at least read the abstracts of a few papers from most organizations in the space, I would be quite surprised if we had what it takes to solve alignment in the time that graph gives us. There aren’t enough people, and they’re mostly not working on things which are even trying to align a superintelligence.
Nice! I think you might find my draft on Dynamics of Healthy Systems: Control vs Opening relevant to these explorations. Feel free to skim, as it’s longer than ideal (hence unpublished, despite containing what feels like a general and important insight that applies to agency at many scales). I plan to write a cleaner version sometime, but for now it’s a Claude-assisted write-up of my ideas, so it’s about 2-3x wordier than it should be.
Interesting, yes. I think I see, and I think I disagree with this extreme formulation, despite knowing that this is remarkably often a good direction to go in. If “[if and only if]” was replaced with “especially”, I would agree, as I think the continual/regular release process is an amplifier on progress not a full requisite.
As for re-forming, yes, I do expect there is a true pattern we are within, which can be in its full specification known, though all the consequences of that specification would only fit into a universe. I think having fluidity on as many layers of ontology as you can is generally correct (and that most people have way too little of this), but I expect the process of release and dissolve will increasingly converge, if you’re doing well at it.
In the spirit of gently poking at your process: my uncertain guess (please take it lightly) is that you’ve annealed strongly towards the release/dissolve process itself, to the extent that it has itself become an ontology with some level of fixedness in you.
I’d love to see the reading time listed on the frontpage. That would naturally slide the incentives towards shorter posts, as more people would click and they would get more karma. It feels much more decision-relevant than the date the post was published.
Yup, DMing for context!
hmmm, I’m wondering whether you’re pointing at the thing in this space which I intuitively expect is good, just using words that sound more extreme than I’d use, or whether you’re pointing at a different thing entirely. I’ll take a shot at describing the version of this I’d be happy with, and you can let me know whether it feels like what you’re trying to point to:
An ontology restricts the shape of thought by being of a set shape. All of them are insufficient, the Tao that can be specified is not the true Tao, but each can contain patterns that are useful if you let them dissolve and continually release the meta-structures rather than cling to them as a whole. By continually releasing as much of your structure back to flow you grow much faster and in more directions, because in returning from that dissolving you reform with much more of your collected patterns integrated and get out of some of your local minima.
you could engage with the Survival and Flourishing Fund
Yeah! The S-process is pretty neat, buying into that might be a great idea once you’re ready to donate more.
Oh, yup, thanks, fixed.
Consider reaching out to Rob Miles.
He tends to get far more emails than he can handle, so a cold contact might not work, but I can bump this up on his list if you’re interested.
Firstly: Nice, glad to have another competent and well-resourced person on-board. Welcome to the effort.
I suggest: Take some time to form reasonably deep models of the landscape, first technical[1] and then the major actors and how they’re interfacing with the challenge.[2] This will inform your strategy going forward. Most people, even people who are full time in AI safety, seem to not have super deep models (so don’t let yourself be socially-memetically tugged by people who don’t have clear models).
Being independently wealthy in this field is awesome, as you’ll be able to work on whatever your inner compass points to as the best, rather than needing to track grantmaker wants and all of the accompanying stress. With that level of income you’d also be able to be one of the top handful of grantmakers in the field if you wanted, the AISafety.com donation guide has a bunch of relevant info (though might need an update sweep, feel free to ping me with questions on this).
Things look pretty bad in many directions, but it’s not over yet and the space of possible actions is vast. Best of skill finding good ones!
- ^
I recommend https://agentfoundations.study/, and much of https://www.aisafety.com/stay-informed, and chewing on the ideas until they’re clear enough in your mind that you can easily get them across to almost anyone. This is good practice internally, as well as good for the world. The Sequences are also excellent grounding for the type of thinking needed in this field; it’s what they were designed for. Start with the highlights, and maybe go on to the rest if it feels valuable. The AI Safety Fundamentals courses are also worth taking, but you’ll want a lot of additional reading and thinking on top of them. I’d also be up for a call or two if you like; I’ve been doing the self-funded (plus sometimes giving grants) try-to-save-the-world thing for some time now.
- ^
Technical first seems best, as it’s the grounding which underpins what would be needed in governance, and I suspect it will help you orient better than going straight to governance.
- ^
eh, <5%? More that we might be able to get the AIs to do most of the heavy lifting of figuring this out, but that’s a sliding scale of how much oversight the automated research systems need to not end up in wrong places.
My current guess as to Anthropic’s effect:
0-8 months shorter timelines[1]
Much better chances of a good end in worlds where superalignment doesn’t require strong technical philosophy[2] (but I put very low odds on being in this world)
Somewhat better chances of a good end in worlds where superalignment does require strong technical philosophy[3]
- ^
Shorter due to:
There being a number of people who might otherwise not have been willing to work for a scaling lab, or not do so as enthusiastically/effectively (~55% weight)
Encouraging race dynamics (~30%)
Making it less likely that there’s a broad alliance against scaling labs (~15%)
Partly counterbalanced by encouraging better infosec practices and being more supportive of regulation than the alternatives.
- ^
They’re trying a bunch of the things which, if alignment is easy, might actually work, and no other org has the level of leadership buy-in needed to invest in them as hard.
- ^
Probably via AI-assisted alignment schemes, though building org competence in doing this kind of research manually, so they can direct the systems to the right problems and sort slop from sound ideas, is going to need to be a priority.
By “discard”, do you mean remove specifically the fixed-ness in your ontology such that the cognition as a whole can move fluidly and the aspects of those models which don’t integrate with your wider system can dissolve, as opposed to the alternate interpretation where “discard” means actively root out and try and remove the concept itself (rather than the fixed-ness of it)?
(also 👋, long time no see, glad you’re doing well)
*nods*, yeah, your team does seem competent and truth-seeking enough to get a lot of stuff right, despite what I model as shortcomings.
That experience was an in-person conversation with Jaime some years ago, after an offhand comment I made expecting fairly short timelines. I imagine there are many contexts where Epoch has not had this vibe.