learn math or hardware
mesaoptimizer
Project proposal: EpochAI for compute oversight
Detailed MVP description: a website with an interactive map that shows the locations of high risk datacenters globally, with relevant information appearing when you click on the icons on the map. Examples of relevant information: the organizations and frontier labs that have access to this compute, the effective FLOPS of the datacenter, and how long it would take to train a SOTA model in that datacenter.
High risk datacenters are datacenters that are capable of training current or next generation SOTA AI systems.
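As a rough sketch of how the "time to train a SOTA model" field could be derived: divide the total training compute of a reference model by the datacenter's effective throughput. The function name and the numbers in the usage lines are hypothetical placeholders, not measurements of any real datacenter or model.

```python
SECONDS_PER_DAY = 86_400


def training_days(training_flop: float, effective_flops: float) -> float:
    """Estimate the days needed to train a model in one datacenter.

    training_flop   -- total compute of the reference training run, in FLOP
                       (assumed input; would come from public estimates)
    effective_flops -- sustained throughput of the datacenter, in FLOP/s,
                       i.e. already discounted for utilization
    """
    if training_flop < 0 or effective_flops <= 0:
        raise ValueError("training_flop must be >= 0 and effective_flops > 0")
    return training_flop / effective_flops / SECONDS_PER_DAY


# Hypothetical example: a 1e25 FLOP training run on a datacenter
# sustaining 1e18 FLOP/s takes roughly 116 days.
days = training_days(1e25, 1e18)
print(f"{days:.0f} days")
```

This ignores everything except raw throughput (interconnect limits, downtime, whether the whole datacenter is available to one tenant), so it is at best a lower bound to show next to each map icon.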
Why:
I’m unable to find a ‘single point of reference’ for information about the number and locations of datacenters that are high risk.
AFAICT Epoch focuses more on tracking SOTA model details instead of hardware related information.
This seems extremely useful for our community (and policy makers) to orient to compute regulation possibilities and their relative prioritization compared to other interventions.
Thoughts? I’ve been playing around with the idea of building it, but have been uncertain about how useful this would be, since I don’t have enough interaction with the AI alignment policy people here. Posting it here is an easy test to see whether it is worth greater investment or prioritization.
Note: I’m uncertain as to whether dual-use issues exist here. I expect that datacenter builders and frontier labs probably already have a very good model of the global distribution of compute, so this would benefit regulatory efforts significantly more than it would help anyone allocate training compute more strategically.
Neuro-sama is a limited scaffolded agent that livestreams on Twitch, optimized for viewer engagement (so it speaks via TTS, it can play video games, etc.).
Well, at least a subset of the sequence focuses on this. I read the first two essays and was pessimistic enough about the titular approach that I moved on.
Here’s a relevant quote from the first essay in the sequence:
Furthermore, most of our focus will be on ensuring that your model is attempting to predict the right thing. That’s a very important thing almost regardless of your model’s actual capability level. As a simple example, in the same way that you probably shouldn’t trust a human who was doing their best to mimic what a malign superintelligence would do, you probably shouldn’t trust a human-level AI attempting to do that either, even if that AI (like the human) isn’t actually superintelligent.
Also, I don’t recommend reading the entire sequence, if that was an implicit question you were asking. It was more of a “Hey, if you are interested in this scenario fleshed out in significantly greater rigor, you’d like to take a look at this sequence!”
Evan Hubinger’s Conditioning Predictive Models sequence describes this scenario in detail.
There’s generally a cost to managing people and onboarding newcomers, and I expect that offering to volunteer for free is usually a negative signal, since it implies that there’s a lot more work than usual that would need to be done to onboard this particular newcomer.
Have you experienced otherwise? I’d love to hear some specifics as to why you feel this way.
I think we’ll have bigger problems than just solving the alignment problem, if we have a global thermonuclear war that is impactful enough to not only break the compute supply and improvement trends, but also destabilize the economy and geopolitical situation enough that frontier labs aren’t able to continue experimenting to find algorithmic improvements.
Agent foundations research seems robust to such supply chain issues, but I’d argue that gigantic parts of the (non-academic, non-DeepMind specific) conceptual alignment research ecosystem are extremely dependent on a stable and relatively resource-abundant civilization: LW, EA organizations, EA funding, individual researchers having the slack to do research, ability to communicate with each other and build on each other’s research, etc. Taking a group of researchers and isolating them in some nuclear-war-resistant country is unlikely to lead to an increase in marginal research progress in that scenario.
Thiel has historically expressed disbelief about AI doom, and has been more focused on trying to prevent civilizational decline. From my perspective, it is more likely that he’d fund an organization founded by people with accelerationist credentials than one founded by someone who was a part of a failed coup attempt that would look to him like it involved a sincere belief in the extreme difficulty of the alignment problem.
I’d love to read an elaboration of your perspective on this, with concrete examples, which avoids focusing on the usual things you disagree about (pivotal acts vs. pivotal processes, social facets of the game being important for us to track, etc.) and mainly focuses on your thoughts on epistemology and rationality and how they deviate from what you consider the LW norm.
I started reading your meta-rationality sequence, but it ended after just two posts without going into details.
David Chapman’s website seems like the standard reference for what the post-rationalists call “metarationality”. (I haven’t read much of it, but the little I read made me somewhat unenthusiastic about continuing).
Note that the current power differential between evals labs and frontier labs is such that I don’t expect evals labs to have the slack to simply state that a frontier model failed their evals.
You’d need regulation with serious teeth and competent ‘bloodhound’ regulators watching the space like a hawk, for such a possibility to occur.
I just encountered polyvagal theory and I share your enthusiasm for how useful it is for modeling other people and oneself.
Note that I’m waiting for the entire sequence to be published before I read it (past the first post), so here’s a heads up that I’m looking forward to seeing more of this sequence!
I think Twitter systematically underpromotes tweets with links external to the Twitter platform, so reposting isn’t a viable strategy.
Thanks for the link. I believe I read it a while ago, but it is useful to reread it from my current perspective.
trying to ensure that AIs will be philosophically competent
I think such scenarios are plausible: I know some people argue that certain decision theory problems cannot be safely delegated to AI systems, but if we as humans can work on these problems safely, I expect that we could probably build systems that are about as safe (by crippling their ability to establish subjunctive dependence) but are also significantly more competent at philosophical progress than we are.
Leopold’s interview with Dwarkesh is a very useful source of what’s going on in his mind.
What happened to his concerns over safety, I wonder?
He doesn’t believe in a ‘sharp left turn’, which means he doesn’t consider general intelligence to be a discontinuous (latent) capability spike such that alignment becomes significantly more difficult after it occurs. To him, alignment is simply a somewhat harder empirical techniques problem, as capabilities work is. I assume he imagines behavior similar to current RLHF-ed models even as frontier labs have doubled or quadrupled the OOMs of optimization power applied to the creation of SOTA models.
He models (incrementalist) alignment research as “dual use”, and therefore treats capabilities and alignment as effectively the same measure.
He also expects humans to continue to exist once certain communities of humans achieve ASI, and imagines that the future will be ‘wild’. This is a very rare and strange model to have.
He is quite hawkish: he is incredibly focused on China not stealing AGI capabilities, and believes that private labs are going to be too incompetent to defend against Chinese infiltration. He would prefer that the USGOV take over AGI development so that it can race effectively against China.
His model for take-off relies quite heavily on “trust the trendline” and estimating linear intelligence increases with more OOMs of optimization power (linear with respect to human intelligence growth from childhood to adulthood). It’s not the best way to extrapolate what will happen, but it is a sensible concrete model he can use to talk to normal people and sound confident and not vague, which is a key skill if you are an investor, and an especially key skill for someone trying to make it in the SF scene. (Note he clearly states in the interview that he’s describing his modal model for how things will go and he does have uncertainty over how things will occur, but desires to be concrete about what his modal expectation is.)
He has claimed that running a VC firm means he can essentially run it as a “think tank” too, focused on better modeling (and perhaps influencing) the AGI ecosystem. Given his desire for a hyper-militarization of AGI research, it makes sense that he’d try to steer things in this direction using the money and influence he will have and build, as a founder of an investment firm.
So in summary, he isn’t concerned about safety because he prices it in as something about as difficult as (or slightly more difficult than) capabilities work. This puts him in an ideal epistemic position to run a VC firm for AGI labs: his optimism is what persuades investors to give him money, since they expect him to attempt to return them a profit.
Oh, by that I meant something like “yeah I really think it is not a good idea to focus on an AI arms race”. See also Slack matters more than any other outcome.
If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don’t understand why you’d want to play the AI arms race—you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.
Unsee the frontier lab.
These are pretty sane takes (conditional on my model of Thomas Kwa of course), and I don’t understand why people have downvoted this comment. Here’s an attempt to unravel my thoughts and potential disagreements with your claims.
AGI that poses serious existential risks seems at least 6 years away, and safety work seems much more valuable at crunch time, such that I think more than half of most peoples’ impact will be more than 5 years away.
I think safety work gets less and less valuable at crunch time actually. I think you have this Paul Christiano-like model of getting a prototypical AGI and dissecting it and figuring out how it works—I think it is unlikely that any individual frontier lab would perceive itself to have the slack to do so. Any potential “dissection” tools will need to be developed beforehand, such as scalable interpretability tools (SAEs seem like rudimentary examples of this). The problem with “prosaic alignment” IMO is that a lot of this relies on a significant amount of schlep—a lot of empirical work, a lot of fucking around. That’s probably why, according to the MATS team, frontier labs have a high demand for “iterators”—their strategy involves having a lot of ideas about stuff that might work, and without a theoretical framework underlying their search path, a lot of things they do would look like trying things out.
I expect that once you get AI researcher level systems, the die is cast. Whatever prosaic alignment and control measures you’ve figured out, you’ll now be using those in an attempt to play this breakneck game of getting useful work out of a potentially misaligned AI ecosystem, which would also be modifying itself to improve its capabilities (because that is the point of AI researchers). (Sure, it’s easier to test for capability improvements. That doesn’t mean you can’t transfer information embedded into these proposals such that modified models will be modified in ways the humans did not anticipate or would not want if they had a full understanding of what is going on.)
Mentorship for safety is still limited. If you can get an industry safety job or get into MATS, this seems better than some random AI job, but most people can’t.
Yeah—I think most “random AI jobs” are significantly worse for trying to do useful work in comparison to just doing things by yourself or with some other independent ML researchers. If you aren’t in a position to do this, however, it does make sense to optimize for a convenient low-cognitive-effort set of tasks that provides you the social, financial and/or structural support that will benefit you, and perhaps look into AI safety stuff as a hobby.
I agree that mentorship is a fundamental bottleneck to building mature alignment researchers. This is unfortunate, but it is the reality we have.
Funding is also limited in the current environment. I think most people cannot get funding to work on alignment if they tried? This is fairly cruxy and I’m not sure of it, so someone should correct me if I’m wrong.
Yeah, post-FTX, I believe that funding is limited enough that you have to be consciously optimizing for getting funding (as an EA-affiliated organization, or as an independent alignment researcher). Particularly for new conceptual alignment researchers, I expect that funding is drastically limited since funding organizations seem to explicitly prioritize funding grantees who will work on OpenPhil-endorsed (or to a certain extent, existing but not necessarily OpenPhil-endorsed) agendas. This includes stuff like evals.
The relative impact of working on capabilities is smaller than working on alignment—there are still 10x as many people doing capabilities as alignment, so unless returns don’t diminish or you are doing something unusually harmful, you can work for 1 year on capabilities and 1 year on alignment and gain 10x.
This is a very Paul Christiano-like argument—yeah sure the math makes sense, but I feel averse to agreeing with this because it seems like you may be abstracting away significant parts of reality and throwing away valuable information we already have.
Anyway, yeah I agree with your sentiment. It seems fine to work on non-SOTA AI / ML / LLM stuff and I’d want people to do so such that they live a good life. I’d rather they didn’t throw themselves into the gauntlet of “AI safety” and get chewed up and spit out by an incompetent ecosystem.
Safety could get even more crowded, which would make upskilling to work on safety net negative. This should be a significant concern, but I think most people can skill up faster than this.
I still don’t understand what causal model would produce this prediction. Here’s mine: One big limiting factor on the number of safety researchers the current SOTA lab ecosystem can absorb is the labs’ own expectations of how many researchers they want or need. On one hand, more schlep during the pre-AI-researcher era means more hires. On the other hand, more hires require more research managers or managerial experience. Anecdotally, it seems like many AI capabilities and alignment organizations (both in the EA space and in the frontier lab space) have historically been bottlenecked on management capacity. Additionally, hiring has a cost (both the search process and the onboarding), and it is likely that as labs get closer to creating AI researchers, they’d believe that the opportunity cost of hiring continues to increase.
Skills useful in capabilities are useful for alignment, and if you’re careful about what job you take there isn’t much more skill penalty in transferring them than, say, switching from vision model research to language model research.
Nah, I found very little stuff from my vision model research work (during my undergrad) contributed to my skill and intuition related to language model research work (again during my undergrad, both around 2021-2022). I mean, specific skills of programming and using PyTorch and debugging model issues and data processing and containerization—sure, but the opportunity cost is ridiculous when you could be actually working with LLMs directly and reading papers relevant to the game you want to play. High quality cognitive work is extremely valuable and spending it on irrelevant things like the specifics of diffusion models (for example) seems quite wasteful unless you really think this stuff is relevant.
Capabilities often has better feedback loops than alignment because you can see whether the thing works or not. Many prosaic alignment directions also have this property. Interpretability is getting there, but not quite. Other areas, especially in agent foundations, are significantly worse.
Yeah this makes sense for extreme newcomers. If someone can get a capabilities job, however, I think they are doing themselves a disservice by playing the easier game of capabilities work. Yes, you have better feedback loops than alignment research / implementation work. That’s like saying “Search for your keys under the streetlight because that’s where you can see the ground most clearly.” I’d want these people to start building the epistemological skills to thrive even with a lower intensity of feedback loops such that they can do alignment research work effectively.
And the best way to do that is to actually attempt to do alignment research, if you are in a position to do so.
If you like The Dream Machine, you’ll also like Organizing Genius.