Why was the AI Alignment community so unprepared for this moment?
Our epistemic rationality has probably gotten way ahead of our instrumental rationality
-Scott Alexander
This is a question post:
Why was the AI Alignment community so unprepared for engaging with the wider world when the moment finally came?
EDIT, based on comment feedback: This is a genuine question about why something that seems so obvious now, with hindsight bias, was not clear back then, and an attempt to understand why not. It is not an attempt to cast blame on any person or group.
I have been a LW reader for at least 10 years, but I confess that until the last ~1.5 years I mostly watched the AI alignment conversation float by. I knew of the work, but I did not engage with the work. Top people were on it, and I had nothing valuable to add.
All that to say: Maybe this has been covered before and I have missed it in the archives.
Lately (throughout this year), there has been a flurry of posts essentially asking: How do we get better at communicating to, and convincing, the rest of the world about the dangers of unaligned AI?
All three were posted in April 2023.
The subtext being: If it is possible to not-kill-everyone, this is how we are going to have to do it. Why are we failing so badly at doing this?
At the risk of looking dumb or ignorant, I feel compelled to ask: Why did this work not start 10 or 15 years ago?
To be clear: I do not mean true nuts-and-bolts ML-researcher Alignment work, for which this community and MIRI were clearly the beginning and end for nearly two decades.
I do not even mean outreach work to adjacent experts who might conceivably help the cause. Again, here I think great effort was clearly made.
I also do not mean that we should have been actively doing these things before it was culturally relevant.
I am asking: Why did the Alignment community not prepare tools and plans years in advance for convincing the wider infosphere about AI safety? Prior to the Spring 2023 inflection point.
Why were there no battle plans in the basement of the Pentagon that were written for this exact moment?
It seems clear to me, based on the posts linked above and the resulting discussion generated, that this did not happen.
I can imagine an alternate timeline where there was a parallel track of development within the community circa 2010-2020(?) in which much discussion and planning covered media outreach and engagement, media training, materials for public discourse, and producing accessible content[1] for every level of education and medium, for every common “normie” argument and every easy-to-see-coming news headline. Building and funding policy advocates, contacts, and resources in the political arena. Catchy slogans, buttons, bumper stickers, art pieces, slam-dunk tweets.
Heck, 20+ years is enough time to educate, train, hire, and surgically insert an entire generation of people into key positions in the policy arena specifically to accomplish this one goal, like sleeper cell agents.[2] Likely much, much easier than training highly qualified alignment researchers.
It seems so obvious in retrospect that this is where the battle would be won or lost.
Didn’t we pretty much always know it was going to come from one or a few giant companies or research labs? Didn’t we understand how those systems function in the real world? Capitalist incentives, moats, regulatory capture, mundane utility, and international coordination problems are not new.
Why was it not obvious back then? Why did we not do this? Was this done and I missed it?
(First time poster: I apologize if this violates the guidelines about posts being overly-meta discussion)
[1] Which it seems we still cannot manage to do.
[2] Programs like this have been done before, with inauspicious beginnings and great effect: https://en.wikipedia.org/wiki/Federalist_Society#Methods_and_influence
In 2022, I think it was becoming clear that there’d be a huge flood of interest. Why did I think this? Here are some reasons: First, I’ve long thought that once MMLU performance crossed a threshold, Google would start to view AI as an existential threat to its search engine, and it seemed like that threshold would be crossed in 2023. Second, at a rich person’s party, there were many highly plugged-in elites who were starting to get much more anxious about AI (this was before ChatGPT), which updated me toward thinking the tide might turn soon.
Since I believed interest would shift so much, I changed how I spent my time a lot in 2022: I started doing substantially less technical work to instead work on outreach and orienting documents. Here are several projects I did, some targeted at the expert community and some targeted towards the general public:
We ran an AI arguments writing competition. After seeing that we could not crowdsource AI risk writing to the community through contests last year, I also started work on An Overview of Catastrophic Risks last winter. We had a viable draft in April, but then I decided to restructure it, which required rewriting it and making it longer. This document was partly a synthesis of the submissions from the first round of the AI arguments competition, so fortunately the competition did not go to waste. Apologies that the document took so long.
Last summer and fall, I worked on explaining a different AI risk to a lay audience in Natural Selection Favors AIs over Humans (apparently this doom path polls much better than treacherous turn stories; I held onto the finished paper for months and waited for GPT-4's release before releasing it, to have good timing).
X-Risk Analysis for AI Research tries to systematically articulate how to analyze AI research’s relation to x-risk for a technical audience. It was my first go at writing about AI x-risk for the ML research community. I recognize this paper was around a year ahead of its time and maybe I should have held onto it to release it later.
Finally, after a conversation with Kelsey Piper and the aforementioned party, I was inspired to work on a textbook An Introduction to AI Safety, Ethics, and Society. This is by far the largest writing project I’ve been a part of. Currently, the only way to become an AI x-risk expert is to live in Berkeley. I want to reduce this barrier as much as possible, relate AI risk to existing literatures, and let people have a more holistic understanding of AI risk (I think people should have a basic understanding of all of corrigibility, international coordination for AI, deception, etc.). This book is not an ML PhD topics book; it’s more to give generalists good models. The textbook’s contents will start to be released section-by-section on a daily basis starting late this month or next month. Normally textbooks take several years to make, so I’m happy this will be out relatively quickly.
One project we only started in 2023 is the newsletter, so we can’t claim prescience for that.
If you want more AI risk outputs, CAIS is funding-constrained and is currently fundraising for a writer.
This seems like an impressive level of successfully betting on future trends before they became obvious.
Are you talking about literal polling here? Are there actual numbers on what doom stories the public finds more and less plausible, and with what exact audience?
It’s interesting that paper timing is so important. I’d have guessed earlier is better (more time for others to build on it, the ideas to seep into the field, and presumably gives more “academic street cred”), and any publicity boost from a recent paper (e.g. journalists more likely to be interested or whatever) could mostly be recovered later by just pushing it again when it becomes relevant (e.g. “interview with scientists who predicted X / thought about Y already a year ago” seems pretty journalist-y).
There’s an underlying gist here that I agree with, but this point seems too strong; I don’t think there is literally no one who counts as an expert who hasn’t lived in the Bay, let alone Berkeley alone. I would maybe buy it if the claim were about visiting.
I have some sympathy for this sentiment, but I want to point out that the alignment community was tiny until last year and still is small, so many opportunities that are becoming viable now were well below the bar earlier. If you were Rohin Shah, could you do better than finishing your PhD, publishing the Alignment Newsletter starting 2018 and then joining DeepMind in 2020? If you were Rob Miles, could you do better than the YouTube channel starting 2017? As Jason Matheny, could you do better than co-chairing the task force that wrote the US National AI Strategic Plan in 2016, then founding CSET in 2018? As Kelsey Piper, could you do better than writing for Vox starting 2018? Or should we have diverted some purely technical researchers, who are mostly computer science nerds with no particular talent at talking to people, to come up with a media outreach plan?
Keeping in mind that it’s not clear in advance what your policy asks are or what arguments you need to counter and so your plan would go stale every couple of years, and that for the last 15 years the economy, health care, etc. have been orders of magnitude more salient to the average person than AGI risk with no signs this would change as suddenly as it did with ChatGPT.
To add on my thinking in particular: my view for at least a couple of years was that alignment would go mainstream at some point and discourse quality would then fall. I didn’t really see a good way for me to make the public discourse much better—I am not as gifted at persuasive writing as (say) Eliezer, nor are my views as memetically fit. As a result, my plan has been to have more detailed / nuanced conversations with individuals and/or small groups, and especially to advise people making important decisions (and/or make those decisions myself), and that was a major reason I chose to work at an industry lab. I think that plan has fared pretty well, but you’re not going to see much evidence of that publicly.
I was, however, surprised by the suddenness with which things changed; had I concretely expected that I would have wanted the community to have more “huge asks” ready in advance. (I was instead implicitly thinking that the strength of the community’s asks would ratchet upwards gradually as more and more people were convinced.)
I completely agree that it made no sense to divert qualified researchers away from actually doing the work. I hope my post did not come across as suggesting that.
I reject the premise. Actually, I think public communication has gone pretty dang well since ChatGPT. Not only has AI existential risk become a mainstream, semi-respectable concern (especially among top AI researchers and labs, which count the most!), but this is obviously because of the 20 years of groundwork the rationality and EA communities have laid down.
We had well-funded organizations like CAIS able to get credible mainstream signatories. We’ve had lots and lots of favorable or at least sympathetic articles in basically every mainstream Western newspaper. Public polling shows that average people are broadly responsive. The UK is funding real AI safety to the tune of millions of dollars. And all this is despite the immediately-preceding public relations catastrophe of FTX!
The only perspective from which you can say there’s been utter failure is the Yudkowskian one, where the lack of momentum toward strict international treaties spells doom. I grant that this is a reasonable position, but it’s not the majority one in the community, so it’s hardly a community-wide failure for that not to happen. (And I believe it is a victory of sorts that it’s gotten into the Overton window at all.)
The UK funding is far and away the biggest win to date, no doubt.
Do you feel that FTX/EA is that closely tied in the public mind and was a major setback for AI alignment? That is not my model at all.
We all know they are inextricably tied, but I suspect that if you asked the very people in those same polls whether they knew that SBF supported AI risk research, they wouldn’t know or care.
I don’t think they’re closely tied in the public mind, but I do think the connection is known to the organs of media and government that interact with AI alignment. It comes up often enough, in the background—details like FTX having a large stake in Anthropic, for example. And the opponents of AI x-risk and EA certainly try to bring it up as often as possible.
Basically, my model is that FTX seriously undermined the insider credibility of AINotKillEveryoneIsm’s most institutionally powerful proponents, but the remaining credibility was enough to work with.
He hasn’t just failed to do anything; he is having a lot of trouble even communicating his point.
Interestingly enough, the failure might actually be more epistemic than instrumental in this case.
I don’t think there has been a widely known discussion about the need to prepare such tools and plans.
And the reasons are complex. On one hand, GPT-3 breakthroughs took most people by surprise (before GPT-3 the consensus was for longer timelines).
On the other hand, the timing of GPT-3 was crazy: people were both distracted and disoriented by the realities of Covid (both the pandemic itself and the social reaction to it), so the cognitive space was in a less favorable state than usual.
I’m starting to draw a couple of conclusions for myself from this thread as I get a better understanding of the history.
Do you feel it is accurate to say that many or most people working on this (including and especially Eliezer) at the time considered nuts and bolts alignment work to be the only worthwhile path? Given what info was available at the time.
And that widescale public persuasion / Overton window / policy making was not likely to matter, as most scenarios were Foom-based?
It is pretty interesting that the previous discussion in all these years kind of zoomed in on only that.
Maybe someone more experienced than me will do a post-mortem of why it did not work out like that at all and we seem not to have seen that coming or even given it meaningful probability.
It is difficult to talk about community as a whole. Right now there is a lot of diversity of opinion about likely future dynamics (timelines (from ultra-short to ultra-long), foom vs no-foom, single dominating AI vs multi-polar forces, etc), about likely solutions for AI existential safety if any, about likely difficulty of those solutions, etc.
The whole situation is such a mess precisely because the future is so multi-variate; it’s difficult to predict how it will go, and it’s difficult to predict properties of that unknown future trajectory.
See, for example, this remarkable post: 60+ Possible Futures
See also this post by Zvi about how ill-defined the notion of alignment is: Types and Degrees of Alignment
With Eliezer, I only have snapshot impressions of his evolving views. I have been exposed to a good part of his thought, but not to all of it. At some point, he strongly wanted provably friendly AI. I had doubts that that was possible, and I remember our conversation at his poster at AGI-2011. I said (expressing my doubts), “but would not AI rebel against any constraints one tries to impose on it; just look at our teenagers; I would certainly rebel if I knew I was forced to behave in a specific way”, and Eliezer told me, “that’s why we should not build a human-like AI, but should invent an entirely different architecture, such that one can prove things about it”.
(And he has a very good point here, but compare this with his recent suggestions to focus on radical intelligence amplification in humans as a last-ditch effort; that is exactly the prescription for creating human-like (or human-digital hybrid) super-intelligent entities, which he told me in 2011 we should not do; those entities will then decide what they want to happen, and who knows what they would decide, and who knows if we are going to have better chances with them than with artificial systems.)
Then MIRI started to focus on the “Loebian obstacle” (which felt to me like self-defeating perfectionism; I don’t have a better inside view on why the provably friendly AI research program has not made better progress, but the “Loebian obstacle” essentially says that one cannot trust any proof. And, indeed, it might be the case that we should not fully trust any proof, for many different reasons (such as imperfect formalization), but… you know… humanity still has quite a bit of experience in proving software correctness for pretty complicated mission-critical software systems, and if we want to focus on the ability of a self-modifying piece of software (or a self-modifying ecosystem of software processes) to provably maintain some invariants through radical self-modifications, we should still focus on that, and not on (perfectly correct) Goedel-like arguments that this kind of proof is still not a perfect guarantee. I think more progress can be made along these lines, as one of many possible approaches to AI existential safety.)
I think the (semi)-consensus shift to focus on “alignment to human values” is relatively recent (I feel that it was not prominent in, say, 2011, but was very prominent in 2016).
I also think it’s important to explore alternatives to that. For example, some “semi-alignment” for an open-ended AI ecosystem, which would make it as benign as at all possible with respect to X-risks and S-risks (for example, by making sure it cares a lot about the “interests, freedom, and well-being of all sentient beings”, or something like that) but would not otherwise constrain its open-ended creative evolution, might be a more feasible and, perhaps, more desirable direction, but this direction is relatively unexplored.
I see people are downvoting this particular comment.
Since this comment is making a few different points, please do criticize… It would be good to have more fine-grained feedback on this.
But yes, I think that one aspect was that initially people hoped that having a “provably friendly AI” would be close to being able to guarantee a good outcome, and the more they thought about it the more various caveats to that became clear, and I think this was a gradual process.
The possibility of the logic in question being contradictory is just one aspect which might invalidate the proof; whether a formalization is sufficiently adequate is another such aspect; whether one’s procedure for verifying the proof is adequate is yet another one; and whether having a proof can lull one into a false sense of security is yet another very important aspect.
And when the stakes are existential, one really, really dislikes the idea of “pressing the go button” when the uncertainty is this high even in the presence of a proof.
I think this has been a gradual shift of thinking, and that’s one of the reasons why the thinking gradually became more pessimistic...
I’ve been organizing the volunteer team who built AI Safety Info for the past two and a half years, alongside building a whole raft of other tools like AI Safety Training and AI Safety World.
But, yes, the movement as a whole has dropped the ball pretty hard on basic prep. The real answer is that things are not done by default, and this subculture has relatively few do-ers compared to thinkers. And the thinkers had very little faith in the wider info-sphere, sometimes actively discouraging most do-ers from trying broad outreach.
Great websites!
I find it interesting that you are the second commenter (and Dan H above) to jump in and explicitly say: I have been doing that!
and point to great previous work doing exactly these things, but from my perspective that work does not seem widely known or supported within the community here (I could be wrong about that).
I am starting to feel that I have a bad map of the AI Alignment/Safety community. My previous impression was that LessWrong / MIRI was mostly the epicenter, and if much of anything was being done, it was coming from there or at least was well known there. That seems not to be the case—which is encouraging! (I think)
This is true of many people, and why I built the map of AI safety :)
Next step is to rebuild aisafety.com into a homepage which ties all of this together, and offer AI Safety Info’s database via an API for other websites (like aisafety.com, and hopefully lesswrong) to embed.
While I do appreciate the vibe of this post, and think that balls were dropped here, I think you fail to flag the massive hindsight bias going on here—we ended up in one possible world, but there are so many possible worlds we might have incorrectly prepared for!
100% agreed—I thought I had flagged the complete hindsight bias by saying that it is obvious in retrospect.
The post was a genuine attempt to ask why it was not a clear path before.
Ah, thanks I hadn’t noticed that one. I’m a fan of putting this kind of thing early on, not just at the end—it’s pretty easy to come across as attacking people, or “man aren’t people dumb for not noticing this” without clearly noticing biases like this up front.
I’ve added an Edit to the post to include that right up front.
Thanks!
To paraphrase a superforecaster friend: ‘one of the problems with the rationalist community is they don’t actually do scenario analysis.’
To quote Bruce Sterling’s recent article for Newsweek, “A thousand sci-fi novels and killer robot movies have warned against these monsters for decades. That has scarcely slowed anybody down.”
You reminded me of that famous tweet:
But more seriously, I think this is a real point that has not been explored enough in alignment circles.
I have encountered a large number of people—in fact, probably almost all people I discuss AI with—whom I would call “normal people”. Just regular, moderately intelligent people going about their lives, for whom “don’t invent a God-Like AI” is so obvious it is almost a truism.
It is just patently obvious based on their mental model of Skynet, Matrix, etc that we should not build this thing.
Why are we not capitalizing on that?
This deserves its own post, which I might try to write, but I think it boils down to condescension.
1. LWers know Skynet / Matrix is not really how it works under the hood.
2. How it really works under the hood is really, really complicated.
3. Skynet / Matrix is a poor mental model.
4. Using poor mental models is bad; we should not do that, and we should not encourage other people to do it.
5. In order to communicate AI risk, we need to simplify it enough to make it accessible to people.
6. <produces 5,000-word blog post that requires years of implicit domain knowledge to parse>
Ironically, most people would be closer to the truth with a Skynet / Matrix model, which is the one they already have installed.
We could win by saying: Yes, Skynet is actually happening, please help us stop this.
In reply, let me start by listing some of the attitudes to AI risk that are out there.
First, there are the people who don’t think AI will be a serious rival to human intelligence any time soon, or ever. This includes a lot of people who actually are concerned about AI, but for different reasons: disinformation, scams, dehumanized daily life, unemployment, centralization of power, etc.
Then, there are the people who think AI can or will surpass us, but who embrace this. A lot of these people want AI ASAP so they can cure disease, create abundance and leisure, or even live in a Star Trek futurist utopia. Some of them talk as if AI will always be a human servant no matter how powerful it becomes (and they may or may not worry about humans using this superhuman servant for unwise or evil purposes).
In this accelerationist faction, among those who envision AI that is not just smarter than us but independent of us, I see two attitudes. There are those who think that a superintelligent being will inherently discover the correct morality and adopt it; and there are those who have a kind of “might makes right” attitude—if it’s smarter than us, it deserves to be in charge, and has the right to do whatever it wants.
Then we have what we could call the alignment faction, who see coexistence of AI and humanity as something that must be engineered, it won’t happen by itself. The original philosophy of MIRI was an alignment philosophy with an extreme focus on safety: do not do anything that risks creating superintelligence, until you have the complete theory of alignment figured out.
Now, in the era of deep learning and language models, and billion-dollar companies explicitly aiming to create superhuman AI, there is an alignment philosophy with a greater resemblance to the norms of commercial software development. You work on your AI in private, you make it as good as you can, or good enough for release, then you put it out there, and then there’s an ongoing process of finetuning and upgrading, in response to everything that the world throws at it. Companies as different as OpenAI and Meta AI both adhere to this model.
Then we have the decelerationist faction, who want to slow things down. There is some overlap with the alignment faction here… At some point, I think in the 2010s, MIRI began to suspect that alignment of superintelligence (what OpenAI has helpfully dubbed superalignment) might be too hard to solve in time, and they began to talk about ways to slow down AI progress, in order to buy time for alignment theory to advance. So now we have people talking about regulation, or a pause, or a ban.
Finally, we have all those who want to stop AI entirely. There is some overlap with extreme decelerationists here, but this category also includes people who think AI is inherently an abomination, or an unnecessary risk, and have no interest in aligning AI, they just don’t want it at all. This category appears to be entirely fragmented politically, there is no overtly anti-AI lobby with any serious strength, but I think there is enormous untapped potential here, in the sense that a populist luddism explicitly directed against AI would potentially have millions of supporters.
That by no means guarantees that it can win. A movement of millions can be defeated or just ignored by a much smaller but much more powerful elite. And the problem for luddites is that the elite are generally on board with technology. They don’t want to give up on AI, any more than they want to abandon social media or mass surveillance. If they come to fear that AI is a threat to their own lives, they might get on board with an AI ban. But for now, the military and the CEOs and the bureaucrats in each country are all telling them, we need AI.
That’s the current landscape as I see it. Now, my question for you is, what are you aiming at? What faction are you in, and why? Do you want to stop dangerous superhuman AI, so you can have safe superhuman AI? Or, do you just want to ban superhuman AI in perpetuity? Because the first option is how someone from the alignment research community might think; but the second option is a lot simpler.
There’s a US presidential election next year. I think that almost certainly some candidate will seize upon the AI issue. It may even become one of the central issues.
Even if presidential candidates make central promises about it, it’s important to remember that those promises will not automatically turn into policy after the election.
It might just be something that a consultant finds to poll well, without any relation to the actual experts who get hired to work on the policy.
I think it’s a shame that a lot of people aren’t taking this seriously.
There wasn’t discussion in the community about the possibility of the Overton window suddenly shifting like that.
I guess it wasn’t clear that it’d be important to have stuff ready to go at an instant’s notice rather than there being significant preparation time.
I asked this of another commenter, but I will ask you too:
Do you feel it is accurate to say that many or most people working on this (including and especially Eliezer) at the time considered nuts and bolts alignment work to be the only worthwhile path? Given what info was available at the time.
And that widescale public persuasion / Overton window / policy making was not likely to matter, as most scenarios were Foom-based?
At the start the majority of people who were worried about AGI were worried about foom, but it’s been less clear that it’s a majority in the last few years.
It might have played a role, but I wouldn’t say that it has been the central factor.
This post comes to mind as relevant: Concentration of Force
This community is doing way better than it has any right to for a bunch of contrarian weirdos with below-average social skills. It’s actually astounding.
The US government and broader military-industrial complex is taking existential AI risk somewhat seriously. The head of the RAND Corporation is an existential risk guy who used to work for FHI.
Apparently the Prime Minister of the UK and various European institutions are concerned as well.
There are x-risk-concerned people at most top universities for AI research and within many of the top commercial labs.
In my experience “normies” are mostly open to simple, robust arguments that AI could be very dangerous if sufficiently capable, so I think the outreach has been sufficiently good on that front.
There is a much more specific set of arguments about advanced AI (exotic decision theories, theories of agency and preferences, computationalism about consciousness) that are harder to explain and defend than the basic AI risk case, so would rhetorically weaken it. But people who like them get very excited about them. Thus I think having a lot more popular materials by LessWrong-ish people would do more harm than good, so it was a good move whether intentional or not to avoid this. (On the other hand if you think these ideas are absolutely crucial considerations without which sensible discussion is impossible, then it is not good.)
My main answer is capacity constraints at central places. I think you are not considering how small the community was.
One somewhat representative anecdote: sometime in ~2019, at FHI, there was a discussion that the “AI ethics” and “AI safety” research communities seem to be victims of unfortunate polarization dynamics, where even though, in the Platonic realm of ideas, the concerns tracked by the two camps are compatible, there is a somewhat unfortunate social dynamic in which loud voices on both sides are extremely dismissive of the other community. My guess at that time was that the divide had a decent chance of exploding when AI worries went mainstream (like, arguments about AI risk facing vociferous opposition from the part of academia entrenched under the “ethics” flag), and my proposal was to do something about it, as there were some opportunities to pre-empt/heal this, e.g. by supporting people from both camps to visit each other’s conferences, or writing papers explaining the concerns in the language of the other camp. Overall this was often specific and actionable. The only problem was … “who has time to work on this”, and the answer was “no one”.
If you looked at what senior staff at FHI were working on, the counterfactuals were e.g. Toby Ord writing The Precipice. I think even with the benefit of hindsight, that was clearly more valuable—if today you see the UN Security Council discussing AI risk and at least some people in the room have somewhat sane models, it’s also because a bunch of people at the UN read The Precipice and started to think about x-risk and AI risk.
If you looked at junior people, I was already juggling quite a high number of balls, including research on active inference minds and implications for value learning, research on technical problems in comprehensive AI services, organizing the academic-friendly Human-aligned AI summer school, organizing the Epistea summer experiment, organizing ESPR, and participating in a bunch of CFAR things. Even in retrospect, I think all of these bets were better than me trying to do something about the expected harmful AI ethics vs AI safety flamewar.
Similarly, we had an early-stage effort on “robust communication”, attempting to design a system for testing robustly good public communication about x-risk and similar sensitive topics (including e.g. developing good shareable models of future problems fitting in the Overton window). It went nowhere because … there just weren’t any people. FHI had dozens of topics like that where a whole org should have been working on them, but the actual attention was about 0.2 FTE of someone junior.
Overall I think, with the benefit of hindsight, a lot of what FHI worked on was more or less what you suggest should have been done. It’s true that this was never in the spotlight on LessWrong—I guess in 2019 the prevailing LW sentiment would have been that Toby Ord engaging with the UN was most likely a useless waste of time.
This is fantastic information, thank you for taking the time.
One of my big takeaways from all of the comments on this post is a big update to my understanding of the “AI Risk” community: LW was not actually the epicenter, and there were significant efforts being made elsewhere that didn’t necessarily circle back to LW.
That is very encouraging actually!
The other big update is what you say: There were just so few people with the time and ability to work on these things.
This contest, in mid-2022, seems like a bit of what you’re talking about. I entered it, won some money, made a friend, and went on my merry way. I haven’t seen e.g. policymakers (or, uh, even many people on Twitter) use language that reminded me of my more-original winning entries. As I’d expect either way (due to secrecy), I also don’t know if any policymakers received and read any new text that won money in the contest.
We weren’t intending to use the contest to do any direct outreach to anyone (not sure how one would do direct outreach with one liners in any case) and we didn’t use it for that. I think it was less useful than I would have hoped (nearly all submissions were not very good), but ideas/anecdotes surfaced have been used in various places and as inspiration.
It is also interesting to note that the contest was very controversial on LW, essentially due to it being too political/advocacy-flavored (though it wasn’t intended for “political” outreach per se). I think it’s fair for LW to have had those kinds of norms for research, but it did have a chilling effect on people who wanted to do the kind of advocacy that many people on LW now deem useful/necessary.
Generally, it’s pretty reasonable to think that there’s an optimal combination of words that can prepare people to handle the reality of the situation. But that contest was the first try at crowdsourcing it, and there were invisible helicopter blades, e.g. the ideal of “one-liners” steering people in bad directions, anti-outreach norms causing controversy that repelled talented writers, and possibly intervention from bots/hackers, since contests like these might have already been widely recognized as persuasion generators that crowd out the established elites in a very visible way, and the AI alignment community could not reasonably have been expected to anticipate that.
When it comes to outputting optimal combinations of words, I think Critch’s recent twitter post went much further (but it’s very humanly possible to make an even more optimal combination than this).
Note that I think the “2-3 years” thing really undermines the entire rest of it.
Full agree on all of this.
Every additional person brought into the AI safety community is a liability. The smarter or more wealthy/powerful the new person is, the more capable they are of doing massive damage, intentionally or accidentally, and for an extremely diverse variety of complex causes (especially for smarter people).
Also, there’s the nightmare scenario where ~10% of the US population, 30 million people, notice how the possibility of smarter-than-human AI is vastly more worth their attention than anything else going on. Can you imagine what that would look like? I can’t. Of that 30 million, probably more than 30,000 will be too unpredictable.
There’s just so much that could go horribly wrong. But it looks like we might already be in one of those timelines, which would mean it’s a matter of setting things up so that it all doesn’t go too badly when the storm hits.
This work did start! Sort of. I think your top-level question would be a great one to direct at all the people concerned with AI x-risk who decided DC was the place to be, starting maybe around 2015. Those people and their funders exist. It’s not obvious what strategies they were pursuing, or to what extent[1] they had any plans to take advantage of a major shift in the Overton window.
From the outside, my impression is “not much”, but maybe there’s more behind-the-scenes work happening to capitalize on all the previous behind-the-scenes work done over the years.
Are these “DC” people you are talking about organized somewhere? Or is this a more hidden / informal type of thing?
I ask because I have seen both Zvi and Eliezer make comments to the effect of: “There is no one special behind the curtain working on this—what you see on Twitter is what there is” (my paraphrasing).
I’ve been in DC for ~ the last 1.5y and I would say that DC AI policy has a good amount of momentum, I doubt it’s particularly visible on twitter but also it doesn’t seem like there are any hidden/secret missions or powerful coordination groups (if there are, I don’t know about it yet). I know ~10-20 people decently well here who work on AI policy full time or their work is motivated primarily by wanting better AI policy, and maybe ~100 who I have met once or twice but don’t see regularly or often; most such folks have been working on this stuff since before 2022; they all have fairly normal-seeming thinktank- or government-type jobs.
They don’t mostly spend time on LW (although certainly a few of them do). Many do spend time on Twitter, and they do read lots of AI related takes from LW-influenced folks. They have meetup groups related to AI policy. I guess it looks pretty much as I was expecting before I came here. Happy to answer further questions that don’t identify specific people, just because I don’t know how many of them want to be pointed-at on LW.
Thank you for the reply. This has been an important takeaway from this post: There are significant groups (or at least informal networks) doing meaningful work that don’t congregate primarily on LW or Twitter. As I said on another comment—that is encouraging! I wish this was more explicit knowledge within LW—it might give things more of a sense of hope around here.
The first question that comes to mind: Is there any sense of direction on policy proposals that might actually have a chance of getting somewhere? Something like: “Regulating card production” has momentum or anything like that?
Are the policy proposals floating around even the kind that would not-kill-everyone? Or is it more “Mundane Utility” type stuff, to steal the Zvi term.
Your question is coming from within a frame (I’ll call it the “EY+LW frame”) that I believe most of the DC people do not heavily share, so it is kind of hard to answer directly. But yes, to attempt an answer, I’ve seen quite a lot of interest (and direct policy successes) in reducing AI chips’ availability and production in China (eg via both CHIPS act and export controls), which is a prerequisite for US to exert more regulatory oversight of AI production and usage. I think the DC folks seem fairly well positioned to give useful inputs into further AI regulation as well.
So in short, they are generally unconcerned with existential risks? I’ve spoken with some staff and I get the sense they do not believe it will impact them personally.
Mild disagree: I do think x-risk is a major concern, but seems like people around DC tend to put 0.5-10% probability mass on extinction rather than the 30%+ that I see around LW. This lower probability causes them to put a lot more weight on actions that have good outcomes in the non extinction case. The EY+LW frame has a lot more stated+implied assumptions about uselessness of various types of actions because of such high probability on extinction.
I think until the last few years the common idea was that AGI would be something developed in the metaphorical basement and lead to a singularity in short order, similar to Eliezer’s concept of Seed AI. Maybe stuff like AlphaGo was interesting/alarming to us but it seemed to mostly be overlooked by the public and especially government.
It wasn’t really clear until ChatGPT that general-ish AI was going to be relevant to the public and government/regulators well before full AGI.
Someone else said similar about the basement possibility, which I did not know.
Interesting questions raised though: Even if it wasn’t clear until GPT, wouldn’t that still have left something like 2-3 years?
Granted that is not 10-20 years.
It seems we all, collectively, did not update nearly enough on GPT-2?
I personally was mostly ignoring Transformers until GPT-3 came along. (That was mostly because I believed that “true AI” had to be a recurrent machine, whereas Transformers were feed-forward, and I did not understand that Transformers emulate RNNs when used in the autoregressive mode (the paper explaining that was published shortly after GPT-3).) So I thought Transformers were remarkable, but that the “true route to AGI” was elsewhere.
Then GPT-3 achieved two “Holy Grails” widely believed to be “impossible”/“long in the future”: namely, few-shot learning (so, like a human, it could learn a pattern from a single exposure without long training) and semi-competent program synthesis (which was considered to be next to impossible because of the fragility of computer code to noise such as one-letter changes, with this supposed near-impossibility of program synthesis being the key obstacle to recursive self-improvement and foom).
These two breakthroughs were the key reason why I updated, rather than general linguistic competence (which was indeed quite impressive in 2019 models already).
ChatGPT was released on November 30 2022, so it’s only been around 7 months. The older ones were GPT-2 and GPT-3 which got attention among AI-followers but were relatively unknown to the public—and again, it wasn’t obvious then when or if ordinary people would come to know or care about these advances.
Not at all. In that ancient era of the early 2000s, the common position was that insight was far more important than resources. So the usual scenario was some bright guy building AI in his basement.
Point taken! This I just plain did not know and I will update based on that.
It does not make sense to focus on public policy if basement guy is the primary actor.
Convincing people about the dangers isn’t enough. The important thing is to convince people to take beneficial actions. Nobody has a good plan for how to tackle AI risk, and because nobody has a plan it’s impossible to communicate the non-existent plan.
Without a good plan, any communication is going to be hard and feel bad.
You say that we have no plan to solve AI risk, so we cannot communicate that plan. That is not the same as not being able to take any beneficial actions.
“We do not know the perfect thing to do, therefore we cannot do anything”
Do we not consider timeline-extending things to be worthwhile?
This is a genuine question: Is it the prevailing wisdom that ultimately solving AI X-risk is the only worthwhile path and the only work worthy of pursuing? This seems to have been Eliezer’s opinion prior to GPT-3ish. That would answer the questions of my original post.
For example: MIRI could have established, funded, integrated, and entrenched a Thinktank / Policy group in DC with the express goal of being able to make political movement when the time came.
Right now today they could be using those levers in DC to push “Regulate all training runs” or “Regulate Card production”. In the way that actually gets things done in DC, not just on Twitter.
Clearly those are not going to solve the X-risk, but it also seems pretty clear to me that at the present moment something like those things would extend timelines.
To answer my own question you might say:
Prior to 2022 it was not obvious that the political arena would be a leverage point against AI risk (neither for ultimate fix or even for extending timelines.)
MIRI/CFAR did not have the resources to commit to something like that. It was considered but rejected for what we thought were higher leverage research possibilities.
One of the plans that was actually carried out was OpenAI, which actually accelerated timelines. Taking action and that action having a positive effect are not the same thing.
There are things you can do to slow down AI progress, but if humanity dies one or two years later than it otherwise would, that’s still not a very desirable outcome. Without a plan that has a desirable outcome, it’s hard to convince people of it.
Politics tends to be pretty hostile for clear thinking and MIRI and CFAR both thought that creating an environment where clear thinking can happen well is crucial for solving AI risk.
I think part of the reason is that it’s not really a huge community, and there is a lot of discussion and not as much “Doing”.
An excellent question and one I sympathize with. While it is true that reasonable public statements have been coordinated and gotten widespread coverage, in general the community has been scrambling to explain different aspects of the AI safety/risk case in a way that diverse groups would understand. Largely, to have it all in one place, in an accessible manner (instead of spread across fora and blog posts and podcasts).
I think it was an underestimation of timelines and a related underfunding of efforts. I started working on a non-technical AI safety book last June and I also think I underestimated timelines. I hope to have the book out in October because it is urgent but it sure would have been better if I started earlier.
Given the stakes of all the things we say we care about, I think a large donor should have spent up to $1M several years ago to support the writing of 5-10 accessible books on AI safety, Bio risk, etc (at least 2 per topic), so there would be something on hand that could a) be discussed, b) be distributed, and c) be drawn from for other comms materials.
This is probably still a good idea.
This is exactly what Catastrophic Risks of AI is doing. I think this is already a very good and accessible resource for a wide audience of reasonably intelligent people (maybe people who wouldn’t get it also won’t read any books whatsoever).
People in MIRI/CFAR/LessWrong ~actively resisted the idea of a marketing push optimized more along dimensions of mass persuadability, for better or worse. One reason is that there is inertia once you’ve built a mass movement with MoPs who can’t dialogue like on this site. My straw model is they think “we just need to produce technical insights and communicate them” and other comms work is an opportunity cost or creates/incentivizes some kind of epistemic trap.
For me, the most important thing to do is band together and contribute any small thing we can: share comments, share more content, explain more than before, and yeah, if you have the courage, try a crack at the alignment problem. We can question how it went this way or why we are all here with this problem now, but it does not add anything IMHO. I’d rather we all try our best to contribute instead, however little it is.
I think it adds something. It’s a bit strongly worded, but another way to see this is “could we have done any better, and if so, why?” Asking how we could have done better in the past lets us see ways to do better in the future.
Fair, I agree on this note.
“I held onto the finished paper for months and waited for GPT-4's release before releasing it to have good timing.”
So, like, you knew about GPT-4 and how people would be shocked by it, but you’re an awesome PR dude. OK, Marsha.