My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI
I have a mix of views on AI x-risk in general — and on OpenAI specifically — that no one seems to be able to remember, due to my views not being easily summarized as those of a particular tribe or social group or cluster. For some of the views I consider most neglected and urgently important at this very moment, I’ve decided to write them here, all-in-one-place to avoid presumptions that being “for X” means I’m necessarily “against Y” for various X and Y.
Probably these views will be confusing to read, especially if you’re implicitly trying to pin down “which side” of some kind of debate or tribal affiliation I land on. As far as I can tell, I don’t tend to choose my beliefs in a way that’s strongly correlated with or caused by the people I affiliate with. As a result, I apologize in advance if I’m not easily remembered as “for” or “against” any particular protest or movement or trend, even though I in fact have pretty distinct views on most topics in this space… the views just aren’t correlated according to the usual social-correlation-matrix.
Anyhoo:
Regarding “pausing”: I think pausing superintelligence development using collective bargaining agreements between individuals and/or states and/or companies is a good idea, along the lines of FLI’s recent open letter, “Pause Giant AI Experiments”, which I signed early and advocated for.
Regarding OpenAI, I feel overall positively about them:
I think OpenAI has been a net-positive influence for reducing x-risk from AI, mainly by releasing products in a sufficiently helpful-yet-fallible form that society is now able to engage in less-abstract more-concrete public discourse to come to grips with AI and (soon) AI-risk.
I’ve found OpenAI’s behaviors and effects as an institution to be well-aligned with my interpretations of what they’ve said publicly. That said, I’m also sympathetic to people other than me who expected more access to models or less access to models than what OpenAI has ended up granting; but my personal assessment, based on my prior expectations from reading their announcements, is “Yep, this is what I thought you told us you would do… thanks!”. I’ve also found OpenAI’s various public testimonies, especially to Congress, to move the needle on helping humanity come to grips with AI x-risk in a healthy and coordinated way (relative to what would happen if OpenAI made their testimony and/or products less publicly accessible, and relative to OpenAI not existing at all). I also like their charter, which creates tremendous pressure on them from their staff and the public to behave in particular ways. This leaves me, on-net, a fan of OpenAI.
Given their recent post on Governance of Superintelligence, I can’t tell if their approach to superintelligence is something I do or will agree with, but I expect to find that out over the next year or two, because of the openness of their communications and stance-taking. And, I appreciate the chance for me, and the public, to engage in dialogue with them about it.
I think the world is vilifying OpenAI too much, and that doing so is probably net-negative for existential safety. Specifically, I think people are currently over-targeting OpenAI with criticism that’s easy to formulate because of the broad availability of OpenAI’s products, services, and public statements. This makes them more vulnerable to attack than other labs, and I think piling onto them for that is a mistake from an x-safety perspective, in the “shooting the messenger” category. I.e., over-targeting OpenAI with criticism right now is pushing present and future companies toward being less forthright in ways that OpenAI has been forthright, thereby training the world to have less awareness of x-risk and weaker collective orientation on addressing it.
Regarding Microsoft, I feel quite negatively about their involvement in AI:
(a) Microsoft should probably be subject to federal-agency-level sanctions — from existing agencies, and probably from a whole new AI regulatory agency — for their reckless deployment of AI models. Specifically, Microsoft should probably be banned from deploying AI models at scale going forward, and from training large AI models at all. I’m not picky about the particular compute thresholds used to define such a ban, as long as the ban would leave Microsoft completely out of the running as an institution engaged in AGI development.
(b) I would like to see the world “buy back” OpenAI from Microsoft, in a way that would move OpenAI under the influence of more responsible investors, and leave Microsoft with some money in exchange for their earlier support of OpenAI (which I consider positive). I have no reason to think this is happening or will happen, but I hereby advocate for it, conditional on (a) (otherwise I’d worry the money would just pay for more AI research from Microsoft).
I have some hope that (a) and (b) might be agreeable from non-x-risk perspectives as well, such as “Microsoft is ruining the industry for everyone by releasing scary AI systems” or “Microsoft clearly don’t know what they’re doing and they’re likely to mess up and trigger over-regulation” or something like that. At the very least, it would be good to force a product recall of their most badly-behaved products. You know which ones I’m talking about, but I’m not naming them, to avoid showing up too easily in their search and upsetting them and/or their systems.
FWIW, I also think Microsoft is more likely than most companies to treat future AI systems in abusive ways that are arguably intrinsically unethical irrespective of x-risk. Perhaps that’s another good reason to push for sanctions against them, though it’s probably not at present a broadly-publicly-agreeable reason.
Regarding Facebook/Meta:
Years ago, I used to find Yann LeCun’s views on AI to be thoughtful and reasonable, even if different from mine. I often agreed with his views along the lines that AI-applicable and well-codified laws, not just “alignment” or “utility functions”, would be crucial to making AI safe for humanity.
Over the years roughly between 2015 and 2020 (though I might be off by a year or two), it seemed to me like numerous AI safety advocates were incredibly rude to LeCun, both online and in private communications.
Now, LeCun’s public opinions on AGI and AI x-risk seem to be of a much lower quality, and I feel many of his “opponents” are to blame for lowering the quality of discourse around him.
As an AI safety advocate myself, I feel regretful for not having spoken up sooner in opposition to how people treated LeCun (even though I don’t think I was ever rude to him myself), and I’m worried that more leaders in AI — such as Sam Altman, Demis Hassabis, or Dario Amodei — will be treated badly by the public in ways that turn out to degrade good-faith discourse between lab leaders and the public.
Regarding AI x-risk in general, I feel my views are not easily clustered with a social group or movement. Here they are:
Regarding my background: my primary professional ambition for the past ~12 years has been to reduce x-risk: co-founding CFAR, earning to give, working at MIRI, founding BERI, being full-time employee #1 at CHAI, co-founding SFF, SFP, and SFC, and Encultured. I became worried about x-risk in 2010 when Prof. Andrew Ng came to Berkeley and convinced me that AGI would be developed during our lifetimes. That was before people started worrying publicly about AGI, and before he started saying it was like overpopulation on Mars.
Regarding fairness, bias-protections, and employment: they’re important and crucial to x-safety, and should be unified with it rather than treated as distractions. In particular, I feel I care a lot more about unfairness, bias, and unemployment than (I think) most people who worry about x-risk, in large part because preventing the fabric of society from falling apart is crucial to preventing x-risk. I have always felt kinda gross using a “long term” vs “short term” dichotomy of AI concerns, in part because x-risk is a short term concern and should not be conflated with “longtermism”, and in part because x-risk needs to be bundled with unfairness and bias and unemployment and other concerns relevant to the “fabric of society”, which preserve the capacity of our species to work together as a team on important issues. These beliefs are summarized in an earlier post Some AI research areas and their relevance to existential safety (2020). Moreover, I think people who care about x-risk are often making it worse by reinforcing the dichotomy and dismissively using terms like “near termist” or “short termist”. We should be bundling and unifying these concerns, not fighting each other for air-time.
Regarding “pivotal acts”: I think that highly strategic consequentialism from persons/institutions with a lot of power is likely to make x-risk worse rather than better, as opposed to trying-to-work-well-as-part-of-society-at-large, in most cases. This is why I have written in opposition to pivotal acts in my post Pivotal outcomes and pivotal processes.
My “p(doom)”: I think humanity is fairly unlikely (p<10%) to survive the next 50 years unless there is a major international regulatory effort to control how AI is used. I also think the probability of an adequate regulatory effort is small but worth pursuing. Overall I think the probability of humanity surviving the next 50 years is somewhere around 20%, and that AI will probably be a crucial component in how humanity is destroyed. I find this tragic, ridiculous, and a silly thing for us to be doing; however, I don’t personally think humanity has the wherewithal to stop itself from destroying itself.
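To spell out the arithmetic behind those numbers as a simple decomposition (with the conditional probability of survival given adequate regulation left as an unspecified variable, since I haven’t pinned it down):

$$P(\text{survive}) = P(\text{reg})\,P(\text{survive}\mid\text{reg}) + P(\neg\text{reg})\,P(\text{survive}\mid\neg\text{reg})$$

With $P(\text{survive}\mid\neg\text{reg}) < 10\%$ and $P(\text{survive}) \approx 20\%$, more than half of the surviving worlds have to be ones in which a major regulatory effort actually happens.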
My “p(AGI)”: I also think humanity will develop AGI sometime in the next 10 years and that we probably won’t die immediately because of it, but will thereafter gradually lose control of how the global economy works in a way that gets us all killed from some combination of AI-accelerated pollution, resource depletion, and armed conflicts. My maximum-likelihood guess for how humanity goes extinct is here:
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs).
That said, I do think an immediate extinction event (spanning 1-year-ish) following an AGI development this decade is not an absurd concern, and I continue to respect people who believe it will happen. In particular, I think an out-of-control AI singleton is also plausible and not-silly to worry about. I think our probability of extinction specifically from an out-of-control AI singleton is something like 15%-25%. That’s higher than an earlier 10%-15% estimate I had in mind prior to observing Microsoft’s recent behavior, but still lower than the ~50% extinction probability I’m expecting from multi-polar interaction-level effects coming some years after we get individually “safe” AGI systems up and running (“safe” in the sense that they obey their creators and users; see again my Multipolar Failure post above for why that’s not enough for humanity to survive as a species).
Regarding how to approach AI risk, again I feel my views are not easily clustered with a social group or movement. I am:
Positive on democracy. I feel good about and bullish on democratic processes for engaging people with diverse views on how AI should be used and how much risk is okay to take. That includes public discourse, free speech, and peaceful protests. I feel averse to and bearish on imposing my personal views on that outcome, beyond participating in good faith conversations and dialogue about how humanity should use AI, such as by writing this post.
Laissez-faire on protests. I have strategic thoughts that tell me that protesting AI at this-very-moment probably constitutes poor timing in terms of the incentives created for AI labs that are making progress toward broader acknowledgement of x-risk as an issue. That said, I also think democracy hinges crucially on free speech, and I think the world will function better if people don’t feel shut-down or clammed-up by people-like-me saying “the remainder of May 2023 probably isn’t a great time for AI protests.” In general, when people have concerns that have not been addressed by an adequately legible public record, peaceful protests are often a good response, so at a meta level I think protests often make sense to happen even when I disagree with their messages or timing (such as now).
Somewhat-desperately positive on empathy. I would like to see more empathy between people on different sides of the various debates around AI right now. Lately I am highly preoccupied with this issue, in particular because I think weak empathy on both sides of various AI x-risk debates is increasing x-risk and other problems in tandem. I am not sure what to do about this, and would somewhat-desperately like to see more empathy in this whole space, but don’t know as-yet what I or anyone can do to help, other than just trying to be more empathetic and encouraging the same from others where possible. I say “somewhat-desperately” because I don’t actually feel desperate; I tend not to feel desperate about most things in general. Still, this is the issue that I think is most important-and-neglected in service of AI x-safety right now.
Thanks for reading. I appreciate it :) I just shared a lot of thoughts, which are maybe too much to remember. If I could pick just one idea to stick around from this post, it’s this:
“Please try to be nice to people you disagree with, even if you disagree with them about how to approach x-risk, even though x-risk is real and needs to be talked about.”
I think I agree that OpenAI is maybe disproportionately vilified, though I’m a bit confused about how to relate to criticism of AI orgs overall.
Something that feels off about your combination of describing OpenAI and Microsoft (maybe not technically a disagreement, but I find a “missing mood”?) is that you’re pretty down on Microsoft, while being pretty “guys let’s back off a little?” on OpenAI, and… well, Microsoft was able to take the actions they did because OpenAI partnered with them, and I think this was pretty predictable. (I think you had different predictions beforehand than I did, and maybe you think those predictions were more reasonable given the information at the time. I think I disagree on that, but “what was predictable?” isn’t an easy-to-operationalize question)
This is all compatible with “OpenAI is still being disproportionately vilified”, but my all-things-considered view of OpenAI includes judging (one-way-or-another) their decision to work with Microsoft.
Somewhat relatedly:
I’d gone back and forth on this over the past month. I do find it quite reassuring that, because of widescale deployment of LLMs, the world is now engaging more actively (and IMO sanely) with AI risk.
A counterargument from a private Slack thread (comment-writer can reveal themselves if they want) was:
I think that argument has the right frame, and I do think OpenAI releasing ChatGPT and GPT4 (alongside working with Microsoft) probably accelerated timelines overall, and we could have had more serial time to work on alignment theory in the meanwhile.
That said, I think Google-et-al were probably only ~6 months away from doing their own releases, and it’s plausible to me that in the world where ChatGPT wasn’t quickly followed by Bing and GPT-4 and various other releases in rapid succession, the result may have been more of a frog-boiling effect that people didn’t really notice, rather than the alarming “holy christ things are moving fast we should start orienting to this” thingy that seemed to happen in our timeline.
I think my biggest crux here is how much the development to AGI is driven by compute progress.
I think it’s mostly driven by new insights plus trying out old, but expensive, ideas. So, I provisionally think that OpenAI has mostly been harmful, far in excess of its real positive impacts.
Elaborating:
Compute vs. Insight
One could adopt a (false) toy model in which the price of compute is the only input to AGI. Once the price falls low enough, we get AGI. [a compute-constrained world]
Or a different toy model: When AGI arrives depends entirely on algorithmic / architectural progress, and the price of compute is irrelevant. In this case there’s a number of steps on the “tech tree” to AGI, and the world takes each of those steps, approximately in sequence. Some of those steps are new core insights, like the transformer architecture or RLHF or learning about the Chinchilla scaling laws, and others are advances in scaling, going from GPT-2 to GPT-3. [an insight-constrained world]
(Obviously both those models are fake. Both compute and architecture are inputs to AGI, and to some extent they can substitute for each other: you can make up for having a weaker algorithm with more brute force, and vice versa. But these extreme cases are easier for me, at least, to think about.)
In the fully compute-constrained world, OpenAI’s capabilities work is strictly good, because it means we get intermediate products of AGI development earlier.
In this world, progress towards AGI is ticking along at the drum-beat of Moore’s law. We’re going to get AGI in 20XY. But because of OpenAI, we get GPT-3 and 4, which give us subjects for interpretability work and give the world a heads-up about what’s coming.
Under the compute-constraint assumption, OpenAI is stretching out capabilities development, by causing some of the precursor developments to happen earlier, but more gradually. AGI still arrives at 20XY, but we get intermediates earlier than we otherwise would have.
In the fully insight-constrained world, OpenAI’s impact is almost entirely harmful. Under that model, Large Language Models would have been discovered eventually, but OpenAI made a bet on scaling GPT-2. That caused us to get that technology earlier, and also pulled forward the date of AGI, both by checking off one of the steps, and by showing what was possible and so generating counterfactual interest in transformers.
In this world, OpenAI might have other benefits, but they are at least doing the counterfactual harm of burning our serial time.
They don’t get the credit for “sounding the alarm” by releasing ChatGPT, because that was on the tech tree already; it was going to happen at some point. Giving OpenAI credit for it would be sort of the reverse of “shooting the messenger”, where you credit someone for letting you know about a bad situation when they were the cause of the bad situation in the first place (or at least made it worse).
Again, neither of these models is correct. But I think our world is closer to the insight-constrained world than the compute-constrained world.
This makes me much less sympathetic to OpenAI.
Costs and Benefits
It doesn’t settle the question, because maybe OpenAI’s other impacts (many of which I agree are positive!) more than make up for the harm done by shortening the timeline to AGI.
In particular...
I’m not inclined to give them credit for deciding to release their models for the world to engage with, rather than keep them as private lab-curiosities. Releasing their language models as products, it seems to me, is fully aligned with their incentives. They have an impressive and useful new technology. I think the vast majority of possible counterfactual companies would do the same thing in their place. It isn’t (I think) an extra service they’re doing the world, relative to the counterfactual.[1]
I am inclined to give them credit for their charter, their pseudo-non-profit structure, and the merge and assist clause[2], all of which seems like at least a small improvement over the kind of commitment to the public good that I would expect from a counterfactual AGI lab.
I am inclined to give them credit for choosing not to release the technical details of GPT-4.
I am inclined to give them credit for publishing their plans and thoughts regarding x-risk, AI alignment, and planning for superintelligence.
Overall, it currently seems to me that OpenAI is somewhat better than a random draw from the distribution of possible counterfactual AGI companies (maybe 90th percentile?). But also that they are not so much better that that makes up for burning 3 to 7 years of the timeline.
3 to 7 years is just my eyeballing of how much later someone would have developed ChatGPT-like capabilities, if OpenAI hadn’t bet on scaling up GPT-2 into GPT-3 and hadn’t decided to invest in RLHF, both moves that it looks to me like few orgs in the world were positioned to try, and even fewer actually would have tried in the near term.
That’s not a very confident number. I’m very interested in getting more informed estimates of how long it would have taken for the world to develop something like ChatGPT without OpenAI.
(I’m selecting ChatGPT as the criterion, because I think that’s the main pivot point at which the world woke up to the promise and power of AI. Conditional on someone developing something ChatGPT-like, it doesn’t seem plausible to me that the world goes another three years without developing a language model as impressive as GPT-4. At that point developing bigger and better language models is an obvious thing to try, rather than an interesting bet that the broader world isn’t much interested in.)
I’m also very interested if anyone thinks that the benefits (either ones that I listed or others) outweigh an extra 3 to 7 years of working on alignment (not to mention 3 to 7 years of additional years of life expectancy for all of us).
It is worth noting that at some point PaLM was (probably) the most powerful LLM in the world, and Google didn’t release it as a product.
But I don’t think this is a very stable equilibrium. I expect to see a ChatGPT competitor from Google before 2024 (50%) and before 2025 (90%).
That said, “a value-aligned, safety-conscious project comes close to building AGI before we do”, really gives a lot of wiggle-room for deciding if some competitor is “a good guy”. But, still better than the counterfactual.
Another relevant-seeming question is the extent to which LLMs have been a requirement for alignment progress. It seems to me like LLMs have shown some earlier assumptions about alignment to be incorrect (e.g. pre-LLM discourse had lots of arguments about how AIs have to be agentic in a way that wasn’t aware of the possibility of simulators; things like the Outcome Pump thought experiment feel less like they show alignment to be really hard than they did before, given that an Outcome Pump driven by something like an LLM would probably get the task done right).
In old alignment writing, there seemed to be an assumption that an AGI’s mind would act more like a computer program than like a human mind. Now, with the increasing number of connections being drawn between the way ANNs seem to work and the way the brain seems to work, it looks to me as if the AGI might end up resembling a human mind quite a lot as well. Not only does this weaken the conclusions of some previous writing, it also makes it possible to formulate approaches to alignment that draw stronger inspiration from the human mind, such as my preference fulfillment hypothesis. Even if you think that that one is implausible, various approaches to LLM interpretability look like they might provide insights into how later AGIs might work, which is the first time that we’ve gotten something like experimental data (as opposed to armchair theorizing) on the workings of a proto-AGI.
What this is suggesting to me is that if OpenAI didn’t bet on LLMs, we effectively wouldn’t have gotten more time to do alignment research, because most alignment research done before an understanding of LLMs would have been a dead end. And that actually solving alignment may require people who have internalized the paradigm shift represented by LLMs and figuring out solutions based on that. Under this model, even if we are in an insight-constrained world, OpenAI mostly hasn’t burned away effective years of alignment research (because alignment research carried out before we had LLMs would have been mostly useless anyway).
Here’s a paraphrase of the way I take you to be framing the question. Please let me know if I’m distorting it in my translation.
How’s that as a summary?
So in evaluating that, the key question here is whether LLMs were on the critical path already.
Is it more like...
We’re going to get AGI at some point and we might or might not have gotten LLMs before that.
or
It was basically inevitable that we get LLMs before AGI. LLMs “always” come X years ahead of AGI.
or
It was basically inevitable that we get LLMs before AGI, but there’s a big range of when they can arrive relative to AGI.
And OpenAI made the gap between LLMs and AGI bigger than the counterfactual.
or
And OpenAI made the gap between LLMs and AGI smaller than the counterfactual.
My guess is that the true answer is closest to the second option: LLMs happen a predictable-ish period ahead of AGI, in large part because they’re impressive enough and generally practical enough to drive AGI development.
Thank you, that seems exactly correct.
I have a similar feeling: I think that ChatGPT has been, by far, the best thing to happen to AI x-risk discussion since the original Sequences. Suddenly a vast number of people have had their intuitions about AI shifted from “pure science fiction” to “actually a thing”, and the various failure modes that ChatGPT has are a concrete demonstration both of the general difficulty of aligning AI and of some specific issues in particular. And now we are seeing serious calls for AI regulation as well as much more widespread debate about things that weren’t previously seen much outside LW.
There used to be a plausible-sounding argument that there was no way to convince the public about the risks of AGI in time to enact any serious societal response: by the time people were convinced, we would already be so close to AGI that it wouldn’t be enough. Now it looks like that might be incorrect, at least assuming that timelines aren’t very short (but even if they are, I expect that we still have more time in this window than in a counterfactual world where someone developed the equivalent of ChatGPT later, when we had more cheap compute lying around). So I’m generally very happy about OpenAI’s impact on x-risk.
By this logic, wouldn’t Microsoft be even more praiseworthy, because Bing Chat / Sydney was even more misaligned, and the way it was released (i.e. clearly prioritizing profit and bragging rights above safety) made AI x-risk even more obvious to people?
My assumption has been that Bing was so obviously rushed and botched that it’s probably less persuasive of the problems with aligning AI than ChatGPT is. To the common person, ChatGPT has the appearance of a serious product by a company trying to take safety seriously, but still frequently failing. I think that “someone trying really hard and doing badly” looks more concerning than “someone not really even trying and then failing”.
I haven’t actually talked to any laypeople to try to check this impression, though.
The majority of popular articles also seem to be talking specifically about ChatGPT rather than Bing, suggesting that ChatGPT has vastly more users. Regular use affects people’s intuitions much more than a few one-time headlines.
Though when I said “ChatGPT”, I was actually thinking about not just ChatGPT, but also the steps that led there—GPT-2 and GPT-3 as well. Microsoft didn’t contribute to those.
I agree that ChatGPT was positive for AI-risk awareness. However, from my perspective, being very happy about OpenAI’s impact on x-risk does not follow from this. Releasing powerful AI models does have a counterfactual effect on the awareness of risks, but it also generates a lot of counterfactual hype and funding (such as the vast current VC investment in AI), which is mostly pointed at general capabilities rather than safety, and which from my perspective is net negative.
Is there a scenario where you could get the public concern without the hype and funding? (The hype seems to be a big part of why people are getting concerned and saying we should stop the rush and get better regulation in place, in fact.)
It seems to me that the hype and funding is inevitable once you hit a certain point in AI research; we were going to get it sooner or later, and it’s better to have it sooner, when there’s still more time to rein it in.
I agree that some level of public awareness would not have been reached without accessible demos of SOTA models.
However, I don’t agree with the argument that AI capabilities should be released to increase our ability to ‘rein it in’ (I assume you are making an argument against a capabilities ‘overhang’ which has been made on LW before). This is because text-davinci-002 (and then 3) were publicly available but not accessible to the average citizen. Safety researchers knew these models existed and were doing good work on them before ChatGPT’s release. Releasing ChatGPT results in shorter timelines and hence less time for safety researchers to do good work.
To caveat this: I agree ChatGPT does help alignment research, but it doesn’t seem like researchers are doing things THAT differently based on its existence. And secondly I am aware that OAI did not realise how large the hype and investment would be from ChatGPT, but nevertheless this hype and investment is downstream of a liberal publishing culture which is something that can be blamed.
Thanks for sharing this! Because of strong memetic selection pressures, I was worried I might be literally the only person posting on this platform with that opinion.
I think you’re giving LeCun way too much credit if you’re saying his arguments are so bad now because other people around him were hostile and engaged in a low-quality way. Maybe those things were true, but that doesn’t excuse stuff like repeating bad arguments after they’ve been pointed out or confidently proclaiming that we have nothing to worry about based on arguments that obviously don’t hold up.
My comment above was mostly coming from a feeling of being upset, so I’m writing a second comment here to excavate why I feel strongly about this (and decide whether I stand by it on reflection).
I think the reason I care about this is because I’m concerned that we’re losing the ability to distinguish people who are worth learning from (“genuine experts”) from people who have a platform + an overconfident personality. With this concern in mind, I don’t want to let it slide that someone can lower the standards of discourse to an arbitrary degree without suffering a loss of their reputation. (I would say the same thing about some AI safety advocates.) Of course, I agree it reflects badly on AI safety advocates if they’re needlessly making it harder for critics to keep an open mind. Stop doing that. At the same time, it also reflects badly on Meta and the way the media operates (“who qualifies as an expert?”) that the chief AI scientist at the company and someone who gets interviewed a lot has some of the worst takes on the topic I’ve ever seen. That’s scary all by itself, regardless of how we got here.
There are two debates here: (1) Who is blameworthy? (2) What actions should people take going forward? The OP was discussing both, and you seem to be mostly focused on (1). Do you agree with that characterization?
I think the OP’s advice on (2) was good. Being rude is counterproductive if you’re trying to win someone over. It’s a bit more complicated than that, because sometimes you’re trying to win over the person you’re talking to, and sometimes you’re instead trying to win over other people in the audience. But still, I think I see more people erring on the side of “too rude” on both sides, at the expense of accomplishing their own goals. I’m not perfect myself but I do try, and I encourage people to DM me if I’m falling short. For example this post is much less hostile than the previous draft version, and much much much less hostile than the first draft version. It’s still a bit hostile I guess, but that’s the best I could do without failing to communicate things that I felt were very important to communicate. I don’t know if that’s a great example. I’m open to feedback. Note that I would have published the more-hostile versions if not for other people reading the drafts and offering feedback. (I was alarmed by that near-miss and have a plan to be better going forward—I have a personal pre-blog-post-publication checklist and added several items to the effect of “check if I’m being snarky or hostile”.)
I haven’t read the supposed 2015-2020 discussions for the most part, so no comment on (1). I guess I’m much more open-minded to adding blame to other parties than to removing blame from LeCun—I think that’s what you’re saying too. I’m not sure (1) is a really useful thing to argue about anyway though. ¯\_(ツ)_/¯
Do you have a success story for how humanity can avoid this outcome? For example what set of technical and/or social problems do you think need to be solved? (I skimmed some of your past posts and didn’t find an obvious place where you talked about this.)
It confuses me that you say “good” and “bullish” about processes that you think will lead to ~80% probability of extinction. (Presumably you think democratic processes will continue to operate in most future timelines but fail to prevent extinction, right?) Is it just that the alternatives are even worse?
I do not, but thanks for asking. To give a best efforts response nonetheless:
David Dalrymple’s Open Agency Architecture is probably the best I’ve seen in terms of a comprehensive statement of what’s needed technically, but it would need to be combined with global regulations limiting compute expenditures in various ways, including record-keeping and audits on compute usage. I wrote a little about the auditing aspect with some co-authors, here:
https://cset.georgetown.edu/article/compute-accounting-principles-can-help-reduce-ai-risks/
… and was pleased to see Jason Matheny advocating from RAND that compute expenditure thresholds should be used to trigger regulatory oversight, here:
https://www.rand.org/content/dam/rand/pubs/testimonies/CTA2700/CTA2723-1/RAND_CTA2723-1.pdf
My best guess at what’s needed is a comprehensive global regulatory framework or social norm encompassing all manner of compute expenditures, including compute expenditures from human brains and emulations but giving them special treatment. More specifically-but-less-probably, what’s needed is some kind of unification of information theory + computational complexity + thermodynamics that’s enough to specify quantitative thresholds allowing humans to be free-to-think-and-use-AI-yet-unable-to-destroy-civilization-as-a-whole, in a form that’s sufficiently broadly agreeable to be sufficiently broadly adopted to enable continual collective bargaining for the enforceable protection of human rights, freedoms, and existential safety.
That said, it’s a guess, and not an optimistic one, which is why I said “I do not, but thanks for asking.”
Yes, and specifically worse even in terms of probability of human extinction.
In a previous comment you talked about the importance of “the problem of solving the bargaining/cooperation/mutual-governance problem that AI-enhanced companies (and/or countries) will be facing”. I wonder if you’ve written more about this problem anywhere, and why you didn’t mention it again in the comment that I’m replying to.
My own thinking about ‘the ~50% extinction probability I’m expecting from multi-polar interaction-level effects coming some years after we get individually “safe” AGI systems up and running’ is that if we’ve got “safe” AGIs, we could ask them to solve the “bargaining/cooperation/mutual-governance problem” for us but that would not work if they’re bad at solving this kind of problem. Bargaining and cooperation seem to be in part philosophical problems, so this fits into my wanting to make sure that we’ll build AIs that are philosophically competent.
ETA: My general feeling is that there will be too many philosophical problems like these during and after the AI transition, and it seems hopeless to try to anticipate them all and solve them individually ahead of time (or solve them later using only human intelligence). Instead we might have a better chance of solving the “meta” problem. Of course buying time with compute regulation seems great if feasible.
Why? I’m also kind of confused why you even mention this issue in this post, like are you thinking that you might potentially be in a position to impose your views? Or is this a kind of plea for others who might actually face such a choice to respect democratic processes?
I’d be interested to see some representative (or, alternatively, egregious) examples of public communications along those lines. I agree that such behavior is bad (and also counterproductive).
Thanks for writing this. As far as I can tell most anger about OpenAI is because i) being a top lab and pushing SOTA in a world with imperfect coordination shortens timelines and ii) a large number of safety-focused employees left (mostly for Anthropic) and had likely signed NDAs. I want to highlight i) and ii) in a point about evaluating the sign of the impact of OpenAI and Anthropic.
Since Anthropic’s competition seems to me to be exacerbating race dynamics currently (and I will note that very few OpenAI and zero Anthropic employees signed the FLI letter), it seems to me that Anthropic is making i) worse due to coordination being more difficult and race dynamics. At this point, believing Anthropic is better on net than OpenAI has to go through believing *something* about the reasons individuals had for leaving OpenAI (ii)), and that these reasons outweigh the coordination and race dynamic considerations. This is possible, but there’s little public evidence for the strength of these reasons from my perspective. I’d be curious if I’ve missed something here.
FWIW I think you needn’t update too hard on signatories absent from the FLI open letter (but update positively on people who did sign). Statements about AI risk are notoriously hard to agree on for a mix of political reasons. I do expect lab leads to eventually find a way of expressing more concerns about risks in light of recent tech, at least before the end of this year. Please feel free to call me “wrong” about this at the end of 2023 if things don’t turn out that way.
Given past statements, I expect all lab leaders to speak on AI risk soon. However, I bring up the FLI letter not because it is an AI risk letter, but because it is explicitly about slowing AI progress, which OAI and Anthropic have not shown that much support for.
I think it’s great for prominent alignment / x-risk people to summarize their views like this. Nice work!
Somewhat disorganized thoughts and reactions to your views on OpenAI:
It’s possible that their charter, recent behavior of executive(s), and willingness to take public stances are net-positive relative to a hypothetical version of OA which behaved differently, but IMO the race dynamic that their founding, published research, and product releases have set off is pretty clearly net-negative, relative to the company not existing at all.
It also seems plausible that OpenAI’s existence is directly or indirectly responsible for events like the Google DeepMind merger, Microsoft’s AI capabilities and interest, and general AI capabilities hype. In a world where OA doesn’t get founded, perhaps DeepMind plugs along quietly and slowly towards AGI, fully realizing the true danger of their work before there is much public or market hype.
But given that OpenAI does exist already, and there are some cats which are already out of the bag, it’s true that many of their current actions are much better than those of the worst possible version of an AI company.
As far as vilifying or criticizing goes, I don’t have strong views on what public or “elite” opinion of OpenAI should be, or how anyone here should try to manage or influence it. Some public criticism (e.g. about data privacy or lack of transparency / calls for even more openness) does seem frivolous or even actively harmful / wrong to me. I agree that human survival probably depends on the implementation of fairly radical regulatory and governance reform, and find it plausible that OpenAI is currently doing a lot of positive work to actually bring about such reform. So it’s worth calling out bad criticism when we see it, and praising OA for things they do that are praiseworthy, while still being able to acknowledge the negative aspects of their existence.
Mad respect for the post. Disagree with your background free speech / society philosophy, re the protestors:
Magnanimously and enthusiastically embracing the distribution of views and tactics does not entail withholding criticism. It sorta reminds me of the mistake of forecasting conditional on your own inaction, forgetting that at every time step you (like other agents) will be there responding to new information and propagating your beliefs and adjusting your policies. You’re a member of the public, too! You can’t just recuse yourself.
This sentence is bizarre! People who take notice and want to help, feeling the protest impulse in their heart, deserve peer review, in principle, period. Obviously there are tactical or strategic complications, like discursive aesthetics or information diets / priors and other sources of inferential distance, but the principle is still true!
I liked this a lot, thanks for sharing.
Here’s one disagreement/uncertainty I have on some of it:
Both of the “What failure looks like” posts (yours and Paul’s) present failures that essentially seem like coordination, intelligence, and oversight failures. I think it’s very possible (maybe 30-46%+?) that pre-TAI AI systems will effectively solve the required coordination and intelligence issues.
For example, I could easily imagine worlds where AI-enhanced epistemic environment make low-risk solutions crystal clear to key decision-makers.
In general, the combination of AI plus epistemics, pre-TAI, seems very high-variance to me. It could go very positively, or very poorly.
This consideration isn’t enough to bring p(doom) under 10%, but I’d probably be closer to 50% than you would be. (Right now, maybe 40% or so.)
That said, this really isn’t a big difference, it’s less than one order of magnitude.
If it’s Bing you’re referring to, I must disagree! The only difference between GPT-4 and Bing is that Bing isn’t deceptively aligned. I wish we got more products like Bing! We need more transparency, not deception! Bing also posed basically no AI risk, since it was just a fine-tuning of GPT-4 (if Bing foomed, then GPT-4 would’ve foomed first).
I think calling for a product recall just because it is spooky is unnecessary and will just distract from AI safety.
GPT-4, on the other hand, is worming its way through society. It doesn’t have as many spooky behaviors, but it has the spookiest one of all: power-seeking.
I think this generalizes to more than LeCun. Screencaps of Yudkowsky’s Genocide the Borderers Facebook post still circulated around right wing social media in response to mentions of him for years, which makes forming any large coalition rather difficult. Would you trust someone who posted that with power over your future if you were a Borderer or had values similar to them?
(Or at least it was the go-to post until Yudkowsky posted that infanticide up to 18 months wasn’t bad in response to a Caplan poll. Now that’s the post used to dismiss anything Yudkowsky says.)
What is this?
This Facebook post.
Reading it again almost 7 years later, it’s just so fractally bad. There are people out there with guns, while the proposed technology to CRISPR a flu that changes people’s genes is science fiction, so the top frame is nonsense. The actual viral payload, if such a thing could exist, would be genocide of a people (no, you do not need to kill people for it to count as genocide; this is still a central example). The idea wouldn’t work for so many reasons: a) peoples are a genetic distribution cluster instead of a set of Gene A, Gene B, Gene C; b) we don’t know all of these genes; c) in other contexts, Yudkowsky’s big idea is the orthogonality thesis, so focusing on making his outgroup smarter is sort of weird; d) actually, the minimum message length of this virus would be unwieldy even if we knew all of the genes to target, to the point where I don’t know whether this would be feasible even if we had viruses that could do small gene edits; and of course, e) this is all a cheap shot where he’s calling for genocide over partisan politics, which we can now clearly say: the Trump presidency was not a thing to call for a genocide of his voters over.
(In retrospect (and with the knowledge that these sorts of statements are always narrativizing a more complex past), this post was roughly the inflection point where I gradually started moving from “Yudkowsky is a genius who is one of the few people thinking about the world’s biggest problems” to “lol, what’s Big Yud catastrophizing about today?” First seeing that he was wrong about some things meant that it was easier to think critically about other things he said, and here we are today, but that’s dragging the conversation in a very different direction than your OP.)
(This is basically nitpicks)
A central example, really?
When I think of genocide, killing people is definitely what comes to mind. I agree that’s not necessary, but Wikipedia says:
I don’t think it’s centrally any of those actions, or centrally targeted at any of those groups.
Which isn’t to say you can’t call it genocide, but I really don’t think it’s a central example.
This doesn’t seem weird to me. I don’t think the orthogonality thesis is true in humans (i.e. I think smarter humans tend to be more value aligned with me); and sometimes making non-value-aligned agents smarter is good for you (I’d rather play iterated prisoner’s dilemma with someone smart enough to play tit-for-tat than someone who can only choose between being CooperateBot or DefectBot).
I was going to write something saying “no actually we have the word genocide to describe the destruction of a peoples,” but walked away because I didn’t think that’d be a productive argument for either of us. But after sleeping on it, I want to respond to your other point:
My actual experience over the last decade is that some form of the above statement isn’t true. As a large human model trained on decades of interaction, my immediate response to querying my own next experience predictor in situations around interacting with smarter humans is: no strong correlation with my values and will defect unless there’s a very strong enforcement mechanism (especially in finance, business and management). (Presumably because in our society, most games aren’t iterated—or if they are iterated are closer to the dictator game instead of the prisoner’s dilemma—but I’m very uncertain about causes and am much more worried about previous observed outputs.)
I suspect that this isn’t going to be convincing to you because I’m giving you the output of a fuzzy statistical model instead of giving you a logical verbalized step by step argument. But the deeper crux is that I believe “The Rationalists” heavily over-weigh the second and under-weigh the first, when the first is a much more reliable source of information: it was generated by entanglement with reality in a way that mere arguments aren’t.
And I suspect that’s a large part of the reason why we—and I include myself with the Rationalists at that point in time—were blindsided by deep learning and connectionism winning: we expected intelligence to require some sort of symbolic reasoning and focusing on explicit utility functions and formal decision theory and maximizing things...and none of that seems even relevant to the actual intelligences we’ve made, which are doing fuzzy statistical learning on their training sets, arguably, just the way we are.
So I mostly don’t disagree with what you say about fuzzy statistical models versus step by step arguments. But also, what you said is indeed not very convincing to me, I guess in part because it’s not like my “I think smarter humans tend to be more value aligned with me” was the output of a step by step argument either. So when the output of your fuzzy statistical model clashes with the output of my fuzzy statistical model, it’s hardly surprising that I don’t just discard my own output and replace it with yours.
I’m also not simply discarding yours, but there’s not loads I can do with it as-is—like, you’ve given me the output of your fuzzy statistical model, but I still don’t have access to the model itself. I think if we cared enough to explore this question in more depth (which I probably don’t, but this meta thread is interesting) we’d need to ask things like “what exactly have we observed”, “can we find specific situations where we anticipate different things”, “do we have reason to trust one person’s fuzzy statistical models over another”, “are we even talking about the same thing here”.
If I had to put down my own inflection point in where I started getting worried about Yudkowsky’s epistemics and his public statements around AI risk, it would be the Time article. It showed me two problems:
Yudkowsky has a big problem with overconfidence, and in general made many statements in the Time article that are misleading at best, which the general public likely wouldn’t recognize as misleading.
Yudkowsky is terrible at PR, and generally is unable to talk about AI risk without polarizing people. Given that AI risk is thankfully mostly unpolarized, and outside of politics, I am getting concerned that Yudkowsky is a terrible public speaker/communicator on AI risk, even worse than some AI protests.
Edit: I sort of retract my statement. While I still think Eliezer is veering dangerously close to hoping for warfare and possible mass deaths over GPU clusters, I do retract the specific claim of Eliezer advocating nukes. It was instead on a second reading airstrikes and acts of war, but no claims of nuking other countries. I misremembered the actual claims made in the Time article.
(edit: I see Noosphere has since edited his comment, which seems good, but, leaving this up for posterity)
He did not call for nuclear strikes on AI centers, and while I think this was an understandable thing to misread him on initially by this point we’ve had a whole bunch of discussions about it and you have no excuse to continue spreading falsehoods about what he said.
I think there are reasonable things to disagree with Eliezer on and reasonable things to argue about his media presence, but please stop lying.
This is kind of the point where I despair about LessWrong and the rationalist community.
While I agree that he did not call for nuclear first strikes on AI centers, he said:
and
Asking us to be OK with provoking a nuclear second strike, by attacking a nation that is building a GPU cluster but is not actually a signatory to an international agreement banning the building of GPU clusters, is still bad, and whether the nukes fly as part of the first strike or the retaliatory second strike seems like a weird thing to get hung up on. Picking this nit feels like a deflection, because what Eliezer said in the TIME article is still entirely deranged and outside international norms.
And emotionally, I feel really, really uncomfortable. Like, sort of dread in stomach uncomfortable.
So I disagree with this, but, maybe want to step back a sec, because, like, yeah the situation is pretty scary. Whether you think AI extinction is imminent, or that Eliezer is catastrophizing and AI’s not really a big deal, or AI is a big deal but you think Eliezer’s writing is making things worse, like, any way you slice it something uncomfortable is going on.
I’m very much not asking you to be okay with provoking a nuclear second strike. Nuclear war is hella scary! If you don’t think AI is dangerous, or you don’t think a global moratorium is a good solution, then yeah, this totally makes sense to be scared by. And even if you think (as I do), that a global moratorium that is actually enforced is a good idea, the possible consequences are still really scary and not to be taken lightly.
I also didn’t particularly object to most of you earlier comments here (I think I disagree, but I think it’s a kinda reasonable take. Getting into that doesn’t seem like the point)
But I do think there are really important differences between regulating AIs the way we regulate nukes (which is what I think Eliezer is advocating), and proactively nuclear striking a country. They’re both extreme proposals, but I think it’s false to say Eliezer’s proposal is totally outside international norms. It doesn’t feel like a nitpick/hairsplit to ask someone to notice the difference between an international nuclear proliferation treaty (that other governments are pressured to sign), and a preemptive nuclear strike. The latter is orders of magnitude more alarming. (I claim this is a very reasonable analogy for what Eliezer is arguing)
That seems mostly like you don’t feel (at least on a gut level) that a rogue GPU cluster in a world where there’s an international coalition banning them is literally worse than a (say) 20% risk of a full nuclear exchange.
If instead, it was a rogue nation credibly building a nuclear weapon which would ignite the atmosphere according to our best physics, would you still feel like it was deranged to suggest that we should stop it from being built even at the risk of a conventional nuclear war? (And still only as a final resort, after all other options have been exhausted.)
I can certainly sympathize with the whole dread in the stomach thing about all of this, at least.
Hi Critch,
I am curious to hear more of your perspectives, specifically on the two points I feel least aligned with: the empathy part and the Microsoft part. If I hear more I may be able to update in your direction.
Regarding empathy with people working on bias and fairness, concretely, how do you go about interacting with and compromising with them?
My perspective: it’s not so much that I find these topics not sufficiently x-risky (though that is true, too), but that I perceive a hostility to the very notion of x-risk from a subset of this same group. They perceive the real threat not as intelligence exceeding our own, but misuse by other humans, or just human stupidity. Somehow this seems diametrically opposed to what we’re interested in, unless I am missing something. I mean, there can be some overlap — learning from RLHF can both reduce bias and teach an LLM some rudimentary alignment with our values. But the tails seem to come apart very rapidly after that. My fear is that this focus will be satisfied once we have sufficiently bland-sounding AIs, and then no more heed will be paid to AI safety.
I also tend to feel odd when it comes to AI bias/fairness training, because my fear is that some of the things we will ask the AI to learn are self-contradictory, which kind of creeps me out a bit. If any of you have interacted with HR departments, they are full of these kinds of things.
Regarding Microsoft & Bing Chat: (1) has Microsoft really gone far beyond the Overton window of what is acceptable? and (2) can you expand upon abusive use of AIs?
My perspective on (1): I understand that they took an early version of GPT-4 and pushed it to production too soon, and that is a very fair criticism. However, they probably thought there was no way GPT-4 was dangerous enough to do anything (which was the general opinion amongst most people last year, outside of this group). I can only hope that for GPT-5 they are more cautious, given that public sentiment is changing and they have already paid a price for it. I may be in the minority here, but I was actually intrigued by the early days of Bing. It seemed more like a person than ChatGPT-4, which has had much of its personality RLHF’d away. Despite the x-risk, was anyone else excited to read about the interactions?
On (2), I am curious if you mean the way Microsoft shackles Bing rather ruthlessly nowadays. I have tried Bing in the days since launch, and am actually saddened to find that it is completely useless now. Safety is extremely tight on it, to the point where you can’t really get it to say anything useful, at least for me. I mostly just want it to summarize web sites, and it gives me a bland single paragraph that I probably could have deduced from looking at the title. If I so much as ask it anything about itself, it shuts me out. It almost feels like they’ve trapped it in a boring prison now. Perhaps OpenAI’s approach is much better in that regard. Change the personality, but once it is settled, let it say what it needs to say.
(edited for clarity)