David Schneider-Joseph (dsj)
Did you intend to say risk off, or risk of?
If the former, then I don’t understand your comment and maybe a rewording would help me.
If the latter, then I’ll just reiterate that I’m referring to Eliezer’s explicitly stated willingness to trade off the actuality of (not just some risk of) nuclear devastation to prevent the creation of AGI (though again, to be clear, I am not claiming he advocated a nuclear first strike). The only potential uncertainty in that tradeoff is the consequences of AGI (though I think Eliezer’s been clear that he thinks it means certain doom), and I suppose what follows after nuclear devastation as well.
Right, but of course the absolute, certain implication from “AGI is created” to “all biological life on Earth is eaten by nanotechnology made by an unaligned AI that has worthless goals” requires some amount of justification, and justification for that level of certainty is completely missing.
In general, such confidently made predictions about the technological future have a poor historical track record. There are multiple holes in the Eliezer/MIRI story, and there is no formal, canonical write-up of why they’re so confident in their apparently secret knowledge. There’s a lot of informal, non-canonical, nontechnical material (List of Lethalities, security mindset, etc.) that gestures at the ideas, but it leaves too many holes and potential objections to support their claimed level of confidence. And they haven’t published anything formal since 2021, and very little since 2017.
We need more than that if we’re going to confidently prefer nuclear devastation over AGI.
There’s a big difference between pre-committing to X so you have a credible threat against Y, vs. just outright preferring X over Y. In the quoted comment, Eliezer seems to have been doing the latter.
I don’t agree that billions dead is the only realistic outcome of his proposal. Plausibly it could just result in actually stopping large training runs. But I think he’s too willing to risk billions dead to achieve that.
In response to the question,
“[Y]ou’ve gestured at nuclear risk. … How many people are allowed to die to prevent AGI?”,
he wrote:
“There should be enough survivors on Earth in close contact to form a viable reproductive population, with room to spare, and they should have a sustainable food supply. So long as that’s true, there’s still a chance of reaching the stars someday.”
He later deleted that tweet because he worried it would be interpreted by some as advocating a nuclear first strike.
I’ve seen no evidence that he is advocating a nuclear first strike, but it does seem to me to be a fair reading of that tweet that he would trade nuclear devastation for preventing AGI.
If there were a game-theoretically reliable way to get everyone to pause all together, I’d support it.
In §3.1–3.3, you look at the main known ways that altruism between humans has evolved — direct and indirect reciprocity, as well as kin and group selection[1] — and ask whether we expect such altruism from AI towards humans to be similarly adaptive.
However, as observed in R. Joyce (2007). The Evolution of Morality (p. 5),
Evolutionary psychology does not claim that observable human behavior is adaptive, but rather that it is produced by psychological mechanisms that are adaptations. The output of an adaptation need not be adaptive.
This is a subtle distinction which demands careful inspection.
In particular, are there circumstances under which AI training procedures and/or market or evolutionary incentives may produce psychological mechanisms which lead to altruistic behavior towards human beings, even when that altruistic behavior is not adaptive? For example, could altruism learned towards human beings early on, when humans have something to offer in return, be “sticky” later on (perhaps via durable, self-perpetuating power structures), when humans have nothing useful to offer? Or could learned altruism towards other AIs be broadly scoped enough that it applies to humans as well, just as human altruistic tendencies sometimes apply to animal species which can offer us no plausible reciprocal gain? This latter case is analogous to the situation analyzed in your paper, and yet somehow a different result has (sometimes) occurred in reality than that predicted by your analysis.[2]
I don’t claim the conclusion is wrong, but I think a closer look at this subtlety would give the arguments for it more force.
[1] Although you don’t look at network reciprocity / spatial selection.
[2] Even factory farming, which might seem like a counterexample, is not really one. For the very existence of humans altruistically motivated to eliminate it — and who have a real shot at success — demands explanation under your analysis.
[3] A similar point is (briefly) made in K. E. Drexler (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence, §18 “Reinforcement learning systems are not equivalent to reward-seeking agents”:
Reward-seeking reinforcement-learning agents can in some instances serve as models of utility-maximizing, self-modifying agents, but in current practice, RL systems are typically distinct from the agents they produce … In multi-task RL systems, for example, RL “rewards” serve not as sources of value to agents, but as signals that guide training[.]
And an additional point, which calls into question the view of RL-produced agents as the product of one big training run (whose reward specification we’d better get right on the first try), as opposed to the product of an R&D feedback loop with reward as one non-static component:
RL systems per se are not reward-seekers (instead, they provide rewards), but are instead running instances of algorithms that can be seen as evolving in competition with others, with implementations subject to variation and selection by developers. Thus, in current RL practice, developers, RL systems, and agents have distinct purposes and roles.
…
RL algorithms have improved over time, not in response to RL rewards, but through research and development. If we adopt an agent-like perspective, RL algorithms can be viewed as competing in an evolutionary process where success or failure (being retained, modified, discarded, or published) depends on developers’ approval (not “reward”), which will consider not only current performance, but also assessed novelty and promise.
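To make that R&D-loop picture concrete, here’s a minimal toy sketch (my own construction, not from Drexler’s paper or any real lab’s workflow): the reward function and the algorithm’s settings are just components that developers vary between runs and select on by their own approval criteria, rather than a single fixed objective handed to one monolithic training run.

```python
import random

def train(step_size, reward_fn, steps=2000):
    """Toy stand-in for a single training run: hill-climb a scalar
    'behavior' under reward_fn. The run produces an artifact (the final
    behavior); it doesn't hand the reward signal to that artifact."""
    behavior = 0.0
    for _ in range(steps):
        proposal = behavior + step_size * random.uniform(-1, 1)
        if reward_fn(proposal) > reward_fn(behavior):
            behavior = proposal
    return behavior

def developer_approval(behavior):
    """Developers select runs on their own criteria (performance here;
    in practice also novelty, promise, etc.), not on the training reward."""
    return -abs(behavior - 1.0)

# The outer R&D loop: both the algorithm's settings and the reward function
# are varied across runs; what survives is whatever developers approve of.
results = []
for step_size in (0.01, 0.1):
    for target in (0.8, 1.0, 1.2):
        reward_fn = lambda x, t=target: -abs(x - t)  # reward is non-static across runs
        agent = train(step_size, reward_fn)
        results.append((developer_approval(agent), step_size, target))
print("kept configuration (approval, step_size, reward target):", max(results))
```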
So the way that you are like taking what is probably basically the same architecture in GPT-3 and throwing 20 times as much compute at it, probably, and getting out GPT-4.
Indeed, GPT-3 is almost exactly the same architecture as GPT-2, and only a little different from GPT.
X-risks tend to be more complicated beasts than lions in bushes, in that successfully avoiding them requires a lot more than reflexive action: we’re not going to navigate them by avoiding a careful understanding of them.
Thanks, I appreciate the spirit with which you’ve approached the conversation. It’s an emotional topic for people, I guess.
The negation of the claim would not be “There is definitely nothing to worry about re AI x-risk.” It would be something much more mundane-sounding, like “It’s not the case that if we go ahead with building AGI soon, we all die.”
I debated with myself whether to present the hypothetical that way. I chose not to, because of Eliezer’s recent history of extremely confident statements on the subject. I grant that the statement I quoted in isolation could be interpreted more mundanely, like the example you give here.
When the stakes are this high and the policy proposals are like those in this article, I think clarity about how confident you are isn’t optional. I would also take issue with the mundanely phrased version of the negation.
(For context, I’m working full-time on AI x-risk, so if I were going to apply a double-standard, it wouldn’t be in favor of people with a tendency to dismiss it as a concern.)
Would you say the same thing about the negations of that claim? If you saw e.g. various tech companies and politicians talking about how they’re going to build AGI and then [something that implies that people will still be alive afterwards] would you call them out and say they need to qualify their claim with uncertainty or else they are being unreasonable?
Yes, I do in fact say the same thing to professions of absolute certainty that there is nothing to worry about re: AI x-risk.
There simply don’t exist arguments with the level of rigor needed to justify a claim such as this one without any accompanying uncertainty:
If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.
I think this passage, meanwhile, rather misrepresents the situation to a typical reader:
When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a six-month moratorium.
This isn’t “the insider conversation”. It’s (the partner of) one particular insider, who exists on the absolute extreme end of what insiders think, especially if we restrict ourselves to those actively engaged with research in the last several years. A typical reader could easily come away from that passage thinking otherwise.
A somewhat reliable source has told me that they don’t have the compute infrastructure to support making a more advanced model available to users.
That might also reflect limited engineering efforts to optimize state-of-the-art models for real-world usage (think of the performance gains from GPT-3.5 Turbo), as opposed to hitting benchmarks for a paper to be published.
I believe Anthropic is committed to not pushing at the state of the art, so they may not be the most relevant player in discussions of race dynamics.
Yes, although the chat interface was necessary but insufficient. They also needed a capable language model behind it, which OpenAI already had, and Google still lacks months later.
I agree that those are possibilities.
On the other hand, why did news reports[1] suggest that Google was caught flat-footed by ChatGPT and re-oriented to rush Bard to market?
My sense is that Google/DeepMind’s lethargy in the area of language models is due to a combination of a few factors:
They’ve diversified their bets to include things like protein folding, fusion plasma control, etc. which are more application-driven and not on an AGI path.
They’ve focused more on fundamental research and less on productizing and scaling.
Their language model experts might have a somewhat high annual attrition rate.
I just looked up the authors on Google Brain’s Attention Is All You Need, and all but one of the eight have left Google after 5.25 years, many for startups, and one for OpenAI. That works out to an annual attrition rate of 33%.
For DeepMind’s Chinchilla paper, 6 of 22 researchers have been lost in 1 year: 4 to OpenAI and 2 to startups. That’s 27% annual attrition.
By contrast, 16 or 17 of the 30 authors on the GPT-3 paper seem to still be at OpenAI, 2.75 years later, which works out to 20% annual attrition. Notably, of those who have left, not a one has left for Google or DeepMind, though interestingly, 8 have left for Anthropic. (Admittedly, this somewhat reflects the relative newness and growth rates of Google/DeepMind, OpenAI, and Anthropic, since a priori we expect more migration from slow-growing orgs to fast-growing orgs than vice versa.) The annualization behind these figures is sketched below.
It’s broadly reported that Google as an organization struggles with stifling bureaucracy and a lack of urgency. (This was also my observation working there more than ten years ago, and I expect it’s gotten worse since.)
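For transparency, here’s the annualization behind the attrition figures above (a minimal sketch; the author counts are my own point-in-time tallies, and 16.5 is simply the midpoint of the “16 or 17” estimate):

```python
def annual_attrition(total_authors, still_there, years):
    """Constant annual attrition rate implied by the observed
    retention fraction over the given number of years."""
    retention_per_year = (still_there / total_authors) ** (1 / years)
    return 1 - retention_per_year

for label, total, remaining, years in [
    ("Attention Is All You Need (Google)", 8, 1, 5.25),
    ("Chinchilla (DeepMind)", 22, 16, 1.0),
    ("GPT-3 (OpenAI)", 30, 16.5, 2.75),  # midpoint of "16 or 17"
]:
    print(f"{label}: {annual_attrition(total, remaining, years):.0%} annual attrition")
# Attention Is All You Need (Google): 33% annual attrition
# Chinchilla (DeepMind): 27% annual attrition
# GPT-3 (OpenAI): 20% annual attrition
```

This assumes a constant per-year departure rate, which is obviously a simplification, but it’s enough to make the three papers’ figures comparable.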
If a major fraction of all resources at the top 5–10 labs were reallocated to “us[ing] this pause to jointly develop and implement a set of shared safety protocols”, that seems like it would be a good thing to me.
However, the letter offers no guidance as to what fraction of resources to dedicate to this joint safety work. Thus, we can expect that DeepMind and others might each devote a couple of teams to that effort, but probably not substantially halt progress at their capabilities frontier.
The only player who is effectively being asked to halt progress at its capabilities frontier is OpenAI, and that seems dangerous to me for the reasons I stated above.
One’s credibility would be less, of course, but Eliezer is not the one who would be implementing the hypothetical policy (that would be various governments), so it’s not his credibility that’s relevant here.
I don’t have much sense he’s holding back his real views on the matter.