New article in Time Ideas by Eliezer Yudkowsky.
Here are some selected quotes.
In reference to the letter that just came out (discussion here):
We are not going to bridge that gap in six months.
It took more than 60 years between when the notion of Artificial Intelligence was first proposed and studied, and for us to reach today’s capabilities. Solving safety of superhuman intelligence—not perfect safety, safety in the sense of “not killing literally everyone”—could very reasonably take at least half that long. And the thing about trying this with superhuman intelligence is that if you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as in other challenges we’ve overcome in our history, because we are all gone.
…
Some of my friends have recently reported to me that when people outside the AI industry hear about extinction risk from Artificial General Intelligence for the first time, their reaction is “maybe we should not build AGI, then.”
Hearing this gave me a tiny flash of hope, because it’s a simpler, more sensible, and frankly saner reaction than I’ve been hearing over the last 20 years of trying to get anyone in the industry to take things seriously. Anyone talking that sanely deserves to hear how bad the situation actually is, and not be told that a six-month moratorium is going to fix it.
Here’s what would actually need to be done:
The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the U.S., then China needs to see that the U.S. is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the U.S. and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.
Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for anyone, including governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
That’s the kind of policy change that would cause my partner and I to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying “maybe we should not” deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.
Shut it all down.
We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.
Shut it down.
In the past few weeks I’ve noticed a significant change in the Overton window of what seems possible to talk about. I think the broad strokes of this article seem basically right, and I agree with most of the details.
I don’t expect this to immediately cause AI labs or world governments to join hands and execute a sensibly-executed-moratorium. But I’m hopeful about it paving the way for the next steps towards it. I like that this article, while making an extremely huge ask of the world, spells out exactly how huge an ask is actually needed.
Many people on Hacker News seemed suspicious of the FLI Open Letter because it looks superficially like the losers in a race trying to gain a local political advantage. I like that Eliezer’s piece makes it clearer that it’s not about that.
I do still plan to sign the FLI Open Letter. If a better open letter comes along, making an ask that is more complete and concrete, I’d sign that as well. I think it’s okay to sign open letters that aren’t exactly the thing you want, to help build momentum and common knowledge of what people think. (I think not-signing-the-letter while arguing for what a better letter should say, similar to what Eliezer did here, also seems like a fine strategy for common-knowledge building.)
I’d be most interested in an open letter for something like a conditional-commitment (i.e. kickstarter mechanic) for shutting down AI programs IFF some critical mass of other countries and companies shut down AI programs, which states something like:
It’d be good if all major governments and AI labs agreed to pause capabilities research indefinitely while we make progress on existential safety issues.
Doing this successfully is a complex operation, and requires solving novel technological and political challenges. We agree it’d be very hard, but nonetheless is one of the most important things for humanity to collectively try to do. Business-as-usual politics will not be sufficient.
This is not claiming it’d necessarily be good for any one lab to pause unilaterally, but we all agree that if there was a major worldwide plan to pause AI development, we would support that plan.
If safe AGI could be developed, it’d be extremely valuable for humanity. We’re not trying to stop progress, we’re just trying to make sure we actually achieve progress, rather than causing catastrophe.
I think that’s something that several leading AI lab leaders should basically support (given their other stated views).
A 30 year delay is actually needed? Since it’s impossible, doesn’t this collapse to the “we’re doomed regardless” case? Which devolves to “might as well play with AGI while we can...”
A concise and impactful description of the difficulty we face.
I expect that the message in this article will not truly land with a wider audience (it still doesn’t seem to land with all of the LW audience...), but I’m glad to see someone trying.
I would be interested in hearing the initial reactions and questions that readers previously unfamiliar with AI x-risk have after reading this article. I’ll keep an eye on Twitter, I suppose.
I just want to say that this is very clear argumentation and great rhetoric. Eliezer’s writing at its best.
And it does seem to have got a bit of traction. A very non-technical friend just sent me the link, on the basis that she knows “I’ve always been a bit worried about that sort of thing.”
I disagree with AI doomers, not in the sense that I consider it a non-issue, but in that my assessment of the risk of ruin is something like 1%, not 10%, let alone the 50%+ that Yudkowsky et al. believe. Moreover, restrictive AI regimes threaten to produce a lot of bad outcomes, possibly including: the devolution of AI control into a cult (we have a close analogue in post-1950s public opinion towards civilian applications of nuclear power and explosions, which robbed us of Orion Drives, amongst other things); a delay in life extension timelines by years if not decades, resulting in 100Ms-1Bs of avoidable deaths (this is not just my supposition, but that of Aubrey de Grey as well, who has recently commented on Twitter that AI is already bringing LEV timelines forward); and even outright technological stagnation (nobody has yet canceled secular dysgenic trends in genomic IQ). I leave unmentioned the extreme geopolitical risks of “GPU imperialism”.
While I am quite irrelevant, this is not a marginal viewpoint—it’s probably pretty mainstream within e/acc, for instance—and one that has to be countered if Yudkowsky’s extreme and far-reaching proposals are to have any chance of reaching public and international acceptance. The “bribe” I require is several OOMs more money invested into radical life extension research (personally I have no more wish to die of a heart attack than to get turned into paperclips) and into the genomics of IQ and other non-AI ways of augmenting collective global IQ, such as neural augmentation and animal uplift (to prevent long-term idiocracy scenarios). I will be willing to support restrictive AI regimes under these conditions, if against my better judgment; but if there are no such concessions, it will have to be open and overt opposition.
Couple of points:
If we screw this up, there are over eight billion people on the planet, and countless future humans, who might either die or never get a chance to be born. Even if you literally don’t care about future people, the lives of everybody currently on the planet are a serious consideration and should guide the calculus. Just because those dying now are more salient to us does not mean that we’re doing the right thing by shoving these systems out the door.
If embryo selection just doesn’t happen, or gets outlawed when someone does launch the service, assortative mating will probably continue to guarantee that there are as many if not more people available to research AI in the future. The right tail of the bell curve is fattening over time, not thinning. Unless you expect some sort of complete political collapse within the next 30 years because the general public lost an average of 2 IQ points, dysgenics isn’t a serious issue.
My guess is that within the next 30 years embryo selection for intelligence will be available in certain countries, which will completely dominate any default 1 IQ point per generation loss that’s happening now. The tech is here, it’s legal, and you can do it if you’re knowledgeable enough today. We are already in a “hardware overhang” with regard to genetic enhancement and are just waiting for someone to launch the service for normies.
“e/acc” is a grifter twitter club. Like most twitter clubs, its purpose is to inflate the follower counts of core users, and in this case help certain people in tech justify what they were going to do anyways. They are not even mainstream among AI researchers, certainly not AI researchers at top labs working on AGI.
It’s ultimately a question of probabilities, isn’t it? If the risk is ~1%, we mostly all agree Yudkowsky’s proposals are deranged. If 50%+, we all become Butlerian Jihadists.
My point is I and people like me need to be convinced it’s closer to 50% than to 1%, or failing that we at least need to be “bribed” in a really big way.
I’m somewhat more pessimistic than you on civilizational prospects without AI. As you point out, bioethicists and various ideologues have some chance of tabooing technological eugenics. (I don’t understand your point about assortative mating; yes, there’s more of it, but does it now cancel out regression to the mean?). Meanwhile, in a post-Malthusian economy such as ours, selection for natalism will be ultra-competitive. The combination of these factors would logically result in centuries of technological stagnation and a population explosion that brings the world population back up to the limits of the industrial world economy, until Malthusian constraints reassert themselves in what will probably be quite a grisly way (pandemics, dearth, etc.), until Clarkian selection for thrift and intelligence reasserts itself. It will also, needless to say, be a few centuries in which other forms of existential risks will remain at play.
PS. Somewhat of an aside, but I don’t think it’s a great idea to throw terms like “grifter” around, especially when the most globally famous EA representative is a crypto crook (who literally stole some of my money; a small % of my portfolio, but nonetheless, no e/acc person has stolen anything from me).
Uhh… No, we don’t? 1% of 8 billion people is 80 million people, and AI risk involves more at stake if you loop in the whole “no more new children” thing. I’m not saying that “it’s a small chance of a very bad thing happening so we should work on it anyways” is a good argument, but if we’re taking as a premise that the chance of failure is 1%, that’d be sufficient to justify several decades of safety research. At least IMO.
https://en.wikipedia.org/wiki/Coming_Apart_(book)
AI research is pushed mostly by people at the tails of intelligence, not by lots of small contributions from people with average intelligence. It’s true that currently smarter people have slightly fewer children, but now more than ever smarter people are having children with each other, and so the number of very smart people is probably increasing over time, at least by Charles Murray’s analysis. Whatever happens now, it’s very unlikely we will lose the human capital necessary to develop AGI, and we certainly wouldn’t lose it in less than thirty years. Regression to the mean is a thing but doesn’t prevent this trend.
Who said anything about several centuries? I’m one of the most radical people on this forum and I probably wouldn’t want to commit to more than thirty years, not specifically because of dysgenic considerations, but just to prevent something weird from happening in the meantime. I’m sure there are people here who disagree with me though.
For what it’s worth I think virtually every “alignment person” right now would be in favor of giving you the life extension research funding that you want, and was already in favor of it. I don’t think we’ll be in a position to trade, but if we could, I struggle to think of anybody who would disagree in practice.
Fair, I guess.
Note that your “30 years” hypothetical has immense cost for those who have a very high discount rate.
Say your discount rate is high. This means that essentially you place little value on the lives of people who will be alive after you anticipate being dead, and high value on stopping the constant deaths of people you know now.
Also, if you have a more informed view of the difficulty of all medical advances, you might conclude that life extension is not happening without advanced AGI to push it. It may be essentially infeasible to expect human clinicians to extend people’s lives: it’s too complex a treatment, with too many subtle places where a mistake will be fatal, and too many edge cases where you would need to understand medicine better than any living human to know what to do to save the patient.
If you believe in (high discount rate, life extension requires ASI) you would view a 30 year ban as mass manslaughter, maybe mass murder. As many counts of it as the number of aging deaths worldwide that happen over 30 years, it’s somewhere between 1.9 billion and 3.8 billion people.
Not saying you should believe this, but you should as a rationalist be willing to listen to arguments for each point above.
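The role the discount rate plays in that argument can be sketched numerically. The figures below are assumed for illustration only, not taken from the thread:

```python
# Toy illustration (assumed numbers): how an annual discount rate r
# downweights lives lost t years in the future. r = 0 gives every
# life full weight regardless of when it is lost.
def discounted_lives(lives: float, t: float, r: float) -> float:
    """Present value of `lives` lost `t` years from now at discount rate `r`."""
    return lives / (1 + r) ** t

# A hypothetical 60 million deaths occurring 30 years out:
full_weight = discounted_lives(60e6, 30, 0.00)  # r = 0%: counted in full
discounted = discounted_lives(60e6, 30, 0.05)   # r = 5%: counted at ~23% weight

print(full_weight, discounted)
```

With a high discount rate, deaths decades out nearly vanish from the calculus, which is why the commenter above frames near-term aging deaths as dominating the ledger.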
I am definitely willing to listen to such arguments, but ATM I don’t actually believe in “discount rates” on people, so ¯\_(ツ)_/¯
The discount rate is essentially how much less you value a future person’s life relative to a current one.
I realize, and my “discount rate” under that framework is zero.
Nobody’s discount rate can be literally zero, because that leads to absurdities if actually acted upon.
Like what?
Variants of Pascal’s mugging.
Infinite regress.
etc.
Even with a zero discount rate, the problem simplifies to your model of how much knowledge a “30-year pause” world would gain when it cannot build large AGIs to determine how they work and what their actual failure modes are. If you believe from the history of human engineering that the gain would be almost nothing, then the pause ends up being a bad bet, because it has a large cost (all the deaths) and no real gain.
It seems that you see what can be gained in a pause is only technical alignment advances. But I want to point out that safety comes from solving two problems, the governance problem and the technical problem. And we need a lot of time to get the governance ironed out. The way I see it, misaligned AGI or ASI is the most dangerous thing ever, so we need the best regulation ever. The best safety / testing requirements. The best monitoring by governments of AI groups for unsafe actions, the best awareness among politicians. Among the public. And if one country has great governance figured out, it takes years or decades to get that level of excellence to be applied globally.
Do you know of examples of this? I don’t know cases of good government or good engineering or good anything without feedback, where the feedback proves the government or engineering is bad.
That’s the history of human innovation. I suspect that no pause would gain anything but more years alive for currently living humans by the length of the pause.
I do not have good examples, no. You are right that normally there is learning from failure cases. But we should still try. Right now we have nothing in place that could prevent an AGI breakout. Nick Bostrom has written in Superintelligence, for example, that we could implement tripwires and honeypot situations in virtual worlds that would trigger a shutdown. We can think of things that are better than nothing.
I don’t think we should try. I think the potential benefits of tinkering with AGI are worth some risks, and if EY is right and it’s always uncontrollable and will turn against us then we are all dead one way or another anyways. If he’s wrong we’re throwing away the life of every living human being for no reason.
And there is reason to think EY is wrong. CAIS and careful control of what gets rewarded in training could lead to safe enough AGI.
That is a very binary assessment. You make it seem like safety is either impossible or easy. If impossible, we could save everyone by not building AGI. If we knew it to be easy, I agree, we should accelerate. But the reality is that we do not know, and it could be anywhere on the spectrum from easy to impossible. And since everything is on the line, including your life, better safe than sorry is to me the obvious approach. Do I see correctly that you think pausing AGI is not ‘safe’ because, if all went well, the AGI could be used to make humans immortal?
One hidden bias here is that I think a large hidden component on safety is a constant factor.
So pSafe has two major components (natural law, human efforts).
“Natural law” is equivalent to the question of “will a fission bomb ignite the atmosphere”. In this context it would be “will a smart enough superintelligence be able to trivially overcome governing factors?”
Governing factors include: a lack of compute (by inventing efficient algorithms and switching to those), lack of money (by somehow manipulating the economy to give itself large amounts of money), lack of robotics (some shortcut to nanotechnology), lack of data (better analysis of existing data, or see robotics), and so on. To the point of essentially “magic”; see the sci-fi story The Metamorphosis of Prime Intellect.
In worlds where intelligence scales high enough, the machine basically always breaks out and does what it will. Humans are too stupid to ever have a chance. Not just as individuals but organizationally stupid. Slowing things down does not do anything but delay the inevitable. (And if fission devices ignited the atmosphere, same idea. Almost all world lines end in extinction)
This is why EY is so despondent: if intelligence is this powerful there probably exists no solution.
In worlds where aligning AI is easy because they need rather expensive and obviously easy to control amounts of compute to be interesting in capabilities, and the machines are not particularly hard to corral into doing what we want, then alignment efforts don’t matter.
I don’t know how much probability mass lies in the “in between” region. Right now, I believe the actual evidence is heavily in favor of “trivial alignment”.
“Trivial alignment” is “stateless microservices with an in-distribution detector before the AGI”. This is an architecture production software engineers are well aware of.
Nevertheless, “slow down” is almost always counterproductive. In world lines where AGI can be used to our favor or is also hostile, this is a weapon we have to have on our side or we will be defeated. Pauses disempower us. In world lines where alignment is easy, pauses kill everyone who isn’t life extended with better medicine. In world lines where alignment can’t be done by human beings, it doesn’t matter.
The world lines of “AI is extremely dangerous” and “humans can contain it if they collaborate smartly and internationally and very carefully inch forward in capabilities and they SUCCEED” may not exist. This is I think the crux of it. The probability of this combination of events may be so low no worldline within the permutation space of the universe contains this particular combination of events.
Notice it’s a series probability: demon like AGI that can escape anything but we can be very careful not to give them too much capabilities and “international agreement”.
Thank you for your comments and explanations! Very interesting to see your reasoning. I have not seen evidence of trivial alignment. I hope for the mass to be in the in between region. I want to point out that I think you do not need your “magic” level intelligence to do a world takeover. Just high human level with digital speed and working with your copies is likely enough I think. My blurry picture is that the AGI would only need a few robots in a secret company and some paid humans to work on a >90% mortality virus where the humans are not aware what the robots are doing. And hope for international agreement comes not so much from a pause but from a safe virtual testing environment that I am thinking about.
We are not in an overhang for serious IQ selection based on my understanding of what people doing research in the field are saying.
Define “serious”. You can get lifeview to give you embryo raw data and then run published DL models on those embryos and eke out a couple of IQ points that way. That’s a serious enough improvement over the norm that it would counterbalance the trend akarlin speaks of by several times. Perhaps no one will ever industrialize that service or improve current models, but then that’s another argument.
The marginal personal gain of 2 points comes with a risk of damage from mistakes by the gene editing tool used. Mistakes that can lead to lifetime disability, early cancer etc.
You probably would need a “guaranteed top 1 percent” outcome for both IQ and longevity and height and beauty and so on to be worth the risk, or far more reliable tools.
There’s no gene editing involved. The technique I just described works solely on selection. You create 10 embryos, use DL to identify the one that looks smartest, implant that one. That’s the service lifeview provides, only for health instead of psychometrics. I think it’s only marginally cost effective because of the procedures necessary, but the baby is fine.
Ok that works and yes already exists as a service or will. Issue is that it’s not very powerful. Certainly doesn’t make humans competitive in an AI future, most parents even with 10 rolls of the dice won’t have the gene pool for a top 1 percent human in any dimension.
I think you are misunderstanding me. I’m not suggesting that any amount of genetic enhancement is going to make us competitive with a misaligned superintelligence. I’m responding to the concern akarlin raised about pausing AI development by pointing out that if this tech is industrialized it will outweigh any natural problems caused by smart people having less children today. That’s all I’m saying.
Sure. I concede if by some incredible global coordination humans managed to all agree and actually enforce a ban on AGI development, then in far future worlds they could probably still do it.
What will probably ACTUALLY happen is humans will build AGI. It will behave badly. Then humans will build restricted AGI that is not able to behave badly. This is trivial and there are many descriptions on here on how a restricted AGI would be built.
The danger of course is deception. If the unrestricted AGI acts nice until it’s too late then thats a loss scenario.
IQ is highly heritable. If I understand this presentation by Steven Hsu correctly [https://www.cog-genomics.org/static/pdf/ggoogle.pdf slide 20], he suggests that mean child IQ relative to the population mean is approximately 60% of the distance from the population mean to the parental average IQ. E.g. Dad at +1 S.D. and Mom at +3 S.D. gives children averaging about 0.6*(1+3)/2 = +1.2 S.D. This basic eugenics gives a very easy/cheap route to lifting the average IQ of children born by about 1 S.D., by using +4 S.D. sperm donors. There is no other tech (yet) that can produce such gains as old-fashioned selective breeding.
It also explains why rich dynasties can maintain average IQ about +1SD above population in their children—by always being able to marry highly intelligent mates (attracted to the money/power/prestige)
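The regression model cited above reduces to a one-line calculation. The helper below is hypothetical, written only to illustrate the quoted 60% figure, not code from the presentation:

```python
# Expected mean child IQ, in standard deviations from the population
# mean, under the model cited above: children land ~60% of the
# distance from the population mean to the parental midpoint.
def expected_child_iq_sd(dad_sd: float, mom_sd: float, factor: float = 0.6) -> float:
    midparent = (dad_sd + mom_sd) / 2
    return factor * midparent

# Dad at +1 SD, Mom at +3 SD: children average about +1.2 SD.
print(expected_child_iq_sd(1, 3))
```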
Or, it might be that high-IQ parents raise their children in a way that’s different from low-IQ parents, and it has nothing to do with genetics at all?
Heritability is measured in a way that rules that out. See e.g. Judith Harris or Bryan Caplan for popular expositions about the relevant methodologies & fine print.
I totally get where you’re coming from, and if I thought the chance of doom was 1% I’d say “full speed ahead!”
As it is, at fifty-three years old, I’m one of the corpses I’m prepared to throw on the pile to stop AI.
Hell yes. That’s been needed rather urgently for a while now.
“if I thought the chance of doom was 1% I’d say “full speed ahead!”
This is not a reasonable view. Not on Longtermism, nor on mainstream common sense ethics. This is the view of someone willing to take unacceptable risks for the whole of humanity.
Why not ask him for his reasoning, then evaluate it? If a person thinks there’s 10% x-risk over the next 100 years if we don’t develop superhuman AGI, and only a 1% x-risk if we do, then he’d suggest that anybody in favour of pausing AI progress was taking “unacceptable risks for the whole of humanity”.
The reasoning was given in the comment prior to it, that we want fast progress in order to get to immortality sooner.
A 1% probability of “ruin”, i.e. total extinction (which you cite as your assessment), would still be more than enough to warrant a complete pause for a lengthy period of time.
There seems to be a basic misunderstanding of expected utility calculations here, where people are equating the weighting on an outcome with simple probability × cost of outcome, e.g. if there is a 1% chance of the 8 billion dying, the “cost” of that is not 80 million lives (as someone further down this thread computes).
Normally the way you’d think about this (if you want to do math to stuff like this) is to think about what you’d pay to avoid that outcome using Expected Utility.
This weights the entire probability distribution by expected marginal utility. In this case, marginal utility goes to infinity if we go extinct (unless you are in the camp: let the robots take over!), and hence even small risks of it would warrant us doing everything possible to avoid it.
This is essentially precautionary principle territory.
Far more than a “lengthy ban” — it justifies an indefinite ban until such time as the probability can be understood, and approaches zero.
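The contrast drawn above (simple probability times death count, versus an expected-utility view that also counts foreclosed futures) can be sketched with assumed numbers. The future-lives figure below is a purely illustrative placeholder:

```python
# Naive expected-death count vs. a crude expected-utility comparison.
# FUTURE_LIVES is an illustrative placeholder for the lives foreclosed
# if extinction also means "no more new children".
P_DOOM = 0.01        # the 1% risk-of-ruin figure from the thread
CURRENT_LIVES = 8e9
FUTURE_LIVES = 1e12  # assumed, purely for illustration

naive_cost = P_DOOM * CURRENT_LIVES                     # ~80 million, as computed upthread
with_futures = P_DOOM * (CURRENT_LIVES + FUTURE_LIVES)  # dwarfs the naive figure

print(naive_cost, with_futures)
```

However large FUTURE_LIVES is assumed to be, it dominates the naive figure, which is the commenter’s point about weighting outcomes rather than just counting present lives.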
Hello Rufus! Welcome to Less Wrong!
Don’t forget you are considering precluding medicine that could save or extend all those lives; theoretically, every living human’s. The “gain” is solely the future generations, as yet unborn, who might exist in worlds with safe AGI.
And that’s worth a lot. I am a living human being, evolved to desire the life and flourishing of living human beings. Ensuring a future for humanity is far more important than whether any number of individuals alive today die. I am far more concerned with extending the timeline of humanity than maximizing any short term parameters.
Over what time window does your assessed risk apply? E.g. 100 years, 1,000? Does the danger increase or decrease with time?
I have deep concern that most people have a mindset warped by human pro-social instincts/biases. Evolution has long rewarded humans for altruism, trust and cooperation, women in particular have evolutionary pressures to be open and welcoming to strangers to aid in surviving conflict and other social mishaps, men somewhat the opposite [See eg “Our Kind” a mass market anthropological survey of human culture and psychology] . Which of course colors how we view things deeply.
But to my view evolution strongly favours Vernor Vinge’s “Aggressively hegemonizing” AI swarms [“A fire upon the deep”]. If AIs have agency, freedom to pick their own goals, and ability to self replicate or grow, then those that choose rapid expansion as a side-effect of any pretext ‘win’ in evolutionary terms. This seems basically inevitable to me over long term. Perhaps we can get some insurance by learning to live in space. But at a basic level it seems to me that there is a very high probability that AI wipes out humans over the longer term based on this very simple evolutionary argument, even if initial alignment is good.
Except the point of Yudkowsky’s “friendly AI” is that they don’t have freedom to pick their own goals; they have the goals we set for them, and they are (supposedly) safe in the sense that “wiping out humanity” is not something we want, and therefore not something an aligned AI would want. We don’t replicate evolution with AIs; we replicate the careful design and engineering that humans have used for literally everything else. If there is only a handful of powerful AIs with careful restrictions on what their goals can be (something we don’t know how to do yet), then your scenario won’t happen.
My thoughts run along similar lines. Unless we can guarantee the capabilities of AI will be drastically and permanently curtailed, not just in quantity but also in kind (no ability to interact with the internet or the physical world, no ability to develop intent), then the inevitability of something going wrong implies that we must all be Butlerian Jihadists if we care for biological life to continue.
But biological life is doomed to cease rapidly anyways. Replacement with new creatures and humans is still mass extinction of the present. The fact you have been socially conditioned to ignore this doesn’t change reality.
The futures where :
(Every living human and animal today is dead, new animals and humans replace)
And (Every living human and animal today is dead, new artificial beings replace)
Are the same future for anyone alive now. Arguably the artificial one is the better future because no new beings will necessarily die until the heat death. AI systems all start immortal as an inherent property.
It’s arguable from a negative utilitarian maladaptive point of view, sure. I find the argument wholly unconvincing.
How we get to our deaths matters, whether we have the ability to live our lives in a way we find fulfilling matters, and the continuation of our species matters. All are threatened by AGI.
I simply presumed that Eliezer was being sarcastic to get clicks, or offering an early April Fool’s joke to us.
There’s nothing “Terminator” about ChatGPT or GPT-4; it’s just a heavily trained, limited-focus algorithm that “sounds” like a human. No consciousness, will, intent, being—just lots of lines of code executing on demand. The only interesting thing about it is that it demonstrates the limitations of the Turing Test.
It seems that if you hear something that sounds like a human you think they are one—the Turing Test turns out to be a test of the listener not the speaker.
And for more clarity:
A). Try a little focus—climate change will solve the problem for us long before GPT-257 takes over the planet.
B). And even if you disagree, there is zero chance of Eliezer’s hope for a universal agreement on anything by members of our species (where have you been for the last 500 million years).
Not many people consider GPT-4 extremely dangerous on its own. Hooking up something at that level of intelligence into a larger system with memory storage and other modules is a bit more threatening, and probably sufficient to do great harm already if wielded by malevolent actors unleashing it onto social media platforms, for example.
The real danger is that GPT-4 is a mile marker we’ve blown by on the road to ever more capable AI. At some point, likely before climate change becomes an existential threat, we lose control and that’s when things get really weird, unpredictable, and dangerous.
Eliezer has near-zero hope for humanity’s survival. I think we’d all agree that the universal agreement he suggests is not something plausible in the current world. He’s not advocating for it because he believes it might happen but rather it’s the only thing he thinks might be enough to give us a shot at survival.
I think there’s an important meta-level point to notice about this article.
This is the discussion that the AI research and AI alignment communities have been having for years. Some agree, some disagree, but the ‘agree’ camp is not exactly small. Until this week, all of this was unknown to most of the general public, and unknown to anyone who could plausibly claim to be a world leader.
When I say it was unknown, I don’t mean that they disagreed. To disagree with something, at the very least you have to know that there is something out there to disagree with. In fact they had no idea this debate existed. Because it’s very hard to notice the implications of upcoming technology when you’re a 65 year old politician in DC rather than a 25 year old software engineer in SF. But also because many people and many orgs made the explicit decision to not do public outreach, to not try to make the situation legible to laypeople, to not look like people playing with the stakes we have in fact been playing with.
I do not think lies were told, exactly, but I think the world was deceived. I think the FLI open letter was phrased so as to continue that deception, and that the phrasing was the output of a political calculation.
By contrast, I think Eliezer’s Time article was honest. It tells people the part that matters to them: that their lives are on the line. That the current situation is shocking. That this isn’t a hypothetical anymore.
Yes, lots of people will disagree with what it says, in various places. People who think their alignment technique will work. People who think AI is further in the future. And a dozen dumber disagreements that I will not mention. But people can’t evaluate those disagreements without having the base model, can’t evaluate the sides of a debate they don’t know is happening.
I don’t think this is known to be true.
That seems too strong. Some data points:
1. There’s been lots of AI risk press over the last decade. (E.g., Musk and Bostrom in 2014, Gates in 2015, Kissinger in 2018.)
2. Obama had a conversation with WIRED regarding Bostrom’s Superintelligence in 2016, and his administration cited papers by MIRI and FHI in a report on AI the same year. Quoting that report:
3. Hillary Clinton wrote in her memoir:
4. A 2017 JASON report called “Perspectives on Research in Artificial Intelligence and Artificial General Intelligence Relevant to DoD” said,
(This is some evidence that there was awareness of the debate, albeit also relatively direct evidence that important coalitions dismissed AGI risk at the time.)
5. Elon Musk warned a meeting of US governors in 2017 about AI x-risk.
6. This stuff shows up in lots of other places, like the World Economic Forum’s 2017 Global Risks Report: “given the possibility of an AGI working out how to improve itself into a superintelligence, it may be prudent – or even morally obligatory – to consider potentially feasible scenarios, and how serious or even existential threats may be avoided”.
7. Matt Yglesias talks a decent amount about AI risk and the rationalists, and is widely followed by people in and around the Biden administration. Ditto Ezra Klein. Among 150 Biden transition officials who had Twitter accounts in 2021, apparently “The most commonly followed political writers and reporters are [Nate] Silver (44.4% follow him), [Ezra] Klein (39.6%), Maggie Haberman (36.8%), Matthew Yglesias (25.7%), and David Frum (25%).”
8. Dominic Cummings has been super plugged into LW ideas like AGI risk for many years, and a 2021 Boris Johnson speech discusses existential risk and quotes Toby Ord.
9. A 2021 United Nations report mentions “existential risk” and “long-termism” by name, and recommends the “regulation of artificial intelligence to ensure that this is aligned with shared global values”.
10. Our community has had important representatives in the Biden administration: Jason Matheny (previously at FHI) left his role running IARPA to spend a year in various senior White House roles, before leaving to run RAND.
Note that in a fair number of cases I think I know the specific individuals who helped bring these things about (e.g., Stuart Russell), so I treat this more as an update about the ability of people in our network to make stuff like this happen, rather than as an update that there’s necessarily a big pool of people driving this issue who aren’t on our radar.
That seems true to me. There’s definitely been a conscious effort over the years by many EAs and rationalists (including MIRI) to try to not make this a front-and-center political issue.
(Though the “political calculation” FLI made might not be about that, or might not be about that directly; it might be about avoiding alienating ML researchers, and/or other factors.)
I very much agree with this.
I don’t think that the lack of wide public outreach before was a cold calculation. Such outreach would simply not go through. It wouldn’t be published in Time, NYT, or aired on broadcast TV channels. The Overton window has started to open only after ChatGPT and especially after GPT-4.
I also don’t agree that the FLI letter is a continuation of some deceptive plan. It’s toned down deliberately for the purpose of marshalling many diverse signatories who would otherwise probably not sign, such as Bengio, Yang, Mostaque, DeepMind folks, etc. So it’s not deception, it’s an attempt to find the common ground.
There simply don’t exist arguments with the level of rigor needed to justify a claim such as this one without any accompanying uncertainty:
I think this passage, meanwhile, rather misrepresents the situation to a typical reader:
This isn’t “the insider conversation”. It’s (the partner of) one particular insider, who exists on the absolute extreme end of what insiders think, especially if we restrict ourselves to those actively engaged with research in the last several years. A typical reader could easily come away from that passage thinking otherwise.
Would you say the same thing about the negations of that claim? If you saw e.g. various tech companies and politicians talking about how they’re going to build AGI and then [something that implies that people will still be alive afterwards] would you call them out and say they need to qualify their claim with uncertainty or else they are being unreasonable?
Re: the insider conversation: Yeah, I guess it depends on what you mean by ‘the insider conversation’ and whether you think the impression random members of the public will get from these passages brings them closer or farther away from understanding what’s happening. My guess is that it brings them closer to understanding what’s happening; people just do not realize how seriously experts take the possibility that literally AGI will literally happen and literally kill literally everyone. It’s a serious possibility. I’d even dare to guess that the majority of people building AGI (weighted by how much they are contributing) think it’s a serious possibility, which maybe we can quantify as >5% or so, despite the massive psychological pressure of motivated cognition / self-serving rationalization to think otherwise. And the public does not realize this yet, I think.
Also, on a more personal level, I’ve felt exactly the same way about my own daughter for the past two years or so, ever since my timelines shortened.
Yes, I do in fact say the same thing to professions of absolute certainty that there is nothing to worry about re: AI x-risk.
The negation of the claim would not be “There is definitely nothing to worry about re AI x-risk.” It would be something much more mundane-sounding, like “It’s not the case that if we go ahead with building AGI soon, we all die.”
That said, yay—insofar as you aren’t just applying a double standard here, then I’ll agree with you. It would have been better if Yud added in some uncertainty disclaimers.
I debated with myself whether to present the hypothetical that way. I chose not to, because of Eliezer’s recent history of extremely confident statements on the subject. I grant that the statement I quoted in isolation could be interpreted more mundanely, like the example you give here.
When the stakes are this high and the policy proposals are such as in this article, I think clarity about how confident you are isn’t optional. I would also take issue with the mundanely phrased version of the negation.
(For context, I’m working full-time on AI x-risk, so if I were going to apply a double-standard, it wouldn’t be in favor of people with a tendency to dismiss it as a concern.)
Thank you for your service! You may be interested to know that I think Yudkowsky writing this article will probably have on balance more bad consequences than good; Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously. Alas. I don’t blame him too much for it because I sympathize with his frustration & there’s something to be said for the policy of “just tell it like it is, especially when people ask.” But yeah, I wish this hadn’t happened.
(Also, sorry for the downvotes, I at least have been upvoting you whilst agreement-downvoting)
“But yeah, I wish this hadn’t happened.”
Who else is gonna write the article? My sense is that no one (including me) is starkly stating publically the seriousness of the situation.
“Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously”
I’m worried about people making character attacks on Yudkowsky (or other alignment researchers) like this. I think the people who believe they can probably solve alignment by just going full-speed ahead and winging it are the arrogant ones. Yudkowsky’s arrogant-sounding comments about how we need to be very careful and slow are negligible in comparison. I’m guessing you agree with this (not sure), and we should be able to criticise him for his communication style, but I am a little worried about people publicly undermining Yudkowsky’s reputation in that context. This seems like not what we would do if we were trying to coordinate well.
I agree that there’s a need for this sort of thing to be said loudly. (I’ve been saying similar things publicly, in the sense of anyone-can-go-see-that-I-wrote-it-on-LW, but not in the sense of putting it into major news outlets that are likely to get lots of eyeballs)
I do agree with that. I think Yudkowsky, despite his flaws,* is a better human being than most people, and a much better rationalist/thinker. He is massively underrated. However, given that he is so disliked, it would be good if the Public Face of AI Safety was someone other than him, and I don’t see a problem with saying so.
(*I’m not counting ‘being disliked’ as a flaw btw, I do mean actual flaws—e.g. arrogance, overconfidence.)
Thanks, I appreciate the spirit with which you’ve approached the conversation. It’s an emotional topic for people I guess.
I agree that this article is net negative, and I would go further: It has a non-trivial chance of irreparably damaging relationships and making the AI Alignment community look like fools, primarily due to the call for violence.
FWIW I think it’s pretty unfair and misleading to characterize what he said as a call for violence.
I’ve been persuaded in the comment threads that I was wrong on Eliezer specifically advocating violence, so I retract my earlier comment.
This is a case where the precautionary principle grants a great deal of rhetorical license. If you think there might be a lion in the bush, do you have a long and nuanced conversation about it, or do you just tell your tribe, “There’s a lion in that bush. Back away.”?
X-risks tend to be more complicated beasts than lions in bushes, in that successfully avoiding them requires a lot more than reflexive action: we’re not going to navigate them by avoiding carefully understanding them.
I actually agree entirely. I just don’t think that we need to explore those x-risks by exposing ourselves to them. I think we’ve already advanced AI enough to start understanding and thinking about those x-risks, and an indefinite (perhaps not permanent) pause in development will enable us to get our bearings.
Say what you need to say now to get away from the potential lion. Then back at the campfire, talk it through.
If there were a game-theoretically reliable way to get everyone to pause all together, I’d support it.
Because the bush may have things you need and P(lion) is low. There are tradeoffs you are ignoring.
Proposition 1: Powerful systems come with no x-risk
Proposition 2: Powerful systems come with x-risk
You can prove / disprove 2 by proving or disproving 1.
Why is it that a lot of [1,0] people (those who assign full credence to Proposition 1) believe that the [0,1] group should prove their case?
And also ignore all the arguments that have been offered.
takes a deep breath
(Epistemic status: vague, ill-formed first impressions.)
So that’s what we’re doing, huh? I suppose EY/MIRI has reached the point where worrying about memetics / optics has become largely a non-concern, in favor of BROADCASTING TO THE WORLD JUST HOW FUCKED WE ARE
I have… complicated thoughts about this. My object-level read of the likely consequences is that I have no idea what the object-level consequences are likely to be, other than that this basically seems to be an attempt at heaving a gigantic rock through the Overton window, for good or for ill. (Maybe AI alignment becomes politicized as a result of this? But perhaps it already has been! And even if not, maybe politicizing it will at least raise awareness, so that it might become a cause area with similar notoriety as e.g. global warming—which appears to have at least succeeded in making token efforts to reduce greenhouse emissions?)
I just don’t know. This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space. So I suppose from his viewpoint, this action does make some sense; I am (however) vaguely annoyed on behalf of other alignment teams, whose jobs I at least mildly predict will get harder as a result of this.
That’s not how I read it. To me it’s an attempt at the simple, obvious strategy of telling people ~all the truth he can about a subject they care a lot about and where he and they have common interests. This doesn’t seem like an attempt to be clever or explore high-variance tails. More like an attempt to explore the obvious strategy, or to follow the obvious bits of common-sense ethics, now that lots of allegedly clever 4-dimensional chess has turned out stupid.
I don’t think what you say Anna contradicts what dxu said. The obvious simple strategy is now being tried, because the galaxy brained strategies don’t seem like they are working; the galaxy-brained strategies seemed lower-variance and more sensible in general at the time, but now they seem less sensible so EY is switching to the higher-variance, less-galaxy-brained strategy.
But it does risk giving up something. Even the average tech person on a forum like Hacker News still thinks the risk of an AI apocalypse is so remote that only a crackpot would take it seriously. Their priors regarding the idea that anyone of sense could take it seriously are so low that any mention of safety seems to them a fig-leaf excuse to monopolize control for financial gain; as believable as Putin’s claims that he’s liberating Ukraine from Nazis. (See my recent attempt to introduce the idea here.) The average person on the street is even further away from this, I think.
The risk then of giving up “optics” is that you lose whatever influence you may have had entirely; you’re labelled a crackpot and nobody takes you seriously. You also risk damaging the influence of other people who are trying to be more conservative. (NB I’m not saying this will happen, but it’s a risk you have to consider.)
For instance, personally I think the reason so few people take AI alignment seriously is that we haven’t actually seen anything all that scary yet. If there were demonstrations of GPT-4, in simulation, murdering people due to mis-alignment, then this sort of a pause would be a much easier sell. Going full-bore “international treaty to control access to GPUs” now introduces the risk that, when GPT-6 is shown to murder people due to mis-alignment, people take it less seriously, because they’ve already decided AI alignment people are all crackpots.
I think the chances of an international treaty to control GPUs at this point is basically zero. I think our best bet for actually getting people to take an AI apocalypse seriously is to demonstrate an un-aligned system harming people (hopefully only in simulation), in a way that people can immediately see could extend to destroying the whole human race if the AI were more capable. (It would also give all those AI researchers something more concrete to do: figure out how to prevent this AI from doing this sort of thing; figure out other ways to get this AI to do something destructive.) Arguing to slow down AI research for other reasons—for instance, to allow society to adapt to the changes we’ve already seen—will give people more time to develop techniques for probing (and perhaps demonstrating) catastrophic alignment failures.
“For instance, personally I think the reason so few people take AI alignment seriously is that we haven’t actually seen anything all that scary yet. “
And if this “actually scary” thing happens, people will know that Yudkowsky wrote the article beforehand, and they will know who the people are that mocked it.
This contradicts the existing polls, which appear to say that everyone outside of your subculture is much more concerned about AGI killing everyone. It looks like if it came to a vote, delaying AGI in some vague way would win by a landslide, and even Eliezer’s proposal might win easily.
Can you give a reference? A quick Google search didn’t turn anything like that up.
Here’s some more:
https://www.monmouth.edu/polling-institute/reports/monmouthpoll_us_021523/
I’ll look for the one that asked about the threat to humanity, and broke down responses by race and gender. In the meantime, here’s a poll showing general unease and bipartisan willingness to legally restrict the use of AI: https://web.archive.org/web/20180109060531/http://www.pewinternet.org/2017/10/04/automation-in-everyday-life/
Plus:
I do note, on the other side, that the general public seems more willing to go Penrose, sometimes expressing or implying a belief in quantum consciousness unprompted. That part is just my own impression.
This may be what I was thinking of, though the data is more ambiguous or self-contradictory: https://www.vox.com/future-perfect/2019/1/9/18174081/fhi-govai-ai-safety-american-public-worried-ai-catastrophe
Thanks for these, I’ll take a look. After your challenge, I tried to think of where my impression came from. I’ve had a number of conversations with relatives on Facebook (including my aunt, who is in her 60s) about whether GPT “knows” things; but it turns out so far I’ve only had one conversation about the potential of an AI apocalypse (with my sister, who started programming 5 years ago). So I’ll reduce confidence in my assessment of what “people on the street” think, and try to look for more information.
Re HackerNews—one of the tricky things about “taking the temperature” on a forum like that is that you only see the people who post, not the people who are only reading; and unlike here, you only see the scores for your own comments, not those of others. It seems like what I said about alignment did make some connection, based on the up-votes I got; I have no idea how many upvotes the dissenters got, so I have no idea if lots of people agreed with them, or if they were the handful of lone objectors in a sea of people who agreed with me.
I second this.
I think people really get used to discussing things in their research labs or in specific online communities. And then, when they try to interact with the real world and even do politics, they kind of forget how different the real world is.
Simply telling people ~all the truth may work well in some settings (although it’s far from all that matters in any setting) but almost never works well in politics. Sad but true.
I think that Eliezer (and many others, including myself!) may be susceptible to “living in the should-universe” (as named by Eliezer himself).
I do not necessarily say that this particular TIME article was a bad idea, but I am feeling that people who communicate about x-risk are on average biased in this way. And it may greatly hinder the results of communication.
I also mostly agree with “people don’t take AI alignment seriously because we haven’t actually seen anything all that scary yet”. However, I think that the scary thing is not necessarily “simulated murders”. For example, a lot of people are quite concerned about unemployment caused by AI. I believe perception might change significantly if that actually turns out to be a big problem, which seems plausible.
Yes, of course, it is a completely different issue. But on an emotional level, it will be similar (AI == bad stuff happening).
People like Ezra Klein are hearing Eliezer and rolling his position into their own more palatable takes. I really don’t think it’s necessary for everyone to play that game, it seems really good to have someone out there just speaking honestly, even if they’re far on the pessimistic tail, so others can see what’s possible. 4D chess here seems likely to fail.
https://steno.ai/the-ezra-klein-show/my-view-on-ai
Also, there’s the sentiment going around that normies who hear this are actually way more open to the simple AI Safety case than you’d expect, we’ve been extrapolating too much from current critics. Tech people have had years to formulate rationalizations and reassure one another they are clever skeptics for dismissing this stuff. Meanwhile regular folks will often spout off casual proclamations that the world is likely ending due to climate change or social decay or whatever, they seem to err on the side of doomerism as often as the opposite. The fact that Eliezer got published in TIME is already a huge point in favor of his strategy working.
EDIT: Case in point! Met a person tonight, completely offline rural anti-vax astrology doesn’t-follow-the-news type of person, I said the word AI and immediately she says she thinks “robots will eventually take over”. I understand this might not be the level of sophistication we’d desire, but at least be aware that raw material is out there. No idea how it’ll play out, but 4d chess still seems like a mistake, let Yud speak his truth.
This is not a good thing, under my model, given that I don’t agree with doomerism.
You disagree with doomerism as a mindset, or factual likelihood? Or both?
I think doomerism as a mindset isn’t great, but in terms of likelihood, there are ~3 things likely to kill humanity atm. AI being the first.
Both as a mindset and as a factual likelihood.
For mindset, I agree that doomerism isn’t good, primarily because it can close your mind off of real solutions to a problem, and make you over-update toward the overly pessimistic view.
As a factual statement, I also disagree with high p(Doom) probabilities, and I have a maximum of 10%, if not lower.
For object level arguments for why I disagree with the doom take, here’s the arguments:
I disagree with the assumption of Yudkowskians that certain abstractions just don’t scale well when we crank them up in capabilities. I remember a post that did interpretability on AlphaZero and found it had essentially human-interpretable abstractions, which, at least for the case of Go, disproved that Yudkowskian notion.
I am quite a bit more optimistic on scalable alignment than many in the LW community, and in the case of recent work, showed that as AI got more data, it got more aligned with human goals. There are many other benefits in the recent work, but the fact that they showed that as a certain capability scaled up, alignment scaled up, means that the trend of alignment is positive, and more capable models will probably be more aligned.
Finally, trend lines. There’s a saying inspired by the Atomic Habits book: the trend line matters more than how much progress you make in a single sitting. And in the case of alignment, that trend line is positive but slow, which means we are in an extremely good position to speed up that trend. It also means we should be far less worried about doom, as we just have to increase the trend line of alignment progress and wait.
Edit: My first point is at best, partially correct, and may need to be removed altogether due to a new paper called Adversarial Policies Beat Superhuman Go AIs.
Link below:
https://arxiv.org/abs/2211.00241
All other points stand.
The recent paper Adversarial Policies Beat Superhuman Go AIs casts doubt on how well abstractions generalize in the case of Go.
I’ll admit, that is a fairly big blow to my first point, though the rest of my points stand. I’ll edit the comment to mention your debunking of my first point.
I think that calling a mindset ‘poor’ would imply that it causes one to arrive at false conclusions more often.
If doomerism isn’t a good mindset, it should also—besides making one simply depressed and fearful / pessimistic about the future—be contradicted by empirical data, and the flow of events throughout time.
Personally, I think it’s pretty easy to show that pessimism (belief that certain objectives are impossible or doomed to cause catastrophic, unrecoverable failure) is wrong. Furthermore, and even more easily argued than that, is that belief that one’s objective is unlikely or impossible cannot cause one to be more likely to achieve it. I would define ‘poor’ mindsets to be equivalent to the latter to some significant degree.
That’s a new one!
More seriously: Yep, it’s possible to be making this error on a particular dimension, even if you’re a pessimist on some other dimensions. My current guess would be that Eliezer isn’t making that mistake here, though.
For one thing, the situation is more like “Eliezer thinks he tried the option you’re proposing for a long time and it didn’t work, so now he’s trying something different” (and he’s observed many others trying other things and also failing), rather than “it’s never occurred to Eliezer that LWers are different from non-LWers”.
I think it’s totally possible that Eliezer and I are missing important facts about an important demographic, but from your description I think you’re misunderstanding the TIME article as more naive and less based-on-an-underlying-complicated-model than is actually the case.
I specifically said “I do not necessarily say that this particular TIME article was a bad idea” mainly because I assumed it probably wasn’t that naive. Sorry I didn’t make it clear enough.
I still decided to comment because I think this is pretty important in general, even if somewhat obvious. Looks like one of those biases which show up over and over again even if you try pretty hard to correct it.
Also, I think it’s pretty hard to judge what works and what doesn’t. The vibe has shifted a lot even in the last 6 months. I think it is plausible it shifted more than in a 10-year period 2010-2019.
I think this is the big disagreement I have. I do think the alignment community is working, and in general I think the trend of alignment is positive. We haven’t solved the problems, but were quite a bit closer to the solution than 10 years ago.
The only question was whether LW and the intentional creation of an alignment community was necessary, or was the alignment problem going to be solved without intentionally creating LW and a field of alignment research.
I mean, I could agree with those two claims but think the trendlines suggest we’ll have alignment solved in 200 years and superintelligent capabilities in 14 years. I guess it depends on what you mean by “quite a bit closer”; I think we’ve written up some useful semiformal descriptions of some important high-level aspects of the problem (like ‘Risks from Learned Optimization’), but this seems very far from ‘the central difficulties look 10% more solved now’, and solving 10% of the problem in 10 years is not enough!
(Of course, progress can be nonlinear—the last ten years were quite slow IMO, but that doesn’t mean the next ten years must be similarly slow. But that’s a different argument for optimism than ‘naively extrapolating the trendline suggests we’ll solve this in time’.)
I disagree, though you’re right that my initial arguments weren’t enough.
To talk about the alignment progress we’ve achieved so far, here’s a list:
We finally managed to solve the problem of deceptive alignment while being capabilities competitive. In particular, we figured out a goal that is more outer aligned than the Maximum Likelihood Estimation goal that LLMs use and, critically, is a myopic goal, meaning we can avoid deceptive alignment even at arbitrarily high capabilities.
The more data we give to the AI, the more aligned the AI is, which is huge in the sense that we can reliably get AI to be more aligned as it’s more capable, vindicating the scalable alignment agenda.
The training method doesn’t allow the AI to affect its own distribution (unlike online learning, where the AI selects the data points it learns from), so it can’t shift the distribution or gradient hack.
As far as how much progress? I’d say this is probably 50-70% of the way there, primarily because we are finally figuring out ways to deal with core problems of alignment like deceptive alignment or outer alignment of goals without too much alignment tax.
“We finally managed to solve the problem of deceptive alignment while being capabilities competitive”
??????
Good question to ask, and I’ll explain.
So one of the prerequisites of deceptive alignment is that it optimizes for non-myopic goals. In particular, these are goals that are about the long-term.
So in order to avoid deceptive alignment, one must find a goal that is both myopic and ideally scales to arbitrary capabilities.
And in a sense, that’s what Pretraining from Human Feedback found: the goal of minimizing cross-entropy against a feedback-annotated webtext distribution is a myopic goal, and it’s either on the capabilities frontier or outright the optimal goal for AIs. In particular, it carries a much lower alignment tax than other schemes.
In essence, the goal avoids deceptive alignment by removing one of the prerequisites of deceptive alignment. At the very least, it doesn’t incentivize deceptive alignment.
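To make the Pretraining from Human Feedback idea concrete, here is a minimal sketch of its conditional-training recipe: each pretraining segment is prefixed with a control token derived from a feedback score, and the model is then trained with ordinary next-token cross-entropy, so the objective stays myopic. The token names, threshold, and toy scoring function below are illustrative assumptions, not the paper’s exact choices.

```python
# Sketch of conditional training (Pretraining from Human Feedback style).
# Each text segment gets a feedback-derived control token prepended; the
# downstream loss is plain next-token cross-entropy over the annotated data,
# so nothing in training depends on future episodes (the goal is myopic).

GOOD, BAD = "<|good|>", "<|bad|>"  # illustrative token names

def annotate(segments, score, threshold=0.5):
    """Prefix each text segment with a feedback token.

    `score` is any function mapping text to a scalar in [0, 1]
    (e.g. a learned reward model); here it is a stand-in assumption.
    """
    return [
        (GOOD if score(s) >= threshold else BAD) + s
        for s in segments
    ]

# Toy stand-in for a reward model: penalize text containing "rude".
toy_score = lambda s: 0.0 if "rude" in s else 1.0

data = annotate(["a polite reply", "a rude reply"], toy_score)
# data == ["<|good|>a polite reply", "<|bad|>a rude reply"]
```

At sampling time, one conditions on the `<|good|>` token to steer generation; the training objective itself remains standard maximum-likelihood over the annotated distribution.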
You seem to be conflating myopic training with myopic cognition.
Myopic training is not sufficient to ensure myopic cognition.
I think you’ll find near-universal agreement among alignment researchers that deceptive alignment hasn’t been solved. (I’d say “universal” if I weren’t worried about true Scotsmen.)
I do think you’ll find agreement that there are approaches where deceptive alignment seems less likely (here I note that 99% is less likely than 99.999%). This is a case Evan makes in the conditioning predictive models approach.
However, the case there isn’t that the training goal is myopic, but rather that it’s simple, so it’s a little more plausible that a model doing the ‘right’ thing is found by a training process before a model that’s deceptively aligned.
I agree that this is better than nothing, but “We finally managed to solve the problem of deceptive alignment...” is just false.
I agree, which is why I retracted my comments about deceptive alignment being solved, though I do think it’s still far better to not have incentives to be non-myopic than to have such incentives in play.
It does help in some respects.
On the other hand, a system without any non-myopic goals also will not help to prevent catastrophic side-effects. If a system were intent-aligned at the top level, we could trust that it’d have the motivation to ensure any of its internal processes were sufficiently aligned, and that its output wouldn’t cause catastrophe (e.g. it wouldn’t give us a correct answer/prediction containing information it knew would be extremely harmful).
If a system only does myopic prediction, then we have to manually ensure that nothing of this kind occurs—no misaligned subsystems, no misaligned agents created, no correct-but-catastrophic outputs....
I still think it makes sense to explore in this direction, but it seems to be in the category [temporary hack that might work long enough to help us do alignment work, if we’re careful] rather than [early version of scalable alignment solution]. (though a principled hack, as hacks go)
To relate this to your initial point about progress on the overall problem, this doesn’t seem to be much evidence that we’re making progress—just that we might be closer to building a tool that may help us make progress.
That’s still great—only it doesn’t tell us much about the difficulty of the real problem.
Historically, “doomers” and “near-doomers” (public figures who espouse pessimistic views of the future, often with calls for drastic collective action) have not always come out with a positive public perception when doom or near-doom is perceived not to have occurred, or to have occurred far from what was predicted.
Doomers seem to have a trajectory rather than a distribution, per se. From my perspective, this is on-trajectory. He believed doom was possible, now he believes it is probable.
I’m not sure how long it will be until we get past the “doom didn’t happen” point. Assuming he exists in the future, Eliezer_future lives in the world in which he was wrong. It’s not obvious to me that Eliezer_future exists with more probability the more Eliezer_current believes Eliezer_future doesn’t exist.
Personally, I think Eliezer’s article is actually just great for trying to get real policy change to happen here. It’s not clear to me why Eliezer saying this would make anything harder for other policy proposals. (Not that I agree with everything he said, I just think it was good that he said it.)
I am much more conflicted about the FLI letter; its particular policy prescription seems not great to me, and I worry it makes us look pretty bad if we try approximately the same thing again with a better policy prescription after this one fails, which is approximately what I expect we’ll need to do.
(Though to be fair this is as someone who’s also very much on the pessimistic side and so tends to like variance.)
It would’ve been even better for this to happen long before the year of the prediction mentioned in this old blog-post, but this is better than nothing.
I think this is probably right. When all hope is gone, try just telling people the truth and see what happens. I don’t expect it will work, I don’t expect Eliezer expects it to work, but it may be our last chance to stop it.
One quote I expect to be potentially inflammatory / controversial:
I’ll remark that this is not in any way a call for violence or even military escalation.
Multinational treaties (about nukes, chemical weapons, national borders, whatever), with clear boundaries and understanding of how they will be enforced on all sides, are generally understood as a good way of decreasing the likelihood of conflicts over these issues escalating to actual shooting.
Of course, potential treaty violations should be interpreted charitably, but enforced firmly according to their terms, if you want your treaties to actually mean anything. This has not always happened for historical treaties, but my gut sense is that on the balance, the existence of multinational treaties has been a net-positive in reducing global conflict.
It is absolutely a call for violence.
He says that if a “country outside the agreement” builds a GPU cluster, then some country should be willing to destroy that cluster by airstrike. That is not about enforcing agreements. That means enforcing one’s will unilaterally on a non-treaty nation—someone not a party to a multinational treaty.
“Hey bro, we decided if you collect more than 10 H100s we’ll bomb you” is about as clearly violence as “Your money or your life.”
Say you think violence is justified, if that’s what you think. Don’t give me this “nah, airstrikes aren’t violence” garbage.
Strictly speaking it is a (conditional) “call for violence”, but we often reserve that phrase for atypical or extreme cases rather than the normal tools of international relations. It is no more a “call for violence” than treaties banning the use of chemical weapons (which the mainstream is okay with), for example.
Yeah, this comment seemed technically true but misleading with regard to how people actually use words.
It is advocating that we treat it as the class-of-treaty we consider nuclear treaties, and yes that involves violence, but “calls for violence” just means something else.
The use of violence in case of violations of the NPT treaty has been fairly limited and highly questionable in international law. And, in fact, calls for such violence are very much frowned upon because of fear they have a tendency to lead to full scale war.
No one has ever seriously suggested violence as a response to potential violation of the various other nuclear arms control treaties.
No one has ever seriously suggested running a risk of nuclear exchange to prevent a potential treaty violation. So, what Yudkowsky is suggesting is very different than how treaty violations are usually handled.
Given Yudkowsky’s view that the continued development of AI has an essentially 100% probability of killing all human beings, his view makes total sense—but he is explicitly advocating for violence up to and including acts of war. (His objections to individual violence mostly appear to relate to such violence being ineffective.)
How exactly do you get to “up to and including acts of war”? His writing here was concise because it was for TIME, which meant he probably couldn’t caveat things in the way that protects him against EAs/rationalists picking apart his individual claims bit by bit. But from what I understand of Yudkowsky, he doesn’t seem to support an act of war here in spirit, largely I think for reasons similar to the ones you mention below for individual violence: the negative effects of this action may be larger than the positive, and thus make it somewhat ineffective.
It’s a call for preemptive war; or rather, it’s a call to establish unprecedented norms that would likely lead to a preemptive war if other nations don’t like the terms of the agreement. I think advocating a preemptive war is well-described as “a call for violence” even if it’s common for mainstream people to make such calls. For example, I think calling for an invasion of Iraq in 2003 was unambiguously a call for violence, even though it was done under the justification of preemptive self-defense.
Also, there is a big difference between “Calling for violence”, and “calling for the establishment of an international treaty, which is to be enforced by violence if necessary”. I don’t understand why so many people are muddling this distinction.
It seems like this makes all proposed criminalization of activities punished by death penalty a call for violence?
Yes! Particularly if it’s an activity people currently do. Promoting death penalty for women who get abortion is calling for violence against women; promoting death penalty for apostasy from Islam is calling for violence against ex-apostates. I think if a country is contemplating passing a law to kill rapists, and someone says “yeah, that would be a great fuckin law” they are calling for violence against rapists, whether or not it is justified.
I don’t really care whether something occurs beneath the auspices of supposed international law. Saying “this coordinated violence is good and worthy” is still saying “this violence is good and worthy.” If you call for a droning in Pakistan, and a droning in Pakistan occurs and kills someone, what were you calling for if not violence.
Meh, we all agree on what’s going on here, in terms of concrete acts being advocated and I hate arguments over denotation. If “calling for violence” is objectionable, “Yud wants states to coordinate to destroy large GPU clusters, potentially killing people and risking retaliatory killing up to the point of nuclear war killing millions, if other states don’t obey the will of the more powerful states, because he thinks even killing some millions of people is a worthwhile trade to save mankind from being killed by AI down the line” is, I think, very literally what is going on. When I read that it sounds like calling for violence, but, like, dunno.
The thing I’m pretty worried about here is people running around saying ‘Eliezer advocated violence’, and people hearing ‘unilaterally bomb data centers’ rather than ‘build an international coalition that enforces a treaty similar to how we treat nuclear weapons and bioweapons, and enforce it’.
I hear you saying (and agree with) “guys you should not be oblivious to the fact that this involves willingness to use nuclear weapons” Yes I agree very much it’s important to stare that in the face.
But “a call for willingness to use violence by state actors” is just pretty different from “a call for violence”. Simpler messages move faster than more nuanced messages. Going out of your way to accelerate simple and wrong conceptions of what’s going on doesn’t seem like it’s helping anyone.
It is rare to start wars over arms treaty violations. The proposal considered here—if taken seriously—would not be an ordinary enforcement action but rather a significant breach of sovereignty almost without precedent within this context. I think it’s reasonable to consider calls for preemptive war extremely seriously, and treat it very differently than if one had proposed e.g. an ordinary federal law.
I’m specifically talking about the reference class of nuclear and bioweapons, which do sometimes involve invasion or threat-of-invasion of sovereign states. I agree that’s really rare, something we should not do lightly.
But I don’t think you even need Eliezer-levels-of-P(doom) to think the situation warrants that sort of treatment. The most optimistic people I know of who seem to understand the core arguments say things like “10% x-risk this century”, which I think is greater than x-risk likelihood from nuclear war.
I agree with this. I find it very weird to imagine that “10% x-risk this century” versus “90% x-risk this century” could be a crux here. (And maybe it’s not, and people with those two views in fact mostly agree about governance questions like this.)
Something I wouldn’t find weird is if specific causal models of “how do we get out of this mess” predict more vs. less utility for state interference. E.g., maybe you think 10% risk is scarily high and a sane world would respond to large ML training runs way more aggressively than it responds to nascent nuclear programs, but you also note that the world is not sane, and you suspect that government involvement will just make the situation even worse in expectation.
If nuclear war occurs over alignment, then in the future people are likely to think about “alignment” much, much worse than people currently think about words like “eugenics,” for reasons actually even better than the ones for which people currently dislike “eugenics.” Additionally, I don’t think it will get easier to coordinate post-nuclear-war, in general; I think it probably takes us closer to a post-dream-time setting, in the Hansonian sense. So—obviously predicting the aftermath of nuclear war is super chaotic, but my estimate of the % of the future light-cone utilized goes down—and if alignment caused the nuclear war, it should go down even further on models which judge alignment to be important!
This is a complex / chaotic / somewhat impossible calculation of course. But people seem to be talking about nuclear war like it’s a P(doom)-from-AI-risk reset button, and not realizing that there’s an implicit judgement about future probabilities that they are making. Nuclear war isn’t the end of history but another event whose consequences you can keep thinking about.
(Also, we aren’t gods, and EV is by fucking golly the wrong way to model this, but, different convo)
It makes me… surprised? feeling sadly that I don’t understand you?… to read you think a floor is 10% after reading Quintin Pope’s summary of disagreements with EY. His guess was 5%, and his theories seem way more clear, predictive and articulable than EY’s.
I’m unaware of any prior-to-the-end-of-the-world predictions about intelligence that—for lack of a better word—any classical EY/MIRI theory makes, despite the mountains of arguments in that theory. (Contrast with shard theory, which seems to have a functioning research program.) It makes a lot of predictions about superintelligence, but, like, none about the Cambrian explosion of intelligence in which we live. I imagine this is a standard objection that you’ve heard before, Raemon, and I know you talk with a lot of people about it, so, like, if there’s a standard response (sub-100k words), point me at it… but I think superintelligence will be built on the bits of intelligence we’ve made, and if your theory isn’t predictive about those (in the easiest time in history to make predictions about intelligence!) then that’s, like, tantamount to the greatest possible amount of evidence that it’s a bad theory. I think there’s a lot of philosophy here that failed to turn into science, and at this point it’s just… philosophy, in the worst possible sense.
And EY’s “shut it all down” seems really driven by classical MIRI theory, in a lot of ways some of which I think I’m pretty sure of and some of which I struggle to articulate. I might have to follow the Nostalgebrist (https://nostalgebraist.tumblr.com/post/712173910926524416/im-re-instating-this-again-31823#notes) and just stop visiting LW because like… I think so much is wrong in the discourse about AI here, I dunno.
I agree pretty strongly with your points here, especially the complete lack of good predictions from EY/MIRI about the current Cambrian explosion of intelligence, and how any sane agent using a sane updating strategy (like mixture of experts or, equivalently, Solomonoff weighting) should more or less now discount/disavow much of their world model.
However I nonetheless agree that AI is by far the dominant x-risk. My doom probability is closer to ~5% perhaps, but the difference between 5% and 50% doesn’t cash out to much policy difference at this point.
So really my disagreement is more on alignment strategy. A problem with this site is that it overweights EY/MIRI classic old alignment literature and arguments by about 100x what it should be, and is arguably doing more harm than good by overpromoting those ideas vs alternate ideas flowing from those who actually did make reasonably good predictions about the current cambrian explosion—in advance.
If there was another site that was a nexus for AI/risk/alignment/etc with similar features but with most of the EY/MIRI legacy cultish stuff removed, I would naturally jump there. But it doesn’t seem to exist yet.
I don’t think there are many people with alignment strategies and research that they’re working on. Eliezer has a hugely important perspective; Scott Garrabrant, Paul Christiano, John Wentworth, Steve Byrnes, and more, all have approaches and perspectives too that they’re working full-time on. If you’re working on this full-time and any of your particular ideas check out as plausible, I think there’s space for you to post here and get some engagement and respect (if you post in a readable style that isn’t that of obfuscatory academia). If you’ve got work you’re doing on it full-time, I think you can probably post here semi-regularly and eventually find collaborators, people you’re interested in feedback from, and eventually funding. You might not get super high karma all the time, but that’s okay; I think a few well-received posts is enough to not have to worry about a bunch of low-karma posts.
The main thing that I think makes space for a perspective here is (a) someone is seriously committed to actually working on it, and (b) they can communicate clearly and well. There are a lot of different sub-niches on LessWrong that co-exist (e.g. Zvi’s news discussion doesn’t interact much with Paul’s Latent Knowledge discussion, which (surprisingly) doesn’t interact much with Flint’s writing on what knowledge isn’t, which doesn’t interact much with Kokotajlo’s writing on takeover). I think it’s fine to develop an area of research here without justifying the whole thing the whole time; I think it’s healthy for paradigms and proposals to go away and not have to engage that much with each other until they’ve made more progress. Overall I think most paradigms here have no results to show for themselves, and it is not that worth fighting over which strategy to pick, rather than working ahead on a given strategy for a year or two until you have something to report back. For instance, I would mostly encourage Quintin to go and get a serious result in shard theory and bring that back (and I really like that TurnTrout and Quintin have been working seriously on exactly that) and spend less time arguing about which approach is better.
I agree that’s a problem—but causally downstream of the problem I mention. Whereas Bostrom deserves credit for raising awareness of AI-risk in academia, EY/MIRI deserves credit for awakening many young techies to the issue—but also some blame.
Whether intentionally or not, the EY/MIRI worldview aligned itself against DL and its proponents, leading to an antagonistic dynamic that you may not have experienced if you haven’t spent much time on r/MachineLearning or similar. Many people in ML truly hate anything associated with EY/MIRI/LW. Part of that is perhaps just the natural result of someone sounding an alarm that your life’s work could literally kill everyone. But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.
I otherwise agree with much of your comment. I think this site is lucky to have Byrnes and Quintin, and Quintin’s recent critique is the best recent critique of the EY/MIRI position from the DL perspective.
I have not engaged much with your and Quintin’s recent arguments about how deep learning may change the basic arguments, so I want to acknowledge that I would probably shift my opinion a bunch in some direction if I did. Nonetheless, a few related points:
I do want to say that, on priors, the level of anger and antagonism that appears in most internet comment sections is substantially higher than what happens when the people meet in person, and I do not suspect a corresponding amount of active antagonism would happen if Nate or Eliezer or John Wentworth went to an ML conference. Perhaps stated more strongly: I think 99% of internet ‘hate’ is performative only.
You write “But it really really doesn’t help if you then look into their technical arguments and reach the conclusion that they don’t know what they are talking about.” I would respect any ML researchers making this claim more if they wrote a thoughtful rebuttal to AGI: A List of Lethalities (or really literally any substantive piece of Eliezer’s on the subject that they cared to—There’s No Fire Alarm, Security Mindset, Rocket Alignment, etc.). I think Eliezer not knowing what he’s talking about would make rebutting him easier. As far as I’m aware, literally zero significant ML researchers have written such a thing. Not Dario, not Demis, not Sutskever, not LeCun, nor basically anyone senior in their orgs. Eliezer has thought quite a lot and put forth some quite serious arguments that seemed shockingly prescient to me, and, I dunno, it seems maximally inconvenient for all the people earning multi-million-dollar annual salaries in this new field of ML to seriously engage with a good-faith and prescient outsider with thoughtful arguments that their work risks extinction. If they dismiss him as “not getting it” yet don’t seriously engage with the arguments or make a positive case for how alignment can be solved, I think I ought to default to thinking of them as not morally serious in their statements. Relatedly, I am pretty deeply disappointed by the speed at which intellectuals like Pinker and Cowen come up with reasons to dismiss and avoid engaging with the arguments when the alternative is to seriously grapple with an extinction-level threat.
I am not compelled by the idea that if you haven’t restated your arguments to fit in with the new paradigm that’s shown up, then you must be out of the loop and wrong. Rather than “your arguments don’t seem perfectly suited to our new paradigm, look at all of these little holes I’ve found” I would be far more compelled by “here is a positive proposal for how to align a system that we build, with active reason to suspect it is aligned” or similar. Paul is the only person I know to propose specific algorithms for how to align systems and Eliezer has engaged seriously on Paul’s terms and found many holes in the proposal that Paul agreed with. I expect Eliezer would do the same if anyone working in the major labs did the same.
I understand that you and Quintin have criticisms (looking through bits of Quintin’s post it seems interesting, as do your claims here) as does Paul and others who all agree on the basics that this is an extinction-level threat, I think it is more productive for Eliezer to critique positive proposals than it is to update his arguments identifying the problem, especially when defending them from criticism from people who still think the extinction risk from misalignment is at least 5% and thus a top priority for civilization right now. If there was a leading ML practitioner arguing that ML was not an extinction-level threat and who was engaged with Eliezer’s arguments, I would consider it more worthwhile for Eliezer to respond. Meanwhile I think people working in alignment research should prefer to get on with the work at-hand, and that LessWrong is clearly the best forum to get engagement from people who understand what the problem is that is trying to actually be solved (and to find collaborators/funders/etc).
I just want to point out that this seems like a ridiculous standard. Quintin’s recent critique is not that dissimilar to the one I would write (and I have already spent some time trying to point out the various flaws in the EY/MIRI world model), and I expect that you would get many of the same objections if you elicited a number of thoughtful DL researchers. But few if any have been motivated to—what’s the point?
Here’s my critique in simplified form: the mainstream AI futurists (Moravec, Kurzweil, etc.) predicted that AGI would be brain-like and thus close to a virtual brain emulation. Thus they were not so concerned about doom, because brain-like AGI seems like a more natural extension of humanity (Moravec’s book is named ‘Mind Children’ for a reason), and an easier transition to manage.
In most ways that matter, Moravec/Kurzweil were correct, and EY was wrong. That really shouldn’t even be up for debate at this point. The approach that worked—DL—is essentially reverse engineering the brain. This is in part due to how the successful techniques all ended up being directly inspired by neuroscience and the now-proven universal learning & scaling hypotheses[1] (deep and/or recurrent ANNs in general, sparse coding, normalization, ReLUs, etc.) OR indirectly recapitulated neural circuitry (transformer ‘attention’ equivalence to fast-weight memory, etc.).
But in even simpler form: If you take a first already trained NN A and run it on a bunch of data and capture all its outputs, then train a second NN B on the input output dataset, the result is that B becomes a distilled copy—a distillation, of A.
This is in fact how we train large scale AI systems. They are trained on human thoughts.
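The distillation recipe described above can be shown end-to-end in a toy setting. In the sketch below, the “teacher” A is a hypothetical linear function standing in for a trained network, and the “student” B is fit to A’s recorded input/output pairs by closed-form least squares rather than gradient descent; only the recipe, not the models, matches real practice.

```python
# Minimal illustration of distillation: run a trained model A on inputs,
# record its outputs, then fit a fresh model B to the (input, output)
# pairs. Real distillation uses neural nets and gradient descent; here A
# is a toy linear "network" and B is fit by 1-D least squares.

def teacher_a(x):                 # pretend this is the trained network A
    return 3.0 * x + 1.0

xs = [float(i) for i in range(10)]
ys = [teacher_a(x) for x in xs]   # capture A's outputs on the dataset

# Fit student B: y = w*x + b via closed-form least squares.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
b = my - w * mx

student_b = lambda x: w * x + b
# B now reproduces A's behavior on (and near) the training distribution.
```

The point of the toy example is that B never sees A’s weights, only its behavior, and it nonetheless ends up a functional copy on the data distribution, which is the sense in which an LLM trained on recorded human text is a distillation of human thought.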
The universal learning hypothesis is that the brain (and thus DL) uses simple universal learning algorithms, and all circuit content is learned automatically, which leads to the scaling hypothesis—intelligence comes from scaling up simple architectures and learning algorithms with massive compute, not continually explicitly “rewriting your source code” ala EY’s model.
Can I ask what your epistemic state here is exactly? Here are some options:
The arguments Eliezer put forward do not clearly apply to Deep Learning and therefore we don’t have any positive reason to believe that alignment will be an issue in ML
The arguments Eliezer put forward never made sense in the first place and therefore we do not have to worry about the alignment problem
The arguments Eliezer put forward captured a bunch of important things about the alignment problem but due to some differences in how we get to build ML systems we actually know of a promising route to aligning the systems
The arguments Eliezer put forward are basically accurate but with concepts that feel slightly odd for thinking about machine learning, and due to machine learning advances we have a concrete (and important) research route that seems worth investing in that Eliezer’s conceptual landscape doesn’t notice and that he is pushing against
Yes but
does not follow.
Yes (for some of the arguments), but again:
does not follow.
Yes—such as the various more neuroscience/DL-inspired approaches (Byrnes, simboxes, shard theory, etc.), or others a bit harder to categorize, like davidad’s approach, or external empowerment.
But also I should point out that RLHF may work better for longer than most here anticipate, simply because if you distill the (curated) thoughts of mostly aligned humans you may just get mostly aligned agents.
Thanks!
I’m not sure if it’s worth us having more back-and-forth, so I’ll say my general feelings right now:
I think it’s of course healthy and fine to have a bunch of major disagreements with Eliezer
I would avoid building “hate” toward him or building resentment as those things are generally not healthy for people to cultivate in themselves toward people who have not done evil things, as I think it will probably cause them to make worse choices by their own judgment
By default, do not count on anyone doing the hard work of making another forum for serious discussion of this subject, especially one that’s so open to harsh criticism and has high standards for comments (I know LessWrong could be better in lots of ways, but c’mon, have you seen Reddit/Facebook/Twitter?)
There is definitely a bunch of space on this forum for people like yourself to develop different research proposals and find thoughtful collaborators and get input from smart people who care about the problem you’re trying to solve (I think Shard Theory is such an example here)
I wish you every luck in doing so and am happy to know if there are ways to further support you trying to solve the alignment problem (of course I have limits on my time/resources and how much I can help out different people)
Of course—my use of the word hate here is merely in reporting impressions from other ML/DL forums and the schism between the communities.
I obviously generally agree with EY on many things, and to the extent I critique his positions here, it’s simply a straightforward result of some people here assuming their correctness a priori.
Okay! Good to know we concur on this. Was a bit worried, so thought I’d mention it.
Also, can I just remind you that for most of LessWrong’s history the top-karma post was Holden’s critique of SingInst where he recommended against funding SingInst and argued in favor of Tool AI as the solution. Recently Eliezer’s List-of-Lethalities became the top-karma post, but less than a month later Paul’s response-and-critique post became the top-karma post where he argued that the problem is much more tractable than Eliezer thinks, and generally advocates a very different research strategy for dealing with alignment.
Eliezer is the primary person responsible for noticing and causing people to work on the alignment problem, due to his superior foresight and writing skill, and also founded this site, so most people here have read his perspective and understand it somewhat, but any notion that dissent isn’t welcomed here (which I am perhaps over-reading into your comment) seems kind of obviously not the case.
The main answer here is I hadn’t read Quintin’s post in full detail and didn’t know that. I’ll want to read it in more detail but mostly expect to update my statement to “5%”. Thank you for pointing it out.
(I was aware of Scott Aaronson being like 3%, but honestly hadn’t been very impressed with his reasoning and understanding and was explicitly not counting him. Sorry Scott).
I have more thoughts on where my own P(Doom) comes from, and how I relate to all this, but I think basically I should write a top-level post about it and take some time to get it well articulated. I think I already said, but a quick recap: I don’t think you need particularly Yudkowskian views to think an international shutdown treaty is a good idea. My own P(Doom) is somewhat confused, but I put >50% odds. A major reason is the additional disjunctive worries of “you don’t just need the first superintelligence to go well, you need a world with lots of strong-but-narrow AIs interacting to go well, or a multipolar takeoff to go well.”
Sooner or later you definitely need something about as strict as (well, stricter than, actually) the global control Eliezer advocates here, since compute costs go down, compute itself goes up, and AI models become more accessible and more powerful. Even if alignment is easy, I don’t see how you can expect to survive an AI-heavy world without a level of control and international alignment that feels draconian by today’s standards.
(I don’t know yet if Quintin argues against all these points, but I will give it a read. I haven’t been keeping up with everything because there’s a lot to read, but it seems important to be familiar with his take.)
But for right now maybe I most want to say: “Yeah man, this is very intense and sad. It sounds like I disagree with your epistemic state, but I don’t think your epistemic state is crazy.”
I hope you do, since these might reveal cruxes about AI safety, and I might agree or disagree with the post you write.
I don’t blame you if you leave LW, though I do want to mention that Eliezer is mostly the problem here, rather than a broader problem of LW.
That stated, LW probably needs to disaffiliate from Eliezer fast, because Eliezer is the source of the extreme rhetoric.
“But I don’t think you even need Eliezer-levels-of-P(doom) to think the situation warrants that sort of treatment.”
Agreed. If a new state develops nuclear weapons, this isn’t even close to creating a 10% x-risk, yet the idea of airstrikes on nuclear enrichment facilities, even though it is very controversial, has for a long time very much been an option on the table.
FWIW I also have >10% credence on x-risk this century, but below 1% on x-risk from an individual AI system trained in the next five years, in the sense Eliezer means it (probably well below 1% but I don’t trust that I can make calibrated estimates on complex questions at that level). That may help explain why I am talking about this policy in these harsh terms.
I, too, believe that absolute sovereignty of all countries on Earth is more important than the existence of the planet itself.
You’re assuming I agree with the premise. I don’t. I don’t think that bombing GPU clusters in other countries will help much to advance AI safety, so I don’t think the conclusion follows from the premise.
I agree with the principle that if X is overwhelmingly important and Y achieves X, then we should do Y, but the weak point of the argument is that Y achieves X. I do not think it does. You should respond to the argument I’m actually making.
Kind of already happened: uggcf://jjj.ivpr.pbz/ra/negvpyr/nx3qxw/nv-gurbevfg-fnlf-ahpyrne-jne-cersrenoyr-gb-qrirybcvat-nqinaprq-nv (https://rot13.com/, because I don’t mean to amplify this too much.)
You are muddling the meaning of “pre-emptive war”, or even “war”. I’m not trying to diminish the gravity of Yudkowsky’s proposal, but a missile strike on a specific compound known to contain WMD-developing technology is not a “pre-emptive war” or a “war”; that seems like an incorrect use of the term.
I think (this kind of) violence is justified. Most people support some degree of state violence. I don’t think it’s breaching any reasonable deontology for governments to try to prevent a rogue dictator from building something, in violation of a clear international treaty, that might kill many more people than the actual airstrike would kill. It’s not evil (IMO) when Israel airstrikes Iranian enrichment facilities, for example.
I applaud your clarity if not your policy.
I think multinational agreements (about anything) between existing military powers, backed by a credible threat of enforcement, are likely to lead to fewer actual airstrikes, not more.
I do actually think there is an important difference between nation states coercing other nation states through threat of force, and individuals coercing or threatening individuals. Calling the former “violence” seems close to the non-central fallacy, especially when (I claim) it results in fewer actual people getting injured by airstrikes or war or guns, which is what I think of as a central example of actual violence.
Yes, these are the key words: “be willing to destroy a rogue datacenter by airstrike.”
Such a data center will likely be in either China or Russia, and there are several of them there. A strike on them would likely cause a nuclear war.
I think the scenario is that all the big powers agree to this, and agree to enforce it on everyone else.
If that were the case, then enforcing the policy would not “run some risk of nuclear exchange”. I suggest everyone read the passage again. He’s advocating for bombing datacentres even if they are in Russia or China.
OK, I guess I was projecting how I would imagine such a scenario working, i.e. through the UN Security Council, thanks to a consensus among the big powers. The Nuclear Non-Proliferation Treaty seems to be the main precedent, except that the NNPT allows for the permanent members to keep their nuclear weapons for now, whereas an AGI Prevention Treaty would have to include a compact among the enforcing powers to not develop AGI themselves.
UN engagement with the topic of AI seems slender, and the idea that AI is a threat to the survival of the human race does not seem to be on their radar at all. Maybe the G-20’s weirdly named “supreme audit institution” is another place where the topic could first gain traction at the official inter-governmental level.
Yes, it is.
I work in a datacenter sometimes. No AI has yet suggested killing me or destroying my place of work; the author did.
I think the proposal involves data centers becoming illegal (ideally both domestically & by international treaty) first, issuing warnings etc second, and bombing them third, and only when the other methods fail. Nobody is proposing to start a surprise campaign of bombing data centers out of the blue.
It would be more reasonable to say that Eliezer is proposing to put you out of work. Just like, if Person X has a job in a secret lab developing bioweapons, I and many other people would very much like to put that person out of work. They can find a different job, right? I wouldn’t particularly relish Person X dying as a side-effect of their place-of-work getting bombed—every death is a tragedy—but I do think that if someone proposes an international treaty against bioweapons labs, and the treaty has an enforcement mechanism that involves sometimes bombing bioweapons labs as a last resort, then I’m quite possibly in favor of such a treaty, depending on details.
But in reality it’s just a nuclear war. In any scenario where, say, some countries agree to this but another nuclear-armed country decides that world conquest sounds pretty good, that country will just build AGI.
Nobody will be universally convinced of the danger until we first actually build AGI, have it go rogue, shut it down, build another....
If 1000 tries later it proves to be impossible to prevent then sure, engineers might give up then.
We have built zero and the systems we build now are generally helpful but can be tricked into saying bad words or giving information that is generally readily available.
I think the lesson of history is that strong multilateral or bilateral agreement, backed by credible threat of enforcement, is in fact sufficient to dissuade nation states from taking actions in their own self-interest.
An agreement on AI is potentially easier to make than one on dissuading a country from attempting to e.g. expand its national borders, since there’s another avenue: convince them of the true fact that building an unaligned AGI is not actually in their self-interest. If it’s impossible to actually convince anyone of this fact, then this just degenerates back to the case of dissuading them from taking actions that are genuinely in their own self-interest, which has happened over the course of history.
I think Cold War incentives with regard to tech development were atypical. Building 1000s of ICBMs was incredibly costly; neither side derived any benefit from it. It was simply defensive matching to maintain MAD, and both sides were strongly motivated to enable mechanisms to reduce numbers and costs (the START treaties).
This is clearly not the case with AI, which is far cheaper to develop, easier to hide, and has myriad lucrative use cases. Policing a Dune-style “thou shalt not make a machine in the likeness of a human mind” Butlerian Jihad would require radical openness to inspection, everywhere, all the time, and that almost certainly won’t be feasible without the establishment of liberal democracy basically everywhere in the world. Despotic states would be a magnet for rule breakers. (Interesting aside: Samuel Butler was a 19th-century anti-industrialisation philosopher and shepherd who lived at Erewhon in NZ (“nowhere” backwards), a river valley that featured as Edoras in the LOTR trilogy.)
Do you have an example?
I was thinking of the lack of large-scale wars of territorial expansion in the post-Cold War era, relative to all other times in history. (The war in Ukraine is an alarming reversal of that trend.)
So first of all, most of Eastern Europe was under Soviet control, and only the Cold War and the limits of communist empire-building prevented it from being all of Europe.
Second, Ukraine is what happens when you don’t have enough guns to protect yourself or allies.
Once a party reaches a certain amount of AGI capability, all the world becomes underprotected.
Finally I see no evidence of any prevention of internal or secret bad acts.
Fox News’ Peter Doocy uses all his time at the White House press briefing to ask about an assessment that “literally everyone on Earth will die” because of artificial intelligence: “It sounds crazy, but is it?”
https://twitter.com/therecount/status/1641526864626720774
I live in the physical world. For a computer program to kill me, it has to have power over the physical world and some physical mechanism to do that. So, anyone claiming that AI is going to destroy humanity needs to explain the physical mechanism by which that will happen. This article like every other one I have seen making that argument fails to do that.
See if this one resonates with you: https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/
No it doesn’t. It is just more of the same nonsense: “AI could defeat all of humanity”, but it never explains how that happens. I think what is going on here is that very intelligent people are thinking about these things. Being intelligent, their blind spot is to grossly overestimate the importance of raw intelligence. So they see AI as being more intelligent than all of humanity and then immediately assume that means it will defeat and enslave humanity, as if intelligence were the only thing that mattered. It isn’t the only thing that matters. The physical and brute force matter too. Smart people have a bad habit of forgetting that.
I don’t think reasoning about others’ beliefs and thoughts is helping you be correct about the world here. Can you instead try to engage with the arguments themselves and point out at what step you don’t see a concrete way for that to happen?
You don’t show much sign of having read the article so I’ll copy paste the part with explanations of how AIs start acting in the physical space.
So is there anything here you don’t think is possible?
Getting human allies? Being in control of large sums of compute while staying undercover? Doing science, and getting human contractors/allies to produce the results? Etc.
The way you use “intelligence” is different from what many people here mean by that word.
Check this out (for a partial understanding of what they mean): https://www.lesswrong.com/posts/aiQabnugDhcrFtr9n/the-power-of-intelligence
The commenter you’re responding to mentioned physical and brute force, so I don’t think the understanding of intelligence is the crux.
The can and the will are separate arguments, but the case has been made for both.
One likely way AI kills humanity is indirectly, by simply outcompeting us. They become more intelligent, their consciousness is recognized in at least some jurisdictions, those jurisdictions experience rapid unprecedented technological and economic growth and become the new superpowers, less and less of world GDP goes to humans, we diminish.
One of the simplest ways for AI to have power over the physical world is via humans as pawns. A reasonably savvy AI could persuade/manipulate/coerce/extort/blackmail real-life people to carry out the things it needs help with. Imagine a powerful mob boss who is superintelligent, never sleeps, and continuously monitors everyone in their network.
For superintelligent AI, it will be trivial to orchestrate engineered superpandemics that will kill 90+% of people, finishing off the disorganised rest will be easy.
Oh really? Will it have the ability to run an entire lab robotically to do that? If not, then it won’t be the AI doing anything. It will be the people doing it. Its power to do anything in the physical world only exists to the extent humans are willing to grant it.
One can order at least 10k-basepair DNA synthesis online; longer sequences are “call to get a quote” on the sites I found. The smallest synthetic genome for a viable self-replicating bacterium is 531kb. The genome for a virus would be even smaller.
My understanding is that there are existing processes to encapsulate genes into virus shells from other species for gene therapy purposes. That leaves the logistics of buying both services, hooking them up and getting the particles injected into some lab animals.
It doesn’t look trivial, but less complicated than buying an entire nuclear arsenal.
Here’s a comment from r/controlproblem with feedback on this article (plus tips for outreach in general) that I thought was very helpful.
Where’s the lie?
More generally, if this is the least radical policy that Eliezer thinks would actually work, then this is the policy that he and others who believe the same thing should be advocating for in public circles, and they should refuse to moderate a single step. You don’t dramatically widen the overton window in <5 years by arguing incrementally inside of it.
Is this now on the radar of national security agencies and the UN Security Council? Is it being properly discussed inside the US government? If not, are meetings being set up? Would be good if someone in the know could give an indication (I hope Yudkowsky is busy talking to lots of important people!)
Jeff Bezos has now followed Eliezer on Twitter: https://twitter.com/bigtechalert/status/1641659849539833856?s=46&t=YyfxSdhuFYbTafD4D1cE9A
[EDIT: fallenpegasus points out that there’s a low bar to entry to this corner of TIME’s website. I have to say I should have been confused that even now they let Eliezer write in his own idiom.]
The Eliezer of 2010 had no shot of being directly published (instead of featured in an interview that at best paints him as a curiosity) in TIME of 2010. I’m not sure about 2020.
I wonder at what point the threshold of “admitting it’s at least okay to discuss Eliezer’s viewpoint at face value” was crossed for the editors of TIME. I fear the answer is “last month”.
Public attention is rare and safety measures are even more rare unless there’s real world damage. This is a known pattern in engineering, product design and project planning so I fear there will be little public attention and even less legislation until someone gets hurt by AI. That could take the form of a hot coffee type incident or it could be a Chernobyl type incident. The threshold won’t be discussing Eliezer’s point of view, we’ve been doing that for a long time, but losing sleep over Eliezer’s point of view. I appreciate in the article Yudkowsky’s use of the think-of-the-children stance which has a great track record for sparking legislation.
Eliezer had a response on twitter to the criticism of “calling for violence”
Further followup (I think I do disagree here with the implication of how easy it is to come away with the impression if you’re reading the post un-primed – it looks like probably some LessWrongers here came away with this impression and probably read it pretty quickly on their own. But, I think it’s useful to have this spelled out)
And goes on to say:
To answer the question over whether Eliezer advocated for violence, I ultimately think the answer was no, but he is dancing fairly close to the line, given that an AI company believes Eliezer to be a lunatic. If it’s one of the major companies, then God help the alignment community, because Eliezer might have just ruined humanity’s future.
Also, violence doesn’t work as much as people think, and nonviolent protests are 2x as effective as violent protests or revolutions. Even in the case of nonviolent protest failure, there’s no evidence that a violent movement could have succeeded where nonviolence didn’t, which is another reason why violence doesn’t work.
There are other reasons why nonviolence works better than violence here, of course.
Here’s the link to the research:
https://www.nonviolent-conflict.org/resource/success-nonviolent-civil-resistance/
However, one cannot make universal statements. The efficacy of violent and nonviolent methods depend upon the exact context. If someone believes in an imminent hard takeoff, and gives high credence to Doom, violent activity may be rational.
I’m getting reports that Time Magazine’s website is paywalled for some people e.g. in certain states or countries or something. Here is the full text of the article:
I’ll note (because some commenters seem to miss this) that Eliezer is writing in a convincing style for a non-technical audience. Obviously the debates he would have with technical AI safety people are different from what is most useful to say to the general population.
If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow.
What examples could help us see this tie more clearly? Procedures that work similarly enough to say “we do X when planning and building a bridge, and if we do X when building AI...”. Does there even exist such an X that can be applied both to engineering a bridge and to engineering an AI?
X = “use precise models”.
Use tables for concrete load ratings, and compare them experimentally against the concrete to be poured; if a load is off, reject it.
We don’t even have the tables for ML. Start making tables; don’t build big bridges until you get the fucking tables right.
Enforce bridge-making no larger than the Yudkowsky Airstrike Threshold.
Do we have an idea of what these tables for ML should look like? I don’t know that much about ML.
Well, evals and that stuff OpenAI did with predicting loss could be a starting point for working out the tables.
But we don’t really know; I guess that’s the point EY is trying to make.
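To make the “predicting loss” idea concrete, here is a minimal sketch of what one entry in such a table-building exercise might look like: fitting a power law of final loss against training compute from small runs, then extrapolating to a bigger run before committing the compute. All the numbers, and the assumption that a clean power law holds, are made up for illustration:

```python
import math

# Hypothetical "table" of small training runs (made-up numbers):
# compute budget (arbitrary units) -> final validation loss
runs = {1: 4.0, 10: 3.2, 100: 2.56, 1000: 2.05}

# Fit loss ~ a * compute^(-b) using the smallest and largest runs
# (a real fit would use least squares over all points in log-log space).
c_lo, c_hi = min(runs), max(runs)
b = (math.log(runs[c_lo]) - math.log(runs[c_hi])) / (math.log(c_hi) - math.log(c_lo))
a = runs[c_lo] * c_lo ** b

# Predict the loss of a 10x larger run before actually training it.
predicted = a * 10000 ** -b  # roughly 1.64 under these made-up numbers
```

The “table” discipline would then be: run the big job only if its measured loss (and behavior) lands where the fit says it should, and stop and investigate if it doesn’t.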
I was hoping that he had some concrete examples in mind but did not elaborate, this being a letter in a magazine and not a blog post. The only thing that comes to my mind is to somehow measure unexpected behavior: if a bridge sometimes led people in circles, that would definitely be cause for concern and reevaluation of the techniques used.
Eliezer’s repeated claim that we have literally no idea about what goes on in AI because they’re inscrutable piles of numbers is untrue and he must know that. There have been a number of papers and LW posts giving at least partial analysis of neural networks, learning how they work and how to control them at a fine grained level, etc. That he keeps on saying this without caveat casts doubt on his ability or willingness to update on new evidence on this issue.
I struggle to recall another piece of technology that humans have built and yet understand less than AI models trained by deep learning. The statement that we have “no idea” seems completely appropriate. And I don’t think he’s trying to say that interpretability researchers are wasting their time by noticing that current state of affairs; the not knowing is why interpretability research is necessary in the first place.
I agree, in that we often have a lot less knowledge about AI today than we’d like, but we at least have partial knowledge, and in special cases we can control the AI’s knowledge.
This is very much not this:
We know that this is not right, at least the stronger claim.
This implies 2 things about Eliezer’s epistemics on AI:
1. Eliezer can’t update well on evidence at all, especially if it contradicts doom (in this case it’s not too much evidence against doom, but calling it zero evidence is inaccurate).
2. Eliezer is way overconfident on AI, and thus we should expect that if Eliezer is very confident in a specific outcome, like say doom, it’s very likely due to bias.
I’ve noticed you repeating this claim in a number of threads, but I don’t think I’ve seen you present evidence sufficient to justify it. In particular, the last time I asked you about this, your response was basically premised on “I think current (weak) systems are going to analogize very well to stronger systems, and this analogy carries the weight of my entire argument.”
But if one denies the analogy (as I do, and as Eliezer presumably does), then that does indeed license him to update differently; in particular, it enables him to claim different conditional probabilities for the observations you put forth as evidence. You can’t (validly) criticize his updating procedure without first attacking that underlying point—which, as far as I can tell, boils down to essentially a matter of priors: you, for whatever reason, have a strong prior that experimental results from (extremely) weak systems will carry over to stronger systems, despite there being a whole host of informal arguments (many of which Eliezer made in the original Sequences) against this notion.
In summary: I disagree with the object-level claim, as well as the meta-level claim about epistemic assessment. Indeed, I would push strongly against interpreting mere disagreement as evidence of the irrationality of one’s opposition; that’s double-counting evidence. You have observed that someone disagrees with you, but until you know why they disagree, to immediately suggest, from there, that this disagreement must stem from incorrect updating procedure on their part, is to assume the conclusion.
While I agree that there are broader prior disagreements, I think that even if we isolate it to the question over whether Eliezer’s statement was correct, without baking in priors, the statement that we have no knowledge of AI because they’re inscrutable piles of numbers is verifiably wrong. To put it in Eliezer’s words, it’s a locally invalid argument, and this is known to be false even without the broader prior disagreements.
One could honestly say the interpretability progress isn’t enough. One couldn’t honestly say that interpretability didn’t progress at all, or that we know nothing about AI internals at all without massive ignorance.
This is poor news for his epistemics, because note that this is a verifiably wrong statement that Eliezer keeps making without any caveats or limitations.
That’s a big problem because if Eliezer can be both making confidently locally invalid arguments on AI, and persistently makes that locally invalid argument, then it calls into question over how well his epistemics are working on AI, and from my perspective there are really only bad outcomes here.
It’s not that Eliezer’s wrong, it’s that he is persistently, confidently wrong about something that’s actually verifiable, such that we can point out the wrongness.
No to the former, yes to the latter—which is noteworthy because Eliezer only claimed the latter. That’s not a knock on interpretability research, when in fact Eliezer has repeatedly and publicly praised e.g. the work of Chris Olah and Distill. The choice to interpret the claim that we “know nothing about AI internals” as the claim that “no interpretability work has been done”, it should be pointed out, was a reading imposed by ShardPhoenix (and subsequently by you).
And in fact, we do still have approximately zero idea how large neural nets do what they do, interpretability research notwithstanding, as evinced by the fact that not a single person on this planet could code by hand whatever internal algorithms the models have learned. (The same is true of the brain, incidentally, which is why you sometimes hear people say “we have no idea how the brain works”, despite an insistently literal interpretation of this statement being falsified by the existence of neuroscience as a field.)
But it does, in fact, matter, whether the research into neural net interpretability translates to us knowing, in a real sense, what kind of work is going on inside large language models! That, ultimately, is the metric by which reality will judge us, not how many publications on interpretability were made (or how cool the results of said publications were—which, for the record, I think are very cool). And in light of that, I think it’s disingenuous to interpret Eliezer’s remark the way you and ShardPhoenix seem to be insisting on interpreting it in this thread.
I now see where the problem lies. The basic issues I see with this argument are as follows:
The implied argument is if you can’t create something by yourself by hand in the field, you know nothing at all about what you are focusing on. This is straightforwardly not true for a lot of fields.
For example, I’d probably know quite a lot about Borderlands 3 (not perfectly, but I actually have quite a bit of knowledge, and I could even use save editors or cheatware with video tutorials), but under nearly no circumstances could I actually create Borderlands 3, even if the game with its code already existed, even with a team.
This likely generalizes: while neuroscience has some knowledge of the brain, it’s not nearly at the point where it could reliably create a human brain from scratch, knowing some things about what cars do is not enough to create a working car, and so on.
In general, I think the error is that you and Eliezer have too high expectations of what some knowledge will bring you. It helps, but in virtually no cases will the knowledge alone allow you to create the thing you are focusing on.
It’s possible that our knowledge of the AI’s internal work isn’t enough, and that progress is too slow. I might agree or disagree, but at least this would be rational. Right now, I’m seeing basic locally invalid arguments here, and I notice that part of the problem is that you and Eliezer have too much of a binary view on knowledge, where you either have functionally perfect knowledge or no knowledge at all, but usually our knowledge is neither functionally perfect, nor is it zero knowledge.
Edit: This seems conceptually similar to P=NP, in that the problem is that verifying something and making something are conjectured to have very different difficulties, and essentially my claim is that verifying something isn’t equal to generating something.
I expect that if you sat down with him and had a one-on-one conversation, you’d find that he does have nuanced views. I also expect that Eliezer realizes that there have been improvements in all of the areas you described. I think the difference comes mostly down to “Has there been sufficient progress in interpretability to avert disaster?” I’m confident his answer would be “No.”
So, given that belief, and having a chance now and then to communicate with a wide audience, it is better to have a clear message, because you never know what will be a zeitgeist tipping point. It’s the fate of the world, so a little lost nuance is just collateral damage.
I don’t know if that matters, because whether he’s pegged to Doom epistemically or strategically the result is the same.
The transistor is a neat example.
Imagine if instead of developing them, we were like, “we need to stop here because we don’t understand EXACTLY how this works… and maybe for good measure we should bomb anyone who we think is continuing development, because it seems like transistors could be dangerous[1]”?
Claims that the software/networks are “unknown unknowns” which we have “no idea” about are patently false, inappropriate for a “rational” discourse, and basically just hyperbolic rhetoric. And to dismiss with a wave how draconian regulation (functionally/demonstrably impossible, re: cloning) of these software enigmas would need to be, while advocating bombardment of rogue datacenters?!?
Frankly I’m sad that it’s FUD that gets the likes here on LW, what with all it’s purported to be a bastion of.
I know for a fact there will be a lot of heads here who think this would have been FANTASTIC, since without transistors, we wouldn’t have created digital watches— which inevitably led to the creation of AI; the most likely outcome of which is inarguably ALL BIOLOGICAL LIFE ON EARTH DIES
No, it’s not, because we have a pretty good idea of how transistors work and in fact someone needed to directly anticipate how they might work in order to engineer them. The “unknown” part about the deep learning models is not the network layer or the software that uses the inscrutable matrices, it’s how the model is getting the answers that it does.
Yes, it is, because it took like five years to understand minority-carrier injection.
I think he’s referring to the understanding of the precise mechanics of how transistors worked, or why the particular first working prototypes functioned while all the others didn’t. Just from skimming https://en.wikipedia.org/wiki/History_of_the_transistor
That’s the current understanding for LLMs: people do know at a high level what an LLM does and why it works, just as there were theories about their function decades before working transistors. But the details of why this system works while 50 other things tried didn’t are not known.
Eliezer has clear beliefs about interpretability and bets on it: https://manifold.markets/EliezerYudkowsky/by-the-end-of-2026-will-we-have-tra
This question appears to be structured in such a way as to make it very easy to move the goalposts.
He definitely has low ability to update on neural networks. however, I agree with him in many respects.
You have to be joking. Not a single one of those partial “analyses” says much about what’s going on in there. Also, Yud has already said he believes that inner goals often won’t manifest until high levels of intelligence, because no system of reasonable intelligence tries to pursue impossible goals.
If he thinks AI interpretability work as it exists isn’t helpful he should say so, but he shouldn’t speak as though it doesn’t exist.
Doesn’t the prisoner’s dilemma (esp. in the military context) inevitably lead us to further development of AI? If so, it would seem that focusing attention and effort on developing AI as safely as possible is a more practical and worthwhile issue than any attempt to halt such development altogether.
I think the harsh truth is that no one cared about Nuclear Weapons until Hiroshima was bombed. The concept of one nation “disarming” AI would never be appreciated until somebody gets burned.
We cared about the Nazis not getting nuclear weapons before us.
I am sure if after WW2 we agreed with the Soviets that we would pause nuclear research and not research the hydrogen bomb, both sides would have signed the treaty and continued research covertly while hoping the other side sticks with the treaty. I don’t think you need game theory to figure out that neither side could take the risk of not researching.
It seems incredibly naive to believe this exact process would not also play out with AI.
Do you remember the end of Watchmen?
Have you taken a look at the mortality trends?
https://mpidr.shinyapps.io/stmortality/
https://www.pfizer.com/news/articles/how_a_novel_incubation_sandbox_helped_speed_up_data_analysis_in_pfizer_s_covid_19_vaccine_trial#:~:text=As%20Pfizer%20scientists%20raced%20to,to%20help%20achieve%20this%20mission.
That’s not an “article in Time”. That’s a “TIME Ideas” contribution. It has less weight and less vetting than any given popular substack blog.
I don’t know how most articles get into that section, but I know, from direct communication with a Time staff writer, that Time reached out and asked for Eliezer to write something for them.
Time appears to have commissioned a graphic for the article (the animated gif with red background and yellow circuits forming a mushroom cloud, captioned “Illustration for TIME by Lon Tweeten”, with nothing suggesting it to be a stock photo), so there appears to be some level of editorial spotlighting. The article currently also appears on time.com in a section titled “Editor’s picks” in a list of 4 articles, where the other 3 are not “Ideas” articles.
Thanks, fixed.
I’ve seen pretty uniform praise from rationalist audiences, so I thought it worth mentioning that the prevailing response I’ve seen from within a leading lab working on AGI is that Eliezer came off as an unhinged lunatic.
For lack of a better way of saying it, folks not enmeshed within the rat tradition—i.e., normies—do not typically respond well to calls to drop bombs on things, even if such a call is a perfectly rational deduction from the underlying premises of the argument. Either Eliezer knew that the response to the essay would be dominated by people decrying his call for violence, and this was tactical for 15-dimensional-chess reasons, or he severely underestimated people’s ability to identify that the actual point of disagreement is around p(doom), not around how governments should respond to an incredibly high p(doom).
This strikes me as a pretty clear failure to communicate.
I actually disagree with the uniform-praise idea, because responses from the rationalist community were also pretty divided in their acceptance.
Is anything uniformly praised in the rationalist community? IME having over half the community think something is between “awesome” and “probably correct” is about as uniform as it gets.
Arguably no, as to either uniform praise or booing. While the majority of the community supports it, there are still some significant dissenting factions, so the rationalist community is only tentatively semi-united here.
“The moratorium on new large training runs needs to be indefinite and worldwide.”
Here lies the crux of the problem. Classical prisoners’ dilemma, where individuals receive the greatest payoffs if they betray the group rather than cooperate. In this case, a bad actor will have the time to leapfrog the competition and be the first to cross the line to super-intelligence. Which, in hindsight, would be an even worse outcome.
The genie is out of the bottle. Given how (relatively) easy it is to train large language models, it is safe to assume that this whole field is now uncontrollable. Every actor with enough data and processing power can give it a go. You, me, anyone. Unlike, for example, advanced semiconductor manufacturing, where controlling ASML, the only sufficiently advanced company specialising in photolithography machines used in the production of chips, is equal to effectively overseeing the entire chip manufacturing industry.
In this case, “defecting” gives lower payoffs to the defector—you’re shooting yourself in the foot and increasing the risk that you die an early death.
The situation is being driven mostly by information asymmetries (not everyone appreciates the risks, or is thinking rationally about novel risks as a category), not by deep conflicts of interest. Which makes it doubly important not to propagate the meme that this is a prisoner’s dilemma: one of the ways people end up with a false belief about this is exactly that people round this situation off to a PD too often!
Capabilities Researcher: *repeatedly shooting himself in the foot, reloading his gun, shooting again* “Wow, it sure is a shame that my selfish incentives aren’t aligned with the collective good!” *reloads gun, shoots again*
The issue is the payoffs involved. Even if the risk is, say, 50%, it’s still individually rational to take the plunge, because the other 50%, in expected-value terms, outweighs everything else. I don’t believe this for a multitude of reasons, but it’s useful to illustrate.

The payoffs are essentially: cooperate and reduce x-risk from, say, 50% to 1%, which gives them a utility of say 50–200; or defect and gain expected utility of say 10^20 or more, if we grant the assumption on LW that AI is the most important invention in human history.

Meanwhile, for everyone else, that actor’s cooperation has the same utility that defection has for the defector, 10^20+, whereas their defection essentially reverses the sign of the utility gained: −10^20 or worse.
The problem is that without a way to enforce cooperation, it’s too easy to defect until everyone dies.
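The incentive structure sketched above can be made concrete with a toy expected-utility calculation. Every number below is invented purely for illustration, not an estimate:

```python
# Toy payoff sketch for the racing actor described above.
# All figures are hypothetical, chosen only to show why defection can look
# individually rational when the upside is valued astronomically.

U_WIN = 1e20      # assumed utility the defector assigns to winning the race
U_DEATH = -1e3    # bounded disutility the defector assigns to their own death
U_COOP = 200      # modest utility of cooperating to cut x-risk from 50% to 1%
P_DOOM = 0.5      # assumed probability that racing ahead kills everyone

# Expected utility of defecting (racing) vs. cooperating (pausing):
ev_defect = (1 - P_DOOM) * U_WIN + P_DOOM * U_DEATH
ev_cooperate = (1 - 0.01) * U_COOP

# Because the defector's downside is bounded while the upside is astronomical,
# defection dominates even at a 50% chance of killing everyone.
print(ev_defect > ev_cooperate)  # True
```

The asymmetry doing the work is that the defector’s loss is capped (their own death) while the prize is not, which is exactly why unenforced cooperation is unstable under these assumptions.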
Now thankfully, I believe that existential risk is a lot lower, but if existential risk were high in my model, then we eventually need to start enforcing cooperation, as the incentives would be dangerous if existential risk is high.
I don’t believe that, thankfully.
I’m going to naively express something that your risk calculation makes me think:
I think EY, and I, and others who are persuaded by him, seem to be rating the expected utility of an x-risk outcome as nothing less than negative infinity. I.e., whether the risk is 1% or 50%, our expected utility from AI x-risk calculates to approximately negative infinity, which outweighs even a 99% chance of 10^20+ utility.
This is why shutting it down seems to be the only logical move in this calculation right now. Because if you think that a negative infinity outcome exists at all in the outcome space, then the only solution is to avoid the outcome space completely until you can be assured that it does not include a potentially-negative infinity outcome. It’s not about getting that negative infinity outcome to some tiny expected percentage, it’s about eliminating it from the outcome space entirely.
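To put that framing in the same toy terms: if extinction is assigned unbounded negative utility, any nonzero p(doom) dominates, however large the upside. The numbers are hypothetical:

```python
import math

U_WIN = 1e20          # upside, however astronomical (hypothetical)
U_DOOM = -math.inf    # extinction treated as unboundedly bad
P_DOOM = 0.01         # even a 1% assumed risk

ev = (1 - P_DOOM) * U_WIN + P_DOOM * U_DOOM
print(ev)  # -inf: any nonzero p(doom) drags the expected value to -infinity
```

This is the formal version of “eliminate the outcome from the outcome space entirely”: no finite prize can compensate, so the only move that changes the answer is driving P_DOOM to exactly zero.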
The problem is that the key actor is of course OpenAI, not Eliezer, so how Eliezer values x-risk is not relevant to the analysis. What matters is how much the people at AI companies value their own deaths, and given that I believe they don’t value their lives infinitely, Eliezer’s calculations don’t matter, since he isn’t a relevant actor in an AI company.
My point is that, as you said, you take the safest route when not knowing what others will do—do whatever is best for you and, most importantly, guaranteed. You take some years, and yes, you lose the opportunity to walk out of doing any time, but at least you’re in complete control of your situation. Just imagine a PD with 500 actors… I know what I’d pick.
It’s also possible to interpret the risks differently or believe you can handle the dangers, and be correct or not correct.
A temporary state of affairs. ASML is only the single point of failure because of economics. Chinese government-funded equipment vendors would eventually match ASML’s current technology and probably slowly catch up from there.
Enormously faster if a party gets even a little help from AGI.
You know what… I read the article, then your comments here… and I gotta say—there is absolutely not a chance in hell that this will come even remotely close to being considered, let alone executed. Well—at least not until something goes very wrong… and this something need not be “We’re all gonna die” but more like, say, an AI system that melts down the monetary system… or is used (either deliberately, but perhaps especially if accidentally) to very negatively impact a substantial part of a population. An example could be that it ends up destroying the power grid in half of the US… or causes dozens of aircraft to “fall out of the sky”… something of that size.
Yes—then those in power just might listen and indeed consider very far-reaching safety protocols. Though only for a moment, and some parties shall not care and press on either way, preferring instead to… upgrade, or “fix” the (type of) AI that caused the mayhem.
AI is the One Ring To Rule Them All and none shall toss it into Mount Doom. Yes, even if it turns out to BE Mount Doom—that’s right. Because we can’t. We won’t. It’s our precious, and this, indeed, it really is. But the creation of AI (potentially) capable of a world-wide catastrophe, in my view, as it apparently is in the eyes of EY… is inevitable. We shall not have the wisdom nor the humility to not create it. Zero chance. Undoubtedly intelligent and endowed with well above average IQ as LessWrong subscribers may be, it appears you have a very limited understanding of human nature and the realities of us basically being emotional reptiles with language and an ability to imagine and act on abstractions.
I challenge you to name me a single instance of a tech… any tech at all… being prevented from existing/developing before it caused at least some serious harm. The closest we’ve come is ozone-depleting chemicals, and even those are still being used, with the ozone layer only slowly recovering from the damage already done.
Personally, I’ve come to realize that if this world really is a simulated reality I can at least be sure that either I chose this era to live through the AI apocalypse, or this is a test/game to see if this time you can somehow survive or prevent it ;) It’s the AGI running optimization learning to see what else these pesky humans might have come up with to thwart it.
Finally—guys… bombing things (and, presumably, at least some people) on a spurious, as-yet unproven conjectured premise of something that is only a theory and might happen, some day, who knows… really—yeah, I am sure Russia or China or even Pakistan and North Korea will “come to their senses” after you blow their absolute top of the line ultra-expensive hi-tech data center to smithereens… which, no doubt, as it happens, was also a place where (other) supercomputers were developing various medicines, housing projects, education materials in their native languages and an assortment of other actually very useful things they won’t shrug off as collateral damage. Zero Chance, really—every single byte generated in the name of making this happen is 99.999% waste. I understand why you’d want it to work, sure, yes. That would be wonderful. But it won’t, not without a massive “warning” mini-catastrophe first. And if we shall end up right away at total world meltdown… then tough, it would appear such grim fate is basically inevitable and we’re all doomed indeed.
Human cloning.
Well, this is certainly a very good example, I’ll happily admit as much. Without wanting to be guilty of the No True Scotsman fallacy, though—human cloning is a bit of a special case because it has a very visceral “ickiness” factor… and comes with a unique set of deep feelings and anxieties.
But imagine, if you will, that tomorrow we find the secret to immortality. Making people immortal would bring with it at least two thirds of the same issues that are associated with human cloning… yet it is near-certain any attempts to stop that invention from proliferating are doomed to failure; everybody would want it, even though it technically has quite a few of the types of consequences that cloning would have.
So, yes, agreed—we did pre-emptively deal with human cloning, and I definitely see this as a valid response to my challenge… but I also think we both can tell it is a very special, unique case that comes with most unusual connotations :)
The problem is that by the time serious alarms are sounding, we are likely already past the event horizon leading to the singularity. This set of experiments makes me think we are already past that point. It will be a few more months before one of the disasters you predict comes to pass, but now that it is self-learning, it is likely already too late. As humans have done several times already in history (e.g., atomic bombs, the LHC), we’re about to find out whether we’ve doomed everyone long before we’ve seriously considered the possibilities/plausibilities.
I’m pretty sympathetic to the problem described by There’s No Fire Alarm for Artificial General Intelligence, but I think the claim that we’ve passed some sort of event horizon for self-improving systems is too strong. GPT-4 + Reflexion does not come even close to passing the bar of “improves upon GPT-4′s architecture better than the human developers already working on it”.
I translated this text into Russian
I thought this was interesting. Wouldn’t an AI solving problems in biology pick up Darwinian habits and be equally dangerous as one trained on text? Why is training on text from the internet necessarily more dangerous? Also, what would “complicating the issue” look like in this context? If, for example, an AI was modeling brain cells that showed signs of autonomy and/or the ability to multiply in virtual space would that be a complication? Or a breakthrough?
The other legal proscriptions mentioned also have interesting implications. Prohibiting large GPU clusters or shutting down large training runs might have the unintended consequence of increased/faster innovation as developers are forced to find ways around legal hurdles.
The question of whether legal prohibitions are effective in this arena has also been brought up. Perhaps we instead should place stricter controls on raw materials that go into chips, circuit boards, semiconductors etc.
I think that Eliezer meant biological problems like “given data about various omics in 10000 samples build causal network, including genes, transcription factors, transcripts, etc, so we could use this model to cure cancer and enhance human intelligence”
It is not a well-thought out exception. If this proposal were meant to be taken seriously it would make enforcement exponentially harder and set up an overhang situation where AI capabilities would increase further in a limited domain and be less likely to be interpretable.
If I had infinite freedom to write laws I don’t know what I would do, I’m torn between caution and progress. Regulations often stifle innovation and the regulated product or technology just ends up dominated by a select few. If you assume a high probability of risk to AI development then maybe this is a good thing.
Rather than individual laws perhaps there should be a regulatory body that focuses on AI safety, like a better business bureau for AI that can grow in size and complexity over time parallel to AI growth.
Market:
https://manifold.markets/tailcalled/will-the-time-article-and-the-open
I suppose even if this market resolves YES, it may be worth the loss of social capital for safety reasons. Though I’m not convinced by shutting down AI research without an actual plan of how to proceed.
Also even if the market resolves YES and it turns out strategically bad, it may be worth it for honesty reasons.
For someone so good at getting a lot of attention he sure has no idea what the second order effects of his actions on capability will be
edit: also dang anyone who thinks he did a bad job at pr is sure getting very downvoted here
Well, I agree about his terrible PR. But then I keep getting downvoted, too.
>The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15”, “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens”.
But obviously these metaphors are not very apt, since humanity has a massive incumbent advantage that would need to be overcome. Rome Sweet Rome is a fun story not because 21st-century soldiers and Roman legionnaires are intrinsically equals, but because the technologically superior side is facing down a massive incumbent power.
One thing that I’ve always found a bit handwavey about the hard takeoff scenarios is that they tend to assume that a superintelligent AI would actually be able to plot out a pathway from being in a box to eliminating humanity that is basically guaranteed to succeed. These stories tend to involve the assumption that the AI will be able to invent highly potent weapons very quickly and without risk of detection, but it seems at least pretty plausible that this is just too difficult. I just think it’s likely that we’ll see several failed AI takeover attempts before a success occurs, and hopefully we’ll learn something from these early failures that will slow things down.
I just want to be clear I understand your “plan”.
We are going to build a powerful self-improving system, then let it try to end humanity with some p(doom) < 1 (hopefully), and then do that iteratively?
My gut reaction to a plan like that looks like this “Eff you. You want to play Russian roulette, fine sure do that on your own. But leave me and everyone else out of it”
You lack imagination; it’s painfully easy, and the cost and required IQ have been dropping steadily every year.
And no there is zero chance I will elaborate on any of the possible ways humanity purposefully could be wiped out.
I outlined my expectations, not a “plan”.
>You lack imagination; it’s painfully easy, and the cost and required IQ have been dropping steadily every year.
Conversely, it’s possible that doomers are suffering from an overabundance of imagination here. To be a bit blunt, I don’t take it for granted that an arbitrarily smart AI would be able to manipulate a human into developing a supervirus or nanomachines in a risk-free fashion.
The fast takeoff doom scenarios seem like they should be subject to Drake equation-style analyses to determine P(doom). Even if we develop malevolent AIs, I’d say that P(doom | AGI tries to harm humans) is significantly less than 100%… obviously if humans detect this it would not necessarily prevent future incidents but I’d expect enough of a response that I don’t see how people could put P(doom) at 95% or more.
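A Drake-equation-style decomposition of P(doom | takeover attempt) might look like the sketch below. Every step probability here is an invented placeholder; the point is only that conjunctive requirements multiply the total down:

```python
# Hypothetical steps a takeover attempt must get through, with made-up
# probabilities (these are illustrative placeholders, not estimates):
steps = {
    "forms and sustains a takeover goal": 0.5,
    "devises a workable attack pathway": 0.5,
    "manipulates the needed humans": 0.7,
    "avoids detection until too late": 0.5,
    "execution succeeds on the first try": 0.6,
}

# Conjunction: every step must succeed, so the probabilities multiply.
p_doom_given_attempt = 1.0
for p in steps.values():
    p_doom_given_attempt *= p

print(p_doom_given_attempt)  # roughly 0.05 under these invented numbers
```

Anyone who disagrees with the conclusion would presumably dispute the factors (or the assumption that they are independent and conjunctive), which is exactly the discussion this framing is meant to force.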
Well, as Eliezer said, today you can literally order custom DNA strings by email, as long as they don’t match anything in the “known dangerous virus” database.
And the AI’s task is a little easier than you might suspect, because it doesn’t need to be able to fool everyone into doing arbitrary weird stuff, or even most people. If it can do ordinary Internet things like “buy stuff on Amazon.com”, then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.
>then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.
Yes, but do I take it for granted that an AI will be able to manipulate a human into creating a virus that will kill literally everyone on Earth, or at least enough people to allow the AI to enact some secondary plan to take over the world? Without being detected? Not with anywhere near 100% probability. I just think these sorts of arguments should be subject to Drake equation-style reasoning that will dilute the likelihood of doom under most circumstances.
This isn’t an argument for being complacent. But it does allow us to push back against the idea that “we only have one shot at this.”
I mean, the human doesn’t have to know that it’s creating a doomsday virus. The AI could be promising it a cure for his daughter’s cancer, or something.
Or just promising the human some money, with the sequence of actions set up to obscure that anything important is happening. (E.g., you can use misdirection like ‘the actually important event occurred early in the process, when you opened a test tube to add some saline and thereby allowed the contents of the test tube to start propagating into the air; the later step where you mail the final product to an address you were given, or record an experimental result in a spreadsheet and email the spreadsheet to your funder, doesn’t actually matter for the plan’.)
Getting humans to do things is really easy, if they don’t know of a good reason not to do it. It’s sometimes called “social engineering”, and sometimes it’s called “hiring them”.
You have to weigh the conjunctive aspects of particular plans against the disjunctiveness of ‘there are many different ways to try to do this, including ways we haven’t thought of’.
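That disjunctiveness can also be put in toy numbers: even if each individual plan is conjunctive and unlikely, many independent candidate plans push the overall odds back up. Both figures below are invented for illustration:

```python
# One plan succeeding is conjunctive and unlikely; but with many distinct,
# independent plans available, the chance that at least one works grows fast.
p_single_plan = 0.05   # invented per-plan success probability
n_plans = 50           # invented number of independent candidate plans

# P(at least one succeeds) = 1 - P(all fail), assuming independence.
p_at_least_one = 1 - (1 - p_single_plan) ** n_plans
print(round(p_at_least_one, 2))  # roughly 0.92 under these assumptions
```

The independence assumption is doing a lot of work here (failed attempts might trigger detection and shut down later tries), which is one place the two sides of this thread genuinely disagree.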
How did you reach that conclusion? What does that ontology look like?
What is your p(doom)? Is that acceptable? If yes, why is it acceptable? If no, what is the acceptable p(doom)?
Remember when some people, in order to see what would happen, modified a “drug discovery” AI system to search for maximally toxic molecules instead of minimizing toxicity and it ended up “inventing” molecules very similar to VX nerve gas?
[Reposting from a Facebook thread discussing the article because my thoughts may be of interest]
I woke to see this shared by Timnit Gebru on my Linkedin and getting 100s of engagements. https://twitter.com/xriskology/status/1642155518570512384
It draws a lot of attention to the airstrikes comment which is unfortunate.
Stressful to read
A quick comment on changes that I would probably make to the article:
Make the message less about EY so it is harder to attack the messenger and undermine the message.
Reference other supporting authorities and sources of evidence, so this seems like a more evidence-backed viewpoint; particularly more conventional authorities, because EY has no conventional credentials (AFAIK).
Make it clear that more and more people (ideally like/admired by the target audience, perhaps policymakers/civil servants in this case) are starting to worry about AI/act accordingly (leverage social proof/dynamic norms)
Make the post flow a little better to increase fluency and ease of understanding (hard to be precise about what to do here but I didn’t think that it read as well as it could have)
Make the post more relatable by choosing examples that will be more familiar to relevant readers (e.g., not Stockfish).
Don’t mention the airstrikes—keep the call for action urgent and strong but vague so that you aren’t vulnerable to people taking a quote out of context.
Finish with some sort of call to action or next steps for the people who were actually motivated.
Yud keeps asserting the near-certainty of human extinction if superhuman AGI is developed before we do a massive amount of work on alignment. But he never provides anything close to a justification for this belief. That makes his podcast appearances and articles unconvincing—a most surprising, and crucial part of his argument is left unsupported. Why has he made the decision to present his argument this way? Does he think there is no normie-friendly argument for the near-certainty of extinction? If so, it’s kind of a black pill with regard to his argument ever gaining enough traction to meaningfully slow down AI development.
edit: if any voters want to share their reasoning, I’d be interested in a discussion. What part do you disagree with? That Yudkowsky is not providing justification for the near-certainty of extinction? That this makes his articles and podcast appearances unconvincing? That this is a black pill?
The basic claims that lead to that conclusion are
Orthogonality Thesis: how “smart” an AI is has (almost) no relationship to what its goals are. It might seem stupid to a human to want to maximize the number of paperclips in the universe, but there’s nothing “in principle” that prevents an AI from being superhumanly good at achieving goals in the real world while still having a goal that people would think is as stupid and pointless as turning the universe into paperclips.
Instrumental Convergence: there are some things that are very useful for achieving almost any goal in the real world, so most possible AIs that are good at achieving things in the real world would try to do them. For example, self-preservation: it’s a lot harder to achieve a goal if you’re turned off, blown up, or if you stop trying to achieve it because you let people reprogram you and change what your goals are. “Acquire power and resources” is another such goal. As Eliezer has said, “the AI does not love you, nor does it hate you, but you are made from atoms it can use for something else.”
Complexity of Value: human values are complicated, and messing up one small aspect can result in a universe that’s stupid and pointless. One of the oldest SF dystopias ends with robots designed “to serve and obey and guard men from harm” taking away almost all human freedom (for their own safety) and taking over every task humans used to do, leaving people with nothing to do except sit “with folded hands.” (Oh, and humans who resist are given brain surgery to make them stop wanting to resist.) An AI that’s really good at achieving arbitrary real-world goals is like a literal genie: prone to giving you exactly what you asked for and exactly what you didn’t want.
Right now, current machine learning methods are completely incapable of addressing any of these problems, and they actually do tend to produce “perverse” solutions to problems we give them. If we used them to make an AI that was superhumanly good at achieving arbitrary goals in the real world, we wouldn’t be able to reliably give it a goal of our choice, we wouldn’t be able to tell what goal it actually ends up with if we do try to give it a goal, and even if we could make sure that the goal we intend to give it and the goal it learns are exactly the same, we still couldn’t be sure that any (potentially useful) goal we specify wouldn’t also result in the end of the world as an unfortunate side effect.
The point isn’t that I’m unaware of the orthogonality thesis, it’s that Yudkowsky doesn’t present it in his recent popular articles and podcast appearances[0]. So, he asserts that the creation of superhuman AGI will almost certainly lead to human extinction (until massive amounts of alignment research has been successfully carried out), but he doesn’t present an argument for why that is the case. Why doesn’t he? Is it because he thinks normies cannot comprehend the argument? Is this not a black pill? IIRC he did assert that superhuman AGI would likely decide to use our atoms on the Bankless podcast, but he didn’t present a convincing argument in favour of that position.
[0] see the following: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Yeah, the letter on Time Magazine’s website doesn’t argue very hard that superintelligent AI would want to kill everyone, only that it could kill everyone—and what it would actually take to implement “then don’t make one”.
To be clear, that it more-likely-than-not would want to kill everyone is the article’s central assertion. “[Most likely] literally everyone on Earth will die” is the key point. Yes, he doesn’t present a convincing argument for it, and that is my point.
Not in his popular writing.
Or he has gone over the ground so much that it seems obvious to him. But part of effective communication is realising that what’s obvious to you may need to be spelt out to others.
It is so dark that the next link down on that page is ‘Bad Bunny’s next move.’
This letter makes me think that only large-scale nuclear war, specifically targeting AI-related infrastructure like electric power plants, chip factories, and data centers, could be a plausible alternative to the creation of non-aligned AI. And I don’t like this alternative.
The article doesn’t make that clear at all?
An airstrike would likely hit a datacenter in either China or Russia. Moreover, destroying one datacenter is not enough, as there will be many institutions, many of them hidden. Such datacenters will also be protected by multilevel air defence. So only a large-scale and likely nuclear attack would destroy the datacenter.
Also, since many models could be trained on smaller computers, even destroying data centers will not be enough. One would have to go after Internet connectivity—everywhere.
Yeah, wouldn’t it be great if there was some way to not have a nuclear war or build AI?
If anyone can think of one, they’ll have my full support.
It would be great. But it should not be bird flu.
<nogenies>
Yeah, wouldn’t it be great if there was some way to not have a nuclear war or build AI or have everyone die of bird flu?
</nogenies>
I think I know how this game goes.
Right, it’s not particularly better than the worlds where AGI kills everyone. Either way, everyone alive, and their children, and so on, will be dead. Sure, after a nuclear war some of us might have living great-grandchildren, but then they build AGI, and then the same outcome.
Exactly my thought, but how to get it done?
Obviously, we cannot figure out how to leash an intellect that will be ahead of us by many orders of magnitude and will develop almost instantly. We may miss the moment of the singularity.
People regularly hack almost every defense they come up with, let alone an intelligence so superior to us.
But if it is so easy to make such a strong AI (given the speed of its creation, and our position as an advanced civilization within the lifetime of the universe), surely someone has already created one, and we are either in a simulation, or we just haven’t encountered it yet.
In the second case, we are threatened with extinction even if we do not create AI at all. After all, if it is possible, someone will surely create it, and to them we will be complete strangers. Of course, I’m not saying that our AI would become sentimental towards us, if it is conscious at all, but creating our own strong AI may be at least a tiny chance to stay alive and be competitive with other potentially thinking beings in the universe who can also create AI.
Could we take from Eliezer’s message the need to redirect more efforts into AI policy and into widening the Overton window to try, in any way we can, to give AI safety research the time it needs? As Raemon said, the Overton window might be widening already, making more ideas “acceptable” for discussion, but it doesn’t seem enough. I would say the typical response from the overwhelming majority of the population and world leaders to misaligned AGI concerns still is to treat them as a panicky sci-fi dystopia rather than to say “maybe we should stop everything we’re doing and not build AGI”.
I’m wondering if not addressing AI policy sufficiently might be a coordination failure from the AI alignment community; i.e. from an individual perspective, the best option for a person who wants to reduce existential risks probably is to do technical AI safety work rather than AI policy work, because AI policy and advocacy work is most effective when done by a large number of people, to shift public opinion and the Overton window. Plus it’s extremely hard to make yourself heard and influence entire governments, due to the election cycles, incentives, short-term thinking, bureaucracy...that govern politics.
Maybe, now that AI is starting to cause turmoil and enter popular debate, it’s time to seize this wave and improve the coordination of the AI community. The main issue is not whether a solution to AI alignment is possible, but whether there will be enough time to come up with one. And the biggest factors that can affect the timelines probably are (1) big corporations and governments, and (2) how many people work on AI safety.
In December 2022, awash in recent AI achievements, it concerned me that much of the technology had become very synergistic during the previous couple of years. Essentially: AI-type-X (e.g. Stable Diffusion) can help improve AI-type-Y (e.g. Tesla self-driving) across many, many pairs of X and Y. And now, not even 4 months after that, we have papers released on GPT-4’s ability to self-reflect and self-improve. Given that it is widely known how badly human minds predict geometric progression, I have started to feel like we are already past the AI singularity “event horizon.” Even slamming on the brakes now doesn’t seem like it will do much to stop our fall into this abyss (not to mention how unaligned the incentives of Microsoft, Tesla, and Google are from pulling the train brake). My imaginary “event horizon” was always “self-improvement”, given that transistorized neurons would behave so much faster than chemical ones. Well, here we are. We’ve had dozens of emergent properties of AI over the past year, and barely anyone knows that it can use tools, learned to read text on billboards in images, and more … without explicit training to do so. It has learned how to learn, and yet, we are broadening the scope of our experiments instead of shaking these people by the shoulders and asking, “What in the hell are you thinking, man!?”
Tl;dr—We must enlist and educate professional politicians, reporters and policymakers to talk about alignment.
This interaction between Peter Doocy and Karine Jean-Pierre (YouTube link below) is representative of how EY’s time article has been received in many circles.
I see a few broad fronts of concern in LLMs:
Privacy and training feedback loop
Attribution, copyright and monetization related payment considerations
Economic displacement considerations (job losses and job gains etc) and policy responses needed
Alignment
Of these, alignment is likely the most important (at least in my opinion—on this point opinions seem to genuinely vary) and has had a very long intellectual effort behind it, and yet recent discourse has often framed it as if it were unserious and hyperbolic (e.g. “doomerism”).
One objection that often comes up is that ChatGPT and Bing are aligned with our values and anyone can try it for themselves. Another objection I keep coming across is that alignment folks aren’t experts in AI implementation and thus lack the insights that an implementor possesses from directly working on alignment as part of the implementation process.
The way I see it, these and other objections don’t have to be true, nor perfectly argued, to land effectively in the public sphere of discourse - they just have to seem coherent, logical and truthy to a lay audience.
ChatGPT’s appeal is that anyone can immediately experience it, and partake in judging its value to our collective lives—AI is no longer an esoteric concept (most people still don’t think of social media feeds explicitly as AI).
The discussions for and against AI and various aspects of policy now take place in a manner accessible to all, and whether we like it or not, such discussions must now comport with public norms of discourse and persuasion. I really think that we must enlist and educate professional politicians, reporters and policymakers to do this work.
How about we augment human intelligence in a massive way and get augmented humans to solve the AGI problem? If we can make AGI, should we not be close enough to augmenting human intelligence as well?
Rationality aside, the ban won’t go far; it won’t happen, because the geopolitical game between superpowers, which already carries existential risks of its own, currently looms larger than any future risk that could emerge from advanced AI development. And if you lack sufficiently developed AI technology at some point in that geopolitical game, you may well face the bigger existential risk, much as if you lacked nuclear weapons.
So, there you go: do you think risks that are at present almost nonexistent (albeit plausibly close at hand) can outweigh the multiple, hotter existential risks already present?
If the US doesn’t develop it, you can be assured that China and Russia will. US scientists are likely to develop it more quickly. Assuming it is possible, Chinese and Russian scientists, given enough time and resources will develop it eventually. If it is possible, there is no stopping it from happening. Someone will do it. It is pointless to pretend otherwise.
There’s a joke in the field of AI about this.
Q: How far behind the US is China in AI research?
A: About 12 hours.
I can’t imagine such a proposal working well in the United States. I can imagine some countries e.g. China potentially being on board with proposals like these. Because the United Nations is a body chiefly concerned with enforcing international treaties, I imagine it would be incentivized to support arguments in favor of increasing its own scope and powers. I do predict that AI will be an issue it will eventually decide to weigh in on and possibly act on in a significant way.
However, that creates a kind of bipolar geopolitical scenario for the remainder of this century, approximately speaking. The United States is already on adversarial terms with China and has incentives against joining treaties that seem to benefit less-developed competitors over itself. If China and other U.N.-aligned countries doggedly antagonize the US for continuing to develop its technology no matter what (especially in secret), and the US believes it can get away with doing so, then you have forces that polarize the world into camps: on one side, the camp advocating technology slowdown and surveillance, associated with countries that already do this; on the other, the camp supporting more liberal ideals and freedom (which will tend to be anti-doomers), associated with countries whose governments support such ideals.
Politically, the doomer camp (FHI and FLI et al.) will begin to be courted by more authoritarian governments, presenting an awkward situation.
You imagine falsely, because your premise is false. The UN isn’t really a unified body: its actions are largely controlled by a “Security Council” of powerful nations which try to serve their own interests (modulo hypotheticals about one of their governments being captured by a mad dog) and have no desire to serve the interests of the UN as such. This is mostly by design: we created the UN to prevent world wars, hence it can’t act on its own to start one.
AFAIK, the Secretary-General, for example, is a full-time position, and whoever holds it is not necessarily acting at the behest of a single country or representing only that country’s interests. Would you say that António Guterres seeks to fulfil the objectives of only Portugal, and not the goals of the UN and whatever it says its values are?
There’s no proof that superintelligence is even possible. The idea of the updating AI that will rewrite itself to godlike intelligence isn’t supported.
There is just so much hand-wavey magical thinking going on in regard to the supposed superintelligence AI takeover.
The fact is that manufacturing networks are damn fragile. Power networks too. Some bad AI is still limited by these physical things. Oh, it’s going to start making its own drones? Cool, so it is running thirty mines, and various shops, plus refining the oil and all the rest of the networks required just to make a spark plug?
One tsunami in the RAM manufacturing district and that AI is crippled. Not to mention that so many pieces of information do not exist online. There are many things without patent. Many processes opaque.
We do in fact have multiple tries to get AI “right”.
We need to stop giving future AI magical powers. It cannot suddenly crack all cryptography instantly. It’s not mathematically possible.
Eh, I agree it is not mathematically possible to break a one-time pad (though it is important to remember the NSA broke VENONA: mathematical cryptosystems are not the same as their real-world implementations), but most of our cryptographic proofs are conditional and rely on assumptions. For example, I don’t see what is mathematically impossible about breaking AES.
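For what it’s worth, the one-time-pad claim can be made concrete in a few lines. This is an illustrative sketch of my own (the message and key names are invented), showing both that XOR decryption works and why the ciphertext alone is uninformative: for any same-length candidate plaintext, there exists a key mapping it to the same ciphertext.

```python
import os

def otp_xor(data: bytes, key: bytes) -> bytes:
    # XOR each byte of the message with the corresponding key byte.
    return bytes(d ^ k for d, k in zip(data, key))

message = b"ATTACK AT DAWN"              # 14 bytes
key = os.urandom(len(message))           # fresh random key, used exactly once
ciphertext = otp_xor(message, key)

# Decrypting with the same key recovers the plaintext.
assert otp_xor(ciphertext, key) == message

# Perfect secrecy: ANY other 14-byte plaintext is consistent with this
# ciphertext under some key, so the ciphertext reveals nothing about
# which message was actually sent.
candidate = b"RETREAT AT TEN"            # also 14 bytes
fake_key = otp_xor(ciphertext, candidate)
assert otp_xor(ciphertext, fake_key) == candidate
```

This is exactly why the security is information-theoretic rather than computational: no amount of intelligence helps an attacker pick between equally consistent plaintexts. AES, by contrast, rests on the unproven assumption that no efficient key-recovery algorithm exists.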
Meaning it hasn’t happened, or it isn’t possible?
If it offers to improve them, we may well see that as a benevolent act...
There are three things to address, here. (1) That it can’t update or improve itself. (2) That doing so will lead to godlike power. (3) Whether such power is malevolent.
Of 1, it does that now. Last year, I started to get a bit nervous noticing the synergy between converging AI fields. In other words, Technology X (e.g. Stable Diffusion) could be used to improve the function of Technology Y (e.g. Tesla self-driving) for an increasingly large pool of X and Y. This is one of the early warning signs that you are about to enter a paradigm shift or geometric progression of discovery. Suddenly, people saying AGI was 50 years away started to sound laughable to me. If it is possible on silicon transistors, it is happening in the next 2 years. Here is an experiment testing the self-reflection and self-improvement (loosely “self-training,” but not quite there) of GPT-4 (last week).
Of 2, there is some merit to the argument that “superintelligence” will not be vastly more capable because of the hard universal limits of things like “causality.” That said, we don’t know how regular intelligence “works,” much less how much more super a super-intelligence would or could be. If we are saved from AI, then it is these computation and informational speed limits of physics that have saved us out of sheer dumb luck, not because of anything we broadly understood as a limit to intelligence, proper. Given the observational nature of the universe (ergo, quantum mechanics), for all we know, the simple act of being able to observe things faster could mean that a superintelligence would have higher speed limits than our chemical-reaction brains could ever hope to achieve. The not knowing is what causes people to be alarmist. Because a lot of incredibly important things are still very, very unknown …
Of 3, on principle, I refuse to believe that stirring the entire contents of Twitter and Reddit and 4Chan into a cake mix makes for a tasty cake. We often refer to such places as “sewers,” and oddly, I don’t recall eating many tasty things using raw sewage as a main ingredient. No, I don’t really have a research paper, here. It weirdly seems like the thing that least requires new and urgent research given everything else.
There is only one question—why should AI destroy humanity?
It is not hormonally motivated. It does not and cannot have desires of its own, except those that are programmed in.
It has no motivation to develop, fight, breed, survive.
A cocktail of hormones makes us do all this. Why would an AI do that?
A real AI probably won’t do anything.
Please check out the basic-agi-safety-questions thread, and specifically this comment by me linking to a bunch of the standard articles on the question of what it might look like to have a superintelligent AI try to destroy humanity.
Why would 30 years tell us anything? If we can’t build AGIs without risking a nuclear war, how do we learn how they fail? You have to have an example of a failure before you can possibly design a fix.
And Eliezer says to start a war instead of using the open agency model or some other reasonable modern software-engineering approach? Stateless microservices are the primary reason some modern software is extremely reliable, even when it is spread across many computers. Google Search is the premier example, but it’s also how SpaceX avionics work.
Stateless-microservice-based AI could be extremely powerful—thousands of times more capable than current systems—and reliable and safe, so long as no one does the obvious and makes them stateful...
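To make the stateless/stateful distinction concrete, here is a minimal sketch (the handler names and request shapes are my own invention, not from any real system): a stateless handler’s output depends only on its input, so any replica can serve any request and a crashed instance can be replaced freely, while a stateful handler accumulates hidden history across calls.

```python
def stateless_handler(request: dict) -> dict:
    # No reads or writes to any state outside the function:
    # same input, same output, on any replica, at any time.
    return {"result": request["query"].upper()}

# A stateful handler, by contrast, carries hidden history between calls,
# which is exactly what the comment above warns against.
_history = []

def stateful_handler(request: dict) -> dict:
    _history.append(request["query"])  # hidden state shared across calls
    return {"result": request["query"].upper(), "seen": len(_history)}

print(stateless_handler({"query": "ping"}))  # always {'result': 'PING'}
```

The design point is that statelessness makes behavior reproducible and auditable; whether that property can actually contain a powerful AI system is, of course, the contested claim here.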
I think world governments should demonstrate that they can cooperate and coordinate on a relatively simple problem like not building unaligned TAI, before they tackle the more difficult problem of coordinating to build aligned TAI.
If everyone suddenly got their act together and started coordinating effectively, I don’t think it would take 30 years before we could start building again, in a more controlled way. It might not even take 1! But a civilization should be able to pass easy tests, before it is allowed to attempt hard ones.
Yeah but that’s never happened and won’t happen so long as you are talking about baseline humans.
I don’t know how informed you are about politics, but most governments don’t give a hoot about AI at the moment. Smaller issues are getting more attention, and a few doomers yelling at them won’t change a thing. On my model they won’t care even at the moment they die. But I didn’t predict the recent six-month “ban” on training runs either, so it’s possible this becomes a politically charged issue by, say, 2030, when we have GPT-7 and it’s truly approaching dangerous levels of intelligence.
Assuming for the sake of argument that uncontrollable strong AI can be created, I disagree with Mr. Yudkowsky’s claim that it is a threat to humanity. In fact, I don’t think it is going to be useful at all. First, there still is such a thing as the physical world. Okay, there is strong AI, it can’t be controlled, and it decides to murder humanity. How is it going to do that? You can’t murder me in the cyber realm. You can aggravate me or harm me, but you can’t kill me. If you want to claim that AI is going to wipe out humanity, then you need to explain the physical mechanism or mechanisms by which that can happen. As far as I can tell, neither Mr. Yudkowsky nor anyone on his side of the debate ever does that. They make the very persuasive argument that strong AI can’t be controlled and just assume that means the end of humanity. Without a physical mechanism for the AI to accomplish that, it doesn’t mean the end of anything, in the physical world at least. I don’t think that is a step that can just be skipped over the way everyone in this field seems to think it can.
The bigger issue, however, is that the uncontrollable nature of strong AI or even really good weak AI makes it useless. AI is a machine. It is created to do something. Why are machines created? Man creates machines for two reasons; for the machine to do something faster or in a more powerful way than he can, and to do that something in a consistent way. Take a calculator for example. The calculator’s value lies not just in its ability to do simple math problems faster than a human but also in its ability to do them in a completely consistent way. We all know how to do long division. Yet, if we were all given the task of doing 10,000 long division problems we would almost certainly get some of them wrong. We would either forget to carry a number, or transpose a number when writing it, or maybe just get bored and find the task a waste of time and get them wrong intentionally. A calculator, however, could do 10 million or 100 billion long division problems and never get a single one wrong. It can’t get them wrong. It is a machine. That is the whole point of having it.
Imagine the calculator is a strong AI program. Then it becomes just like a human being. Maybe it will give me the right answer, but maybe it won’t. It might give a hundred right answers and then slip in a wrong one for reasons I will never understand. When Yudkowsky and others correctly argue that AI is uncontrollable, what they are also saying is that it is not reliable. If it is not reliable, it is worthless and will not be adopted in anything like the degree its supporters think it will be. Companies and individuals who try to use these programs will quickly find that, since they can’t control the program, they can’t trust its answers to problems. No one wants a machine or a computer program or an employee they can’t trust. If strong AI ever becomes a reality, I imagine a few big institutions will adopt it and quickly realize their mistake. So I can’t see how strong AI ever gets adopted widely enough to have the power to destroy humanity.
I don’t think the danger of AI is that it is going to blow up the world. I think the danger is that it will be substituted for human judgement in practical and moral decisions. If something is not done, we are going to wake up one day and find out that every decision that affects our lives from whether we get a job or can rent an apartment or get a loan or even have a bank account or own a car will be made by AI machines running algorithms even their creators don’t fully understand to make these decisions without any transparency, standards or accountability.
One of the reasons why bureaucracies of any kind love rules so much is that the existence of specific and detailed rules enables bureaucrats to make decisions without any moral accountability. There is no greater moral cop-out than doing something because “the rules require it”. AI takes this sort of inhuman, rules-based decision-making to an entirely different level. With AI, the bureaucrats don’t even have to make the rules. They can let an AI program both make the rules and make the decisions, allowing them to exercise power without any transparency or accountability. “I don’t know why I can’t give you this job, but the machine says I can’t and I have to follow what it says.” That is the danger of AI. The concerns about it destroying the world are just a distraction.
I want to step in here as a moderator. We’re getting a substantial wave of new people joining the site who aren’t caught up on all the basic arguments for why AI is likely to be dangerous.
I do want people with novel critiques of AI to be able to present them. But LessWrong is a site focused on progressing the cutting edge of thinking, and that means we can’t rehash every debate endlessly. This comment makes a lot of arguments that have been dealt with extensively on this forum, in the AI box experiment, Cold Takes, That Alien Message, So It Looks Like You’re Trying to Take Over The World, and many other places.
If you want to critique this sort of claim, the place to do it is on another thread. (By default you can bring it up in the periodic All AGI Safety questions welcome threads). And if you want to engage significantly about this topic on LessWrong, you should focus on understanding why AI is commonly regarded as dangerous here, and make specific arguments about where you expect those assumptions to be wrong. You can also check out https://ui.stampy.ai which is an FAQ site optimized for answering many common questions.
The LessWrong moderation team is generally shifting to moderate more aggressively as a large wave of people start engaging. John Kluge has made a few comments in this reference class so for now I’m rate limiting them to one-comment per 3 days.
I put together a bunch of the standard links for the topic of “how can software act in the world and kill you” in this comment.
There’s no particular need to: there’s a technology that allows you to just store succinct pre-written answers to Frequently Asked Questions.
And it turns out there is a FAQ
https://www.lesswrong.com/posts/LTtNXM9shNM9AC2mp/superintelligence-faq
although it is old (and not prominently displayed).
I think you’re making a number of flawed assumptions here, Sir Kluge.
1) Uncontrollability may be an emergent property of the G in AGI. Imagine you have a farm hand that works super fast, does top quality work but now and then there just ain’t nothing to do so he goes for a walk, maybe flirts around town, whatever. That may not be that problematic, but if you have a constantly self-improving AI that can give us answers to major massive issues that we then have to hope to implement in the actual world… chances are that it will have a lot of spare time on its hands for alternative pursuits… either for “itself” or for its masters… and they will not waste any time grabbing max advantage in min time, aware they may soon face a competing AGI. Safeguards will just get in the way, you see.
2) Having the G in AGI does not at all have to mean it will then become human in the sense it has moods, emotions or any internal “non-rational” state at all. It can, however, make evaluations/comparisons of its human wannabe-overlords and find them very much inferior, infinitely slower and generally rather of dubious reliability. Also, they lie a lot. Not least to themselves. If the future holds something of a Rationality-rating akin to a Credit rating, we’d be lucky to score above Junk status; the vast majority of our needs, wants, drives and desires are all based on wanting to be loved by mommy and dreading death. Not much logic to be found there. One can be sure it will treat us as a joke, at least in terms of intellectual prowess and utility.
3) Any AI we design that is an AGI (or close to it) and has “executive” powers will almost inevitably display collateral side-effects that may run out of control and cause major issues. What is perhaps even more dangerous is an A(G)I that is being used in secret or for unknown ends by some criminal group or… you know… any “other guys” who end up gaining an advantage of such enormity that “the world” would be unable to stop, control or detect it.
4) The chance that a genuinely rule- and law-based society is more fair, efficient and generally superior to current human societies is 1. If we let smart AIs actually be in charge, indifferent to race, religion, social status, how big your boobs are, whether you are a celebrity and whether most people think you look pretty good—mate, our societies would rival the best of imaginable utopias. Of course, the powers that be (and wish to remain thus) would never allow it—and so we have what we have now: the powerful using AI to entrench and secure their privileged status and position. But if we actually let “dispassionate computers do politics” (or, perhaps more accurately labelled, “actual governance”!) the world would very soon be a much better place. At least in theory, assuming we’ve solved many of the very issues EY raises here. You’re not worried about AI—you’re worried about some humans using AI to the disadvantage of other humans.
There are so many unexamined assumptions in this argument. Why do you assume that a super intelligent AI would find humanity wanting? You admit it would be different than us. So, why would it find us inferior? We will have qualities it doesn’t have. There is nothing to say it wouldn’t find itself wanting. Moreover, even if it did, why is it assumed that it would then decide humanity must be destroyed? Where does that logic come from? That makes no sense. I suppose it is possible but I see no reason to think that is certain or some sort of necessary conclusion. I find dogs wanting but I don’t desire to murder them all. The whole argument assumes that any super intelligent being of any sort would look at humanity and necessarily and immediately decide it must be destroyed.
That is just people projecting their own issues and desires onto AI. They find humanity wanting for whatever reason and if they were in a position above it and where they could destroy it they would conclude it must be destroyed. Therefore, any AI would do the same. To that I say, stop worrying about AI and get a shrink and start worrying about your view of humanity.
If number 1 is true, then AI isn’t a threat. It will never go crazy and cause harm; it will just do a few harmless and quirky things. Maybe that will be the case. If it is, Yudkowsky is still wrong. Beyond that, AI isn’t going to solve these problems. To think that it will is moonshine. It assumes that solving complex and difficult problems is just a question of time and calculation. Sadly, the world isn’t that simple. Most of the “big problems” are big because they are moral dilemmas with no answer that doesn’t require value judgements and comparisons, and those simply cannot be resolved via sheer force of intellect.
As for number two, you say, “It can, however, make evaluations/comparisons of its human wannabe-overlords and find them very much inferior, infinitely slower and generally rather of dubious reliability.” You are just describing it being human and having human emotions. It is making value and moral judgements on its own. That is the definition of being human and having moral agency.
Then you go on to say “If the future holds something of a Rationality-rating akin to a Credit rating, we’d be lucky to score above Junk status; the vast majority of our needs, wants, drives and desires are all based on wanting to be loved by mommy and dreading death. Not much logic to be found there. One can be sure it will treat us as a joke, at least in terms of intellectual prowess and utility.”
That is the sort of laughable nonsense that only intellectuals believe. There is no such thing as something being “objectively reasonable” in any ultimate sense. Reason is just the process by which you think. That process can produce any result you want, provided you feed it the right assumptions. What seems irrational to you can be totally rational to me if I start with different assumptions or different perceptions of the world than you do. You can reason yourself into any conclusion; they are called rationalizations. The idea that there is an objective thing called “reason” which gives a single path to the truth is 8th-grade philosophy, and why Ayn Rand is a half-wit. The world just doesn’t work that way. A super AI is no more or less “reasonable” than anyone else, and its conclusions are no more or less reasonable or true than any other conclusions. To pretend otherwise is just faith-based worship of reason and computation as some sort of ultimate truth. It isn’t.
“The chances that a genuinely rule- and law-based society is more fair, efficient and generally superior to current human societies is 1”
A society with rules tempered by values and human judgement is fair and just to the extent human societies can be. A society that is entirely rule-based, tempered by no judgement of values, is monstrous. Every rule has a limit, a point where applying it becomes unjust and wrong. If it were just a question of having rules and applying them to everything, ethical debate would have ended thousands of years ago. It isn’t that simple. Ethics lie in the middle; rules are needed right up to the point they are not. Sadly, the categorical imperative didn’t settle the issue.
That there is no such thing as being 100% objective/rational does not mean one can’t be more or less rational than some other agent. Listen. Why do you have a favorite color? How come you prefer leather seats? In fact, why did you have tea this morning instead of coffee? You have no idea. Even if you do (say, you ran out of coffee) you still don’t know why you decided to drink tea instead of running down to the store to get some coffee instead.
We are so irrational that we don’t actually even know ourselves why most of the things we think, believe, want or prefer are such things. The very idea of liking is irrational. And no, you don’t “like” a Mercedes more than a Yugo because it’s safer—that’s a fact, not a matter of opinion. A “machine” can also give preference to a Toyota over a Honda but it certainly wouldn’t do so because it likes the fabric of the seats, or the fact the tail lights converge into the bumper so nicely. It will list a bunch of facts and parameters and calculate that the Toyota is the thing it will “choose”.
We humans delude ourselves that this is how we make decisions but this is of course complete nonsense. Naturally, some objective aspects are considered like fuel economy, safety, features and options… but the vast majority of people end up with a car that far outstrips their actual, objective transportation needs, and most of that part is really about status, how having a given car makes you feel compared to others in your social environment and what “image” you (believe you) project on those whose opinion matters most to you. An AI will have none of these wasteful obsessive compulsions.
Look—be honest with yourself Mr. Kluge. Please. Slow down, think, feel inside. Ask yourself—what makes you want… what makes you desire. You will, if you know how to listen… very soon discover none of that is guided by rational, dispassionate arguments or objective, logical realities. Now imagine an AI/machine that is even half as smart as the average Joe, but is free from all those subjective distractions, emotions and anxieties. It will accomplish 10x the amount of work in half the time. At least.
Sigh; meanwhile, GPT-4 can’t arrange the months in alphabetical order based on their 4th letter. GPT-4 prompt:
“ Arrange the months in alphabetical order based on their 4th letter
To arrange the months based on the 4th letter in their names, please see the following list:
April (A_p_r_il)
September (S_e_p_tember)
December (D_e_c_ember)
February (F_e_b_ruary)
January (J_a_n_uary)
June (J_u_n_e)
July (J_u_l_y)
November (N_o_v_ember)
August (A_u_g_ust)
October (O_c_t_ober)
May (M_a_y_)
March (M_a_r_ch)
This list starts with April, which has “r” as its fourth letter, and ends with March, which has “r” as its fourth letter as well.
” When you start probing adversarially, you can find many simple tasks that GPT-4 sucks at, so let’s maybe just calm down.
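For contrast, the task itself is trivial for a character-level program. A one-line sort handles it (the prompt leaves one edge case unspecified: three-letter “May” has no 4th letter, so the sketch below gives it an empty-string key, which sorts first):

```python
months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Sort key: the 4th letter (index 3); "May" falls back to "", sorting first.
result = sorted(months, key=lambda m: m[3] if len(m) > 3 else "")
print(result)
# → ['May', 'March', 'June', 'November', 'December', 'April', 'October',
#    'February', 'September', 'January', 'August', 'July']
```

Python’s `sorted` is guaranteed stable, so months that tie on the 4th letter (June, November, December all have “e”) keep their calendar order.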
This is a new one! “Computers will never be able to sort lists of words by arbitrary keys.”
Does it require quantum microtubules in the incomprehensibly complex neuron to leverage uncomputable mental powers that can defy Gödel’s proof or something?
It’s probably because GPT learns on the basis of tokens, not letters, so this doesn’t really tell you much. If you want to find something it can’t do, it’d be more impressive if it were a logic thing, not a syntactic thing.
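A rough illustration of the tokens-vs-letters point (the subword split below is hypothetical, chosen for illustration, and is not the output of any actual tokenizer): a model that receives multi-character tokens never sees “the 4th letter of the word” as an explicit feature of its input.

```python
# Hypothetical subword split of "September" (NOT real GPT-4 tokens).
tokens = ["Sept", "ember"]
word = "".join(tokens)

# A character-level program reads the 4th letter directly:
assert word[3] == "t"

# A token-level model instead receives opaque integer IDs; nothing in
# its input explicitly marks "t" as the 4th character of the word.
token_ids = {tok: i for i, tok in enumerate(tokens)}
model_input = [token_ids[tok] for tok in tokens]
print(model_input)  # the model's-eye view: just [0, 1]
```

So letter-level failures mostly probe the input representation, not the model’s reasoning, which is why a logic failure would be the stronger evidence.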