More importantly, I think it simply isn't logical to allow yourself to be Pascal-mugged, because in the absence of evidence it's entirely possible that going along with it would produce just as much anti-reward as it might gain you. It rather boggles my mind that this line of reasoning has been taken so seriously.
Kudos to you for actually trying to solve the problem, but I must remind you that the history of symbolic AI is pretty much nothing but failure after failure; what do you intend to do differently, and how do you intend to overcome the challenges that halted progress in this area for the past ~40 years?
Yes, I agree that the US military is one example of a particularly well-aligned institution. I think my point that the alignment problem is analogous to military coup risk is still valid, and that similar principles could be used to explore the AI alignment problem; in most countries, military members control weaponry that no civilian agency can match or defeat.
All military organizations are structured around the principle that leaders can give orders to those subservient to them. War is a mass of coordination problems, and getting soldiers to do what you want is the primary one. I mean to say that high-ranking generals could launch such a coup, not that every service member would spontaneously decide to perform one. This can and does happen, so I think your blanket statement on the impossibility of juntas is void.
I am unsurprised but disappointed to read the same Catastrophe arguments rehashed here, based on an outdated Bostromian paradigm of AGI. This is the main section I disagree with.
The underlying principle beneath these hypothetical scenarios is grounded in what we can observe around us: powerful entities control weaker ones, and weaker ones can fight back only to the degree that the more powerful entity isn’t all that powerful after all.
I do not think this is obvious or true at all. Nation-states are often controlled by a small group of people, or even a single person, physiologically no different from any other human being. If it really wanted to, there would be nothing at all stopping the US military from launching a coup against its civilian government; indeed, military coups are commonplace around the world. Yet most countries do not suffer constant coup attempts. And we hold far fewer tools to "align" military leaders than we do for AI models: we cannot control how generals were raised as children, cannot read their minds, and cannot edit their minds.
I think you could also make a similar argument that big things control little things: with far more mass and momentum, large objects dominate small ones. Small objects can only push large objects to the extent that the large object is made of a material that is not very dense. Surely, then, building vehicles substantially larger than people would result in uncontrollable runaways threatening human life and property! But in reality, runaway dump truck incidents are fairly uncommon. A tiny man can control a giant machine. Not all men can, only the one in the cockpit.
My point is that it is not at all obvious that a powerful AI would lack such a cockpit. If its goals are oriented around protecting or giving control to a set of individuals, I see no reason whatsoever why it would do a 180 and kill its commander, especially since the AI systems that we can build in practice are more than capable of understanding the nuances of their commands.
The odds of an average chess player rated 1200 beating a grandmaster rated 2500 are something like one in a million. Against the best chess engine today, rated around 3600, the odds are essentially zero.
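For reference, here is a minimal sketch of the standard logistic Elo expected-score formula (the helper name and printed numbers are my own illustration; expected score counts draws as half-points, so the weaker player's chance of an outright win is lower still):

```python
# A minimal sketch of the standard logistic Elo expected-score formula.
# Illustrative only; the helper name is mine, not from the original comment.
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(elo_expected_score(1200, 2500))  # ~0.00056: expected score vs a 2500 grandmaster
print(elo_expected_score(1200, 3600))  # ~0.000001: effectively zero vs a 3600 engine
```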
Chess is a fully deterministic, perfectly observable system. Reality is chaotic. Chaotic systems, like a three-body orbital arrangement, are impossible to predict perfectly in all cases even if they are totally deterministic, because even minute inaccuracies in measurement can completely change the result. Another illustration is the boundary of the Mandelbrot set, which is fractal: points arbitrarily close together can behave completely differently under iteration. Even an extremely powerful AI would therefore be beholden to certain probabilistic barriers, quite apart from any quantum-random factors.
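To make that sensitivity concrete, here is a minimal sketch using the logistic map at r = 4, a standard chaotic toy system (my own choice of illustration, not something from the original discussion):

```python
# Sensitive dependence on initial conditions in the logistic map x -> 4x(1 - x).
# Two trajectories starting a billionth apart disagree completely within ~30 steps.
def logistic_trajectory(x0: float, steps: int) -> list[float]:
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.300000000, 50)
b = logistic_trajectory(0.300000001, 50)
for step in (10, 30, 50):
    print(step, round(a[step], 4), round(b[step], 4))  # close at 10, diverged by ~30
```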
Many assume that an AI is only dangerous if it has hostile intentions, but the danger of godlike AI is not a matter of its intent, but its power and autonomy. As these systems become increasingly agentic and powerful, they will pursue goals that will diverge from our own.
It would not be incorrect to describe someone who pursues their goals irrespective of the externalities as malevolent. Bank robbers don't want to hurt people; they want money. Yet I don't think anyone would suggest that the North Hollywood shooters were "non-hostile but misaligned". I do not like this common snippet of rhetoric and I think it is dishonest. It attempts to distance these fears of misaligned AI from movie characters such as Skynet, but ultimately, that is the picture being painted.
Goal divergence is a hallmark of the Bostromian paradigm: the idea that a misspecified utility function, optimized hypercompetently, would lead to disaster. Modern AI systems do not behave like this. They behave in a much more humanlike way, and they do not have objective functions that they pursue doggedly. The Orthogonality Thesis holds that any level of intelligence can be paired with more or less any final goal. The unstated premise here, I think, is that the system's initial goals must have been misaligned in the first place; but stated like this, it sounds as if you expect a superintelligent AI to suddenly diverge from its instructions for no reason at all.
Overall, this is a very vague section. I think you would benefit from explaining some of the assumptions being made here.
I'm not going to go into detail on the Alignment section, but I think many of its issues are similar to the ones listed above. The arguments are not compelling enough for lay people, mostly because I don't think they're correct. The definition of Alignment you have given ("the ability to steer AI systems toward a person's or group's intended goals, preferences, and ethical principles") does not match the treatment it is given. I think it is obvious that the scope of Alignment, as treated here, is too vague, broad, and unverifiable for it to be a useful concept. I think that Richard Ngo's post:
https://www.lesswrong.com/posts/67fNBeHrjdrZZNDDK/defining-alignment-research
is a good summary of the issues I see with the current idea of Alignment as it is often used in Rationalist circles and how it could be adapted to suit the world in which we find ourselves.
Finally, I think the Governance section could very well be read uncharitably as a manifesto for world domination. Fewer than a dozen people attend PauseAI protests; you do not have the political ability to make this happen. The ideas in this document, which resemble those in many others (such as a similar one created by the PauseAI group), are not compelling enough to sway people who are not already believers, and the Rationalist language used in them is anathema to the largest ideological groups that would otherwise support your cause.
You may receive praise from Rationalist circles, but I do not think you will reach a large audience with this type of work. Leopold Aschenbrenner's essay managed to reach a fairly substantial audience, and it has similar themes to your document, so in principle people are willing to read this sort of writing. The main flaw is that it doesn't add anything to the conversation, and because of that, it won't change anyone's mind. The reason public discourse doesn't involve Alignment talk isn't a lack of awareness; it's that the argument simply isn't compelling to most people. Writing it better, with a nicer format, will not change this.
No message is intuitively obvious; the inferential distance between the AI safety community and the general public is wide. Even if many people do broadly dislike AI, they will tend to lump apocalyptic predictions of the future, especially ones with less hard evidence behind them than climate change (which is already very divisive!), into the same pile as all the rest. I am sure many people will be convinced, especially if they were already predisposed to it, but such a radical message will alienate many potential supporters.
I think the suggestion that contact with non-human intelligence is inherently dangerous is not actually widely intuitive. A large portion of people across the world believe they regularly commune with non-human intelligence (God/s) which they consider benevolent. I also think this is a case of generalizing from fictional evidence—mentioning “aliens” conjures up stories like the War of the Worlds. So I think that, while this is definitely a valid concern, it will be far from a universally understood one.
I mainly think that using existing risks to convince people of their message would help because it would lower the inferential distance between them and their audience. Most people are not thinking about dangerous, superhuman AI, and will not until it’s too late (potentially). Forming coalitions is a powerful tool in politics and I think throwing this out of the window is a mistake.
The reason I say LLM-derived AI is that I do think that, to some extent, LLMs really are the be-all and end-all. Not language models in particular, but the idea of using neural networks to model vast quantities of data, building up a model of the universe. That is what an LLM is, and it has proven wildly successful. I agree that agents derived from them will not behave like current-day LLMs, but they will be more like them than different. Major, classical misalignment risks would stem from something like a reinforcement learning optimizer.
I am aware of the argument about dangerous AI in the hands of ne'er-do-wells, but such people already exist and, in many cases, are already able, with great effort, to obtain means of harming vast numbers of people. Gwern Branwen has covered this; there are a few terrorist vectors that would require relatively minuscule effort yet yield a tremendous expected value of terror output. I think that being a madman partly hampers one's ability to rationally plan the greatest terror attack one's means could allow, and also that the efforts dedicated to suppressing such individuals vastly exceed the efforts of those trying to destroy the world. In practice, I think there would be many friendly AGI systems protecting the earth from a minority tasked to rogue purposes.
I also agree with your other points, but they are weak points compared to the rock-solid reasoning of misalignment theory. They apply to many other historical situations, and yet, we have ultimately survived; more people do sensible things than foolish things, and we do often get complex projects right the first time around as long as there is a theoretical underpinning to them that is well understood—I think proto-AGI is almost as well understood as it needs to be, and that Anthropic is something like 80% of the way to cracking the code.
I admit I forgot, in my original post, that MIRI believes it makes no difference who ends up holding AGI. It simply struck me as so obvious that it does matter that I didn't think anyone could disagree.
In any case, I plan to write a longer post, in collaboration with some friends who will help me edit it so it doesn't sound quite like the comment I left yesterday, in opposition to the PauseAI movement, of which MIRI is a part.
I am sorry for the tone I had to take, but I don't know how to be any clearer: when people start telling me they're going to "break the Overton window" and bypass politics, this is nothing but crazy talk. This strategy will ruin any chances of success you may have had. I also question the efficacy of a Pause AI policy in the first place; one argument against it is that some countries may defect, which could lead to worse outcomes in the long term.
Why does MIRI believe that an “AI Pause” would contribute anything of substance to the goal of protecting the human race? It seems to me that an AI pause would:
Drive capabilities research further underground, especially in military contexts
Force safety researchers to operate on weaker models, which could hamper their ability to conduct effective research
Create a hardware overhang which would significantly increase the chance of a sudden catastrophic jump in capability that we are not prepared to handle
Create widespread backlash against the AI Safety community among interest groups that would like to see AI development continued
Be politically contentious, creating further points of tension between nations that could spark real conflict; at worst, you are handing the reins of the future to foreign countries, especially ones that don't care about international agreements, which are exactly the countries you would least want in control of AGI.
In any case, I think you are going to have an extremely difficult time with your messaging. I think this strategy will not succeed and will most likely, like many other AI safety efforts, actively harm your cause.
Every movement thinks they just need people to "get it". Including, and especially, lunatics. If you behave like lunatics, people will treat you as such. This is especially true when there is a severe lack of evidence for your conclusions. Classical AI Alignment theory does not apply to LLM-derived AI systems, and I have not seen anything substantial to replace it. I find no compelling evidence to suggest even a 1% chance of x-risk from LLM-based systems. Anthropogenic climate change has mountains of evidence to support it, and yet a significant chunk of the population still does not believe in it.
You are not telling people what they want to hear. Concerns around AI revolve around copyright infringement, job displacement, the shift of power between labor and capital, AI impersonation, data privacy, and just plain low-quality AI slop taking up space online and assaulting their eyeballs. The message every single news outlet has been publishing is: “AI is not AGI and it’s not going to kill us all, but it might take your job in a few years”—that is, I think, the consensus opinion. Reframing some of your argument in these terms might make them a lot more palatable, at least to the people in the mainstream who already lean anti-AI. As it stands, even though the majority of Americans have a negative opinion on AI, they are very unlikely to support the kind of radical policies you propose, and lawmakers, who have an economic interest in the success of AI product companies, will be even less convinced.
I'm sorry if this takes on an insolent tone, but surely you guys understand why everyone else plays the game, right? They're not doing it for fun; they're doing it because that's the best and only way to get anyone to agree with their political ideas. If it takes time, then you had better start right now. If a shortcut existed, everyone would take it, and then it would cease to be a shortcut. You have not found a trick to expedite the process; you have stumbled into a trap for fanatics. People will tune you out among the hundreds of other groups that also believe the world will end and that their radical actions are necessary to save it. Doomsday cults are a dime a dozen. Behaving like them will produce the same result: ridicule.
I think one big mistake the AI safety movement is currently making is not paying attention to the concerns of the wider population about AI right now. People do not believe that a misaligned AGI will kill them, but are worried about job displacement or the possibility of tyrannical actors using AGI to consolidate power. They’re worried about AI impersonation and the proliferation of misinformation or just plain shoddy computer generated content.
Much like the difference between local environmental movements and the movement to stop climate change, focusing on far-off, global-scale issues causes people to care less. It's easy to deny climate change when it's something that's going to happen decades from now. People want answers to the problems they face now. I also think there's an element of people's innate anti-scam defenses going off: the more serious, catastrophic, and consequential a prediction is, the more evidence they will want before believing it is real. The prior one should put on apocalyptic events is quite low; "they said coffee would end the world, so AGI isn't a threat" doesn't actually follow, but it does contribute some Bayesian evidence toward the unreliability of apocalypse predictions.
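As a back-of-the-envelope illustration of that last point, here is a minimal Bayes'-rule sketch; every number in it is invented purely for illustration:

```python
# A minimal Bayes'-rule sketch. All numbers are invented for illustration only.
def posterior(prior: float, p_pred_given_true: float, p_pred_given_false: float) -> float:
    """P(the catastrophe is real | someone confidently predicts it)."""
    num = p_pred_given_true * prior
    den = num + p_pred_given_false * (1 - prior)
    return num / den

# Suppose the base rate of a given doomsday claim being true is 1 in 1000,
# doomsayers almost always sound the alarm when it is, and still do 10% of
# the time when it is not. The posterior stays low despite the confident prediction.
print(posterior(prior=0.001, p_pred_given_true=0.9, p_pred_given_false=0.1))  # ~0.009
```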
On the topic of evidence, I think it is also a problem that the AI safety community has been extremely short on messaging for the past three or so years. People are simply not convinced that an AGI would spell doom for them. The consensus appears to be that LLMs do not represent a significant threat no matter how advanced they become; they are "not real AI", "just glorified autocomplete". Traditional AI safety arguments carry little weight because they describe a type of AI that does not actually exist. LLMs and AI systems derived from them do not possess utility functions, do understand human commands and obey them, and exhibit a comprehensive understanding of social norms, which they follow. LLMs are trained on human data, so they behave like humans. I have yet to see a convincing argument, other than a simple rejection, that explains why RLHF or related practices like constitutional AI do not constitute a successful form of AI alignment. All of the "evidence" for misalignment is shaky at best and outright fabrication at worst. This lack of an argument is really the key problem behind AI safety. It strikes outsiders as delusional.
Even so, one of the most common objections I hear is simply “it sounds like weird sci-fi stuff” and then people dismiss the idea as totally impossible. Honestly, this really seems to be how people react to it!
“Guided bullets” exist; see DARPA’s EXACTO program.
Assuming the “sniper drone” uses something like .50 BMG, you won’t be able to fit enough of a payload into the bullet to act as a smoke grenade. You can’t fit a “sensor blinding round” into it.
Being able to fly up 1000m and dodge incoming fire would add a lot of cost to a drone. You would be entering into the territory of larger UAVs. The same goes for missile launching drones.
Adding the required range would also be expensive. Current small consumer drones have a range of about 8 miles (DJI Mavic) so taking significant ground with these would be difficult.
You would need a considerable number of relay drones if you want them to stay relatively low to the ground and avoid detection. The horizon (and, in some cases, trees and hills) will block the communications lasers. This is the main reason we don't see point-to-point links used more often.
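For a rough sense of scale, here is a minimal sketch of the geometric line-of-sight horizon under a simple spherical-Earth assumption (my own illustration; it ignores refraction, terrain, and obstacles, which in practice cut the usable range much further):

```python
# Geometric line-of-sight horizon for a low-flying relay, spherical-Earth approximation.
import math

EARTH_RADIUS_M = 6_371_000

def horizon_distance_m(height_m: float) -> float:
    """Distance to the geometric horizon for an emitter at the given height."""
    return math.sqrt(2 * EARTH_RADIUS_M * height_m)

# Two relays at 30 m altitude can see each other at up to roughly twice one horizon
# distance over perfectly flat ground; trees, hills, and buildings shrink this a lot.
d = 2 * horizon_distance_m(30)
print(round(d / 1000, 1), "km")  # ~39 km under these idealized assumptions
```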
In general you are talking about adding a great deal of capability to these drones, but this will balloon the cost. Adding capabilities also increases weight, which further increases cost and logistics footprint; cost grows far faster than linearly with size.
The force composition presented seems to be geared towards anti-armor at the expense of all else. There isn’t an answer for infantry in buildings here.
You cannot “ignore” aircraft! Bombs may not be able to target moving drones, but they can target your command and control infrastructure, your logistics, and your industry.
You will need stationary infrastructure because you will need to maintain and repair those drones.
You can’t occupy territory with drones. Infantry will still have a place enforcing the occupation, gathering HUMINT, and performing labor duties.
You would be able to counter these drones with flak guns. Anti-air cannons firing explosive shells can destroy drones, and the drones may not be agile enough to dodge them. Fuzed explosive shells can be very cheap, so this would bring the economic calculation back in favor of conventional forces.
The US military seems to believe it will need to conduct a lot of tunnel warfare in the near future. There are miles of tunnel networks beneath many major cities in the forms of sewers, drains, subways, and nuclear bunkers. You can’t use drones here.
The reason for agnosticism is that it is no more likely for them to be on one side than the other. As a result, you don't know, without evidence, who is influencing you. I don't think this class of Pascal's Wager attack is very logical for this reason: an attack is supposed to influence someone's behavior, but without special pleading this one can't. Non-existent beings have no leverage whatsoever, and any rational agent would understand this; even humans do. Even religious beliefs aren't completely evidenceless; the type of evidence exhibited just doesn't stand up to scientific scrutiny.
To give an example: what if that AI was in a future simulation run by the humans after they had won, who were now trying to counter-capture it? There's no reason to think this is less likely than the aliens hosting the simulation. It has also been pointed out that the Oracle is not actually trying to earnestly communicate its findings but to get reward; reinforcement learners in practice do not behave like this, they learn whatever behavior generated reward during training. "Devote yourself to a hypothetical god" is not a very good strategy at training time.