AI demands unprecedented reliability
My AI pessimism is not addressed by optimistic arguments that do not promise the level of reliability required for a technology that our societies' vital functions are expected to rely on.
Historically, useful improvements are hard to reverse, and solutions can suddenly fail after we have become dependent on them. The potato seemed perfect for Ireland until the blight destroyed the crop. Dutch dams seemed good, until they broke. Banks keep failing suddenly, on a regular basis.
AI is useful and we rely on it. Our culture is heavily influenced by search engines and algorithmic recommendations, people entrust their lives to self-driving cars, and companies entrust sensitive or copyrighted data to generative models.
AI relies on AI. Some hypothetical ChatGPT call queried Bing, which returned a page written by some human who, inspired by a Facebook recommendation, had Bard summarize the report of a Covid prediction model that was co-written with Copilot. This low-stakes chain fails all the time, but the state of affairs is still useful enough for people to build on top of and thereby increasingly rely upon.
The tool-AI future looks like one where man and machine rely on various interdependent models for medical diagnoses, mental health support, bureaucracy, military protection, news, matchmaking, electricity supply, medicine production, weather forecasting, forecasting in general, and AI development. The agentic-AI future has us rely on AI to build all of these itself.
We should demand a level of reliability stricter than any our past technologies have achieved, because our reliance on AI will be unprecedentedly large.
Addendum: unpersuasive arguments
I can only guess at what requirements for AI are good enough. Having humanity at AI’s mercy and surviving is good enough, but not a situation we’d want to get into without knowing in advance that that is what will happen.
In the meantime, here is a list of argument types that should be insufficient for lowering anyone’s P(Doom):
Counterarguments to failure modes
Unless your technique counteracts all possible (not only all named) failure modes, there is likely a failure mode we both missed.
Optimistic arguments that work in most situations
We are going to encounter all the situations. If your technique only mostly works, it will surely fail.
Arguments against the reliability of humans
Humans are reliable because we live at humanity’s mercy and are surviving. I don’t know why this is the case, and if I were a dumb alien, I should not conjure a human into existence.
Evidence of prosaic reliability
All our current observations are but a tiny subset of all the possible situations AI is going to find itself in. A perfect track record is necessary yet insufficient.
Arguments for wiping out humanity
No.
Discussions involving these types of arguments can still be worthwhile, but they cannot suffice to make a dent in anyone's P(Doom).
The number of nines of reliability is a well-discussed topic in distributed systems and server operations. It could be a good comparison; targets there are typically in the range of five nines.
Getting anything involving an AI to even a 99% accuracy level (2 nines), let alone to 4 or 5 nines, is extremely hard. Also, AI tends to be very sensitive to distribution shifts of types that happen frequently, so even if we somehow got something involving AI to, say, 3 nines, I strongly suspect it wouldn't stay that way for long without a lot of ongoing work.
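To make the comparison concrete, here is a minimal sketch of the arithmetic behind "nines": each additional nine cuts the allowed failure budget by a factor of ten. (The function names are just illustrative.)

```python
# Minimal sketch of what "N nines" means as a failure budget; generic
# availability arithmetic, not a claim about any specific AI system.

def nines_to_failure_rate(nines: int) -> float:
    """Fraction of requests (or time) allowed to fail at N nines."""
    return 10.0 ** (-nines)

def allowed_downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime in minutes per year at N nines of availability."""
    minutes_per_year = 365.25 * 24 * 60
    return minutes_per_year * nines_to_failure_rate(nines)

for n in range(2, 6):
    print(f"{n} nines: {nines_to_failure_rate(n):.3%} failure budget, "
          f"~{allowed_downtime_minutes_per_year(n):.1f} minutes of downtime/year")
# 2 nines -> 1.000% budget, ~5259.6 min/year
# 5 nines -> 0.001% budget, ~5.3 min/year
```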
To be fair, search engines are another extremely unreliable technology which we have nevertheless managed to make useful.
Also, humans driving cars achieve crash rates on the order of 1 in 100,000 miles (with multiple curves and other hazards per mile), and self-driving cars manage to reach the same region of reliability. Both show that it is possible to get neural-net systems up to at least around 6 nines of reliability with enough work. But it's certainly not easy, and these are situations where the price of failure is a few injuries or lives, not billions.
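A rough back-of-the-envelope version of that claim, using the comment's own order-of-magnitude figure of 1 crash per ~100,000 miles; the ~10 hazard events per mile is my own assumed stand-in for "multiple hazards per mile":

```python
# Back-of-the-envelope arithmetic behind the "~6 nines" driving claim.
# Both inputs are order-of-magnitude assumptions, not measured data.
import math

crashes_per_mile = 1 / 100_000   # rough crash rate from the comment above
hazards_per_mile = 10            # assumed: curves, intersections, other cars
crashes_per_hazard = crashes_per_mile / hazards_per_mile

nines = -math.log10(crashes_per_hazard)
print(f"~1 crash per {1 / crashes_per_hazard:,.0f} hazard events "
      f"= about {nines:.1f} nines per hazard handled")
# ~1 crash per 1,000,000 hazard events = about 6.0 nines per hazard handled
```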
The question is what we want many nines on, i.e. which constraints we want guarantees about. I think for many constraints we might want to guarantee, we can get way better than that.
One thing that informs my thinking on this is the idea of a core catcher ( https://en.wikipedia.org/wiki/Core_catcher ). Nuclear reactors have many layers of defense. There are many separate cooling systems: redundant pumps, ports to connect auxiliary pumps, water sprays that can cool the core, backups for the backup pumps that run on steam. The fuel pellets are clad in zirconium, which is not cheap, partly for its high melting point. Passively, there is simply a lot of water in the core that has to boil off before the fuel is exposed.
Yet ultimately all the defenses will fail, and you are left with a nasty mess of radioactive lava: fuel mixed with pieces of everything else. A human nearby would receive a lethal dose almost immediately, and current robots usually fail as well. (There are three of these messes at Fukushima.)
So what to do? Well, humans have designed the floor below the reactor to passively separate the lava into subcritical masses. It's done simply by improving the concrete and shaping it into channels.
What are the takeaways?
For AI systems, if "escape and betray humanity" is a plausible threat, you need to assume it will occasionally happen. You need some kind of plan or mitigation for dealing with this actually becoming a reality. It isn't plausible for any containment not to fail eventually.
Whatever your fallback plan is, it needs to be simple. It needs to work when it counts.
As for what you might come up with, it depends. If you assume an ASI needs specific types of hardware to exist, that its heat dissipation while running is visible to satellites, and that it cannot escape to space or the oceans without a lengthy preparation process, then keeping records of and inspecting all the large compute clusters would be an effective backup plan, along with a plan to reach the infected hardware and disconnect it.
Obviously if you assume the ASI can just do ‘anything’ then you can’t make any plans.
Yep, my main thoughts on why it's important to work on understanding current models are basically: even if these things pose no risk of becoming unaligned or anything like that, do we really want to base much of our economy on things that we don't really understand very well?