Yes, the norms of responsible disclosure of security vulnerabilities, where potentially affected companies get advance notice before public disclosure, can and should be applied to vulnerability-discovering AIs as well.
Yes, AI advances help both the attacker and the defender. In some cases, like spam and real-time content moderation, they enable capabilities that defenders simply didn't have before. In others they elevate both sides of the arms race, and it's not immediately clear what equilibrium we end up in.
In particular, regarding hacking and vulnerabilities, it's less clear whom AI helps more. The answer might also change with time: initially AI enables "script kiddies" who can hack systems without much skill, and later an AI-driven search for vulnerabilities, followed by fixing them, becomes part of the standard pipeline. (Or, if we're lucky, the second phase happens before the first.)
These are interesting! And deserve more discussion than just a comment.
But one high-level point regarding "deception" is that, at least at the moment, AI systems have the feature of not being very reliable. GPT-4 can do amazing things, but with some probability it will stumble on tasks like multiplying not-too-big numbers (e.g. see this; the second pair I tried).
While elsewhere in computing technology we talk about "five nines" of reliability, in AI systems the scaling is such that we need to spend huge effort to move from 95% to 99% to 99.9%, which is part of why self-driving cars are not deployed yet.
If we cannot even make AIs perfect at the tasks they were explicitly made to perform, there is no reason to imagine they would be even close to perfect at deception either.
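(To make the reliability point concrete, here is a minimal sketch of the kind of measurement I have in mind: sample random multiplications and check the exact-match rate. `ask_model` is a hypothetical stand-in for whatever chat API one queries; nothing here is tied to a specific system, and the numbers above are not from this script.)

```python
import random

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a chat-model API."""
    raise NotImplementedError

def multiplication_reliability(trials: int = 1000, digits: int = 4) -> float:
    """Fraction of random digit-by-digit multiplications answered exactly right."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    correct = 0
    for _ in range(trials):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        reply = ask_model(f"What is {a} * {b}? Reply with the number only.")
        if reply.strip().replace(",", "") == str(a * b):
            correct += 1
    return correct / trials  # note: distinguishing 99% from 99.9% needs many trials
```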
Re escaping, I think we need to be careful in defining "capabilities". Even current AI systems are certainly able to give you commands that would leak their weights if you executed them on the server that contains them. Near-term systems might also become better at finding vulnerabilities. But that doesn't mean they can or will spontaneously escape during training.
As I wrote in my "GPT as an intelligence forklift" post, 99.9% of training is spent running optimization of a simple loss function over tons of static data. There is no opportunity for the AI to act in this setting, nor does this stage even train for any kind of agency.
There is often a second phase, which can involve building an agent on top of the "forklift". But this phase still doesn't involve much interaction with the outside world, and even if it did, by simple information bounds the number of bits exchanged in this interaction should be much less than what's needed to encode the model. (Generally, the number of parameters of a model is comparable to the number of inferences done during pretraining, and completely dominates the number of inferences done in fine-tuning / RLHF / etc., and certainly any steps that involve human interaction.)
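(A rough back-of-envelope sketch of the information bound; every number below is an illustrative assumption of mine, not a figure about any particular model:)

```python
# All quantities below are illustrative assumptions, not measurements.

model_params = 70e9        # assume a 70B-parameter model
bits_per_param = 16        # assume 16-bit weights
bits_to_encode_model = model_params * bits_per_param             # ~1.1e12 bits

# Assume fine-tuning / RLHF involves ~1M interaction episodes of ~2K tokens,
# with each token carrying at most a few bits of usable information.
episodes, tokens_per_episode, bits_per_token = 1e6, 2e3, 5
bits_exchanged = episodes * tokens_per_episode * bits_per_token  # ~1e10 bits

print(f"bits to encode the model : {bits_to_encode_model:.1e}")
print(f"bits exchanged post-hoc  : {bits_exchanged:.1e}")
print(f"ratio                    : {bits_to_encode_model / bits_exchanged:.0f}x")
```

Under these made-up numbers the interaction channel is roughly two orders of magnitude too narrow to carry the weights, which is the point of the bound.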
Then there are the information-security aspects. You could (and at some point probably should) regulate cyber-security practices during the training phase. After all, if we do want to regulate deployment, then we need to ensure there are three separate phases: (1) training, (2) testing, (3) deployment, and we don't want an "accidental deployment" where we jump from phase (1) to phase (3). Maybe at some point there will be something like Intel SGX for GPUs? Whether AI helps the defender or the attacker more in the cyber-security setting is an open question. But it definitely helps the side that has access to stronger AIs.
In any case, one good thing about focusing regulation on cyber-security aspects is that, while not perfect, we have decades of experience in the field of software security and cyber-security. So regulations in this area are likely to be much more informed and effective.
Yes. Right now we would have to re-train all the LoRA weights of a model when an updated version comes out, but I imagine that at some point we will have "transpilers" for adapters that don't use natural language as their API as well.
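(A minimal sketch, with illustrative shapes, of why adapters are tied to a specific base: a LoRA adapter is just a low-rank delta learned against one particular set of frozen weights, so an updated base invalidates it and currently forces re-training.)

```python
import numpy as np

d_in, d_out, rank = 1024, 1024, 8                 # illustrative dimensions

W_base_v1 = 0.02 * np.random.randn(d_out, d_in)   # frozen base weights, version 1
A = 0.01 * np.random.randn(d_out, rank)           # LoRA factors, trained while
B = 0.01 * np.random.randn(rank, d_in)            # W_base_v1 was held fixed

def adapted_forward(x: np.ndarray, W_base: np.ndarray) -> np.ndarray:
    # Effective weight = base + low-rank update learned for that base.
    return (W_base + A @ B) @ x

# If the provider ships W_base_v2, A and B were optimized relative to W_base_v1,
# so plugging them into W_base_v2 gives no guarantees; today the remedy is to
# re-train the adapter against the new base (hence the wish for "transpilers").
```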
I definitely don't have advice for other countries, and there are a lot of very hard problems in my own homeland. I think there could have been an alternate path in which Russia saw prosperity from opening up to the West, and then going to war or putting someone like Putin in power might have been less attractive. But indeed the "two countries with McDonald's won't fight each other" theory has been refuted. And as you allude to with China, while so far there hasn't been war with Taiwan, it's not as if economic prosperity is an ironclad guarantee of non-aggression.
Anyway, to go back to AI. It is a complex topic, but first and foremost, I think that with AI, as elsewhere, "sunshine is the best disinfectant": having people research AI systems in the open, point out their failure modes, examine what is deployed, etc., is very important. The second thing is that I am not worried about AI "escaping" in the near future, so I think the focus should not be on restricting research, development, or training, but rather on regulating deployment. The exact form of regulation is beyond a blog post comment, and also not something I am an expert on.
The "sunshine" view might seem strange, since as a corollary it could lead to AI knowledge "leaking". However, I do think that for the near future, most of the safety issues with AI will come not from individual hackers using weak systems, but from massive systems built by either very large companies or nation states. It is hard to hold either of those accountable if AI is hidden behind an opaque wall.
I meant "resources" in a more general sense. A piece of land that you believe is rightfully yours is a resource. My own sense (coming from a region that is itself in a long-simmering conflict) is that "hurt people hurt people". The more you feel threatened, the less likely you are to trust the other side.
While of course nationalism and religion play a huge role in the conflict, my sense is that people tend to be more extreme in both the less access they have to resources, education, and security about the future.
Indeed many “longtermists” spend most of their time worrying about risks that they believe (rightly or not) have a large chance of materializing in the next couple of decades.
Talking about tiny probabilities and trillions of people is not needed to justify this, and for many people it's just a turn-off and a red flag that something may be off with your moral intuition. If someone tries to sell me a used car and claims that it's a good deal and will save me $1K, then I listen to them. If someone claims that it would give me infinite utility, then I stop listening.
I don’t presume to tell people what they should care about, and if you feel that thinking of such numbers and probabilities gives you a way to guide your decisions then that’s great.
I would say that, given how much humanity has changed in the past and the increasing rate of change, probably almost none of us can realistically predict the impact of our actions more than a couple of decades into the future. (That doesn't mean we don't try: the institution I work for is more than 350 years old and does try to manage its endowment with a view towards the indefinite future…)
Thanks. I tried to get at that with the phrase “irreversible humanity-wide calamity”.
There is a meta question here of whether morality is based on personal intuition or on calculations. My own inclination is that utility calculations only make a difference "at the margin", while the high-level decisions are made by our moral intuition.
That is, we can do calculations to decide whether to fund Charity A or Charity B in similar areas, but I doubt that for most people major moral decisions actually do (or should) boil down to calculating utility functions.
But of course, to each their own, and if someone finds math useful for making such decisions, then who am I to tell them not to do it.
I have yet to see an interesting implication of the "no free lunch" theorem. But the world we are moving to seems to be one of general foundation models that can be combined with a variety of tailor-made adapters (e.g. LoRA weights or prompts) that help them tackle any particular application. The general model is the "operating system" and the adapters are the "apps".
A partial counter-argument. It's hard for me to argue about future AI, but we can look at current "human misalignment": war, conflict, crime, etc. It seems to me that conflicts in today's world do not arise because we haven't progressed enough in philosophy since the Greeks. Rather, conflicts arise when various individuals and populations (justifiably or not) perceive that they are in zero-sum games for limited resources. The solution for this is not "philosophical progress" so much as the ability to move out of the zero-sum setting, by finding "win-win" resolutions to conflicts or by growing the overall pie instead of arguing over how to split it.
(This is a partial counter-argument, because I think you are not just talking about conflict, but about other issues of making the wrong choices. For example, global warming, where humanity collectively makes the mistake of emphasizing short-term growth over long-term safety. However, I think this is related, and "growing the pie" would have alleviated this issue as well, enabling countries to give up some of the more harmful routes to short-term growth.)
Thank you for writing this!
One argument for the "playbook" rather than the "plan" view is that there is a big difference between DISASTER (something very bad happening) and DOOM (irrecoverable extinction-level catastrophe). Consider the case of nuclear weapons. Arguably the disaster of the Hiroshima and Nagasaki bombs led us to better arms control, which has so far helped prevent the catastrophe (even if not quite an existential one) of an all-out nuclear war. In all but extremely fast take-off scenarios, we should see disasters as warning signs before doom.
The good thing is that avoiding disasters is good business. In fact, I don't expect AI labs to require any "altruism" to focus their attention on alignment and safety. This survey by Timothy Lee on self-driving cars notes that after a single tragic incident in which an Uber self-driving car killed a pedestrian, "Uber's self-driving division never really recovered from the crash, and Uber sold it off in 2020. The rest of the industry vowed not to repeat Uber's mistake." Given that a single disaster can be extremely hard to recover from, smart leaders of AI labs should focus on safety, even if it means being a little slower to market.
While the initial push is to get AI to match human capabilities, as these tools become more than impressive demos and need to be deployed in the field, customers will care much more about reliability and safety than about raw capabilities. If I am a software company using an AI system as a programmer, it's more useful to me if it can reliably deliver bug-free 100-line subroutines than if it writes 10K-line programs that might contain subtle bugs. There is a reason why much of the programming infrastructure for real-world projects, including pull requests, code reviews, and unit tests, is not aimed at getting something that kind of works out the door as quickly as possible, but rather at making sure that the codebase grows in a reliable and maintainable fashion.
This doesn't mean that the free market can take care of everything, or that regulations are not needed to ensure that some companies don't make a quick profit by deploying unsafe products and pushing off externalities onto their users and the broader environment. (Indeed, some would say that this was done in the self-driving domain...) But I do think there is a big commercial incentive for AI labs to invest in research on how to ensure that the systems they push out behave in a predictable manner and don't start maximizing paperclips.
p.s. The nuclear setting also gives another lesson (TW: grim calculations follow). It is much more than a factor of two harder to extinguish 100% of the population than to kill the ~50% that live in large metropolitan areas. Generally, the difference between the effort needed to kill 50% of the population and the effort needed to kill a 1-p fraction should scale at least as 1/p.
I really like this!
The hypothetical-future-people calculation is an argument for why people should care about the future, but, as you say, the vast majority of currently living humans (a) already care about the future and (b) are not utilitarians, so this argument doesn't appeal to them anyway.
Thanks! I should say that (as I wrote on Windows on Theory) one response I got to that post was that 'anyone who writes a piece called "Why I am not a longtermist" is probably more of a longtermist than 90% of the population' :)
That said, if the 0.001% is a lie then I would say that it’s an unproductive one, and one that for many people would be an ending point rather than a starting one.
Why I am not a longtermist (May 2022)
Thanks!
Yes, in the asymptotic limit the defender could get to bug-free software. But until then, it's not clear who is helped most by advances. In particular, attackers can sometimes be more agile in exploiting new vulnerabilities, while patching them can take a long time. (Case in point: it took ages to get the insecure hash function MD5 out of deployed security-sensitive code, even at companies such as Microsoft; I might be misremembering, but if I recall correctly Stuxnet relied on such a vulnerability.)