an effective ai safety initiative

a typical story

A frequent criticism of the Effective Altruism/​AI Safety memeplex is that they are too focused on theoretical harms from AI and not enough on real ones (it’s me, I’m the one who frequently makes this criticism).

Consider the recent California AI bill, which has heavy fingerprints of EA on it.

Notice that the bill:

  1. Literally does not apply to any existing AI

  2. Addresses only theoretical harms (e.g. AI could be used for WMD)

  3. Does so by attacking open source models (thereby in my estimation actually making us less safe)

What if instead there was a way we could spend a small amount of money in a way that:

  1. Addressed real, existing harms caused by AI (or AI like) actors

  2. Without passing any new laws or taking away anyone’s rights

  3. That virtually everyone would agree is net-beneficial

Sounds like a clear (effective) win, right?

Here it is!

What will the first malevolent AGI look like?

Virtually every realistic “the AI takes over the world” story goes like this:

  1. The AI gets access to the internet

  2. It hacks a bunch of stuff and starts accumulating resources (money, host computers, AWS GPU credits, etc)

  3. It uses those resources to (idk, turn us all into paperclips)

This means that learning how to defend and protect the internet from malicious actors is a fundamental AI safety need.

In a world of slow takeoff where we survive because of multiple warning shots, the way we survive is by noticing the warning shots and increasing the resilience of our society in response to those warning shots. There is no fire alarm, so why not start increasing resilience now.

Fortunately (unfortunately?), we live in a world where the first malevolent AGI we encounter will probably look like ChaosGPT, a self-propagating AI Agent programmed to survive while causing as much chaos as possible.

Such an AGI will not be a super-intelligent agent that uses acausal bargaining to talk its way out of a box. It will be barely more intelligent than GPT-4. For all I know, it will literally be llama-3-400B-evil-finetune. Assuming AI scaling trends continue, near-future models will have just enough smarts to pull off common internet scams, steal money, and use that money to rent more GPUs to feed themselves.

How do we counter such threats?

the OODA loop

An idea that the US military has been keen to since at least the 1970′s is that speed is critical to victory. This concept desperately needs to be applied to cyber security.

Consider a typical hack:

the first malevolent AGI will look like this
  1. The hacker compromises some target of value’s account

  2. The hacker uses this account to direct victims to a webpage that does some combination of

    1. steals their money

    2. installs further malware on their computer

  3. Eventually someone notices the account is hacked

  4. Authorities step in and take down the hacked message

Notice that is is precisely during the time between step 2) and step 4) that the attacker is gaining money/​power/​etc. Therefore the longer that it takes to get to step 4, the faster the attacker grows.

Much like the R0 of Covid-19, this ratio of: determines how fast the malevolent actor grows, and if it is low enough, eventually the agent dies out.

But everyone knows hacking is bad, how is this an EA cause?

Our government is very bad at information technology. EAs, on the other hand, tend to be extremely technically savvy.

Another lesson learned from Covid-19 was that something as simple as setting up a phone number people can call for information is utterly beyond the ability of the US government. On the other hand EA-adjacent people have this skill in droves.

So here is my proposal:

Set up a rapid-response cyber-security “911”. This will be a phone-number/​website/​twitter account/​everything-else endpoint where people can report if they suspect an account has been hacked. Then implement a system to rapidly validate that information and pass it on to the relevant authorities as quickly as possible.

The goal (and everything else about the design will work towards this) will be to minimize the amount of time from step 1) in our model attack to step 4)

How much would this cost?

Not much.

If you read this far, there is a strong probability you could build an MVP website in an afternoon. Vaccinate CA appeared to cost “millions of dollars”, but this is pennies in the EA world, especially with the entire future of humanity at stake.

Would it really be net beneficial?

If the only outcome was that 1000 additional grandmas across the country called “hacker 911” and were told “no, that’s not your grandson in jail, it’s an AI”, instead of wiring $10k to a bad actor, it would be net beneficial. We’re talking billions of dollars so even a few percent effectiveness would be a net-win for society.

Seriously, the wins here are massive.

None of this matters if the Super-intelligent AGI hacks directly into people’s brains using Diamondoid Bacteria

Maybe not.

But anything that increases the ability of our society to survive by even a little bit helps.

Are there other things like this that we should be doing?

Probably, yes.

Feel free to suggest more in the comments.

Why are you writing a blog post instead of just doing this?

happens all the time