Given that we’re not on track to control fully autonomous systems well enough to stay safe, what do we do?
Let’s say alignment research will not catch up with capability development, for any combination of reasons: corporations are scaling too fast, there are too many lethal-if-unsolved subproblems we have made at most partial progress on, some subproblems can only be solved in sequential order, or there are hard limits capping the progress we can make on the control problem.
What do we do? By ‘we’, you can mean yourself, specific organisations, or the community coordinating roughly as a whole.
I think if the problem turns out to be too difficult for humanity to solve right now, the right strategy seems pretty straightforward:
Delay the development of AGI, probably via regulation (and multinational agreements)
Make humans smarter
My current model is that there are roughly two promising ways to make people smarter:
Use pre-AGI technology to make people more competent
Use genetic engineering to make smarter humans
Both of these seem pretty promising, and I am in favor of work on both.
Wouldn’t way 2 likely create a new species unaligned with humans?
It doesn’t seem particularly likely to me: I don’t notice a strong correlation between intelligence and empathy in my daily life. Perhaps there are a few more intelligent people who are unusually kind, but that may just be the people I like to hang out with, or a result of more privilege/less abuse growing up leading to both better education and higher levels of empathy. Certainly less smart people can be kind or cruel, and I don’t see a pattern in it.
Regardless, I would expect genetically engineered humans to still have the same circuits which handle empathy and caring, and I’d expect them to be a lot safer than an AGI, perhaps even a bit safer than a regular human: because they can build more accurate models of the world, they’re less likely to cause damage through misconceptions or human error.
If you’re worried about more intelligent people considering themselves a new species, and then not caring about humans, there’s some evidence against this in that more intelligent people are more likely to choose vegetarianism, which would indicate that they’re more empathetic toward other species.
Re: 2
The most promising way is just raising children better.
See (which I’m sure you’ve already read): https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people
Alongside that though, I think the next biggest leverage point would be something like nationalising social media and retargeting development/design toward connection and flourishing (as opposed to engagement and profit).
This is one area where, if we didn’t have multiple catastrophic time pressures, I’d be pretty optimistic about the future. These are incredibly high impact and tractable levers for changing the world for the better; part of the whole bucket of ‘just stop doing the most stupid thing’ stuff.
Raising children better doesn’t scale well: neither in how much oomph you get out of it per person, nor in how many people you can reach with this special treatment.
I highly doubt this would be very helpful in resolving the particular concerns Habryka has in mind. Namely, a world in which:
very short AI timelines (3-15 years) happen by default unless aggressive regulation is put in place; even then, the likelihood of full compliance is not 100%, and the development of AGI can realistically be delayed by at most ~1/2 generations before the risk that at least one large-scale defection has appeared becomes too high, so you don’t have time for slow cultural change that takes many decades to take effect
the AI alignment problem turns out to be very hard and basically unsolvable by unenhanced humans, no matter how smart they may be, so you need improvements that quickly generate a bunch of ultra-geniuses that are far smarter than their “parents” could ever be
I believe that we could raise children much better; however, even in the article you linked:
Unfortunately, in the current political climate, discussing intelligence is taboo. I believe that optimal education for gifted children would be different from optimal education for average children (however, both could, and should, be greatly improved over what we have now), which unfortunately means that debates about improving education in general are somewhat irrelevant for improving the education of the brightest (who presumably could solve AI alignment one day).
Sometimes this is a chicken-and-egg problem: the stupid things happen because people are stupid (the ones who do the things, or make decisions about how the things should be done), but as long as the stupid things keep happening, people will remain stupid.
For example, we have a lot of superstition, homeopathy, conspiracy theories, and similar, which, if they could somehow magically disappear overnight, people probably wouldn’t reinvent, or at least not quickly. These memes persist because they spread from one generation to another. Here, the reason we do the stupid thing is that there are many people who sincerely and passionately believe that the stupid thing is actually the smart and right thing.
Another source of problems is that you can’t expect extraordinary results from average people. For example, most math teachers suck at math and at teaching. As a result, we get another generation that sucks at math. The problem is, we need so many math teachers (at elementary and high schools) that you can’t simply decide to only hire the competent ones; there would not be enough teachers to keep the schools running.
Then we have all kinds of political mindkilling and corruption, where stupid things happen because they provide some political advantage for someone, or because the person who is supposed to keep things running is actually more interested in extracting as much rent as possible.
Yeah, I wish we could stop doing the stupid things… but that turns out to be quite difficult. Merely explaining why some thing is stupid would not work; you would get a lot of people yelling at you, some because they believe the stupid thing, others because they derive some benefit from it, and some simply because they are not competent to do it better.
Curious about the ‘delay the development’ via regulation bit.
What is your sense of which near-term passable regulations would actually be enforceable? It’s been difficult for large stakeholder groups facing threatening situations to enforce even established international treaties, such as the Geneva Convention or the Berne three-step test.
Here are dimensions I’ve been thinking need to be constrained over time:
Input bandwidth to models (i.e. available training and run-time data, including from sensors).
Multi-domain work by/through models (i.e. preventing an automation race-to-the-bottom).
Output bandwidth (incl. by having premarket approval for allowable safety-tested uses, as happens in other industries).
Compute bandwidth (through caps/embargoes put on already resource-intensive supply chains).
(I’ll skip the ‘make humans smarter’ part, which I worry increases problems around techno-solutionist initiatives we’ve seen).
If it requires big datacenters, then I think folks will hear about it and stop it. We’re not the only country with a CIA. A datacenter can be destroyed without even killing any people (so less risk of retaliation). Let’s hope it requires a big, obvious datacenter.
Meanwhile, people were on track to invent new technology at increasing speed every year without the AI’s help. Personally, I don’t mind it taking 10x longer to reach the stars etc.
Very good question. It is awful that we find ourselves in a situation in which there are only tiny shreds of hope for our species surviving “the AI program” (the community pushing AI capabilities as far as they will go).
One tiny shred of hope is some “revolution in political affairs” that allows a small group to take over the world, where that small group understands how dangerous the AI program is. One way such a revolution might come about is the creation of sufficiently good technology to “measure loyalty” by scanning the brain somehow: the first group to use the brain-scanning tech to take over the world (by giving police, military, and political power only to loyal people) can hopefully prevent other groups from using the brain-scanning tech, yielding a stable regime, and hopefully they use their stable hegemony to shut down the AI labs, to make it illegal to teach or publish about AI, and to stop “progress” in GPU tech.
The greater the number of “centers of autonomous power” on Earth, the greater the AI extinction risk, because the survival of our species basically requires every center of autonomous power to choose to refrain from continuing the AI program, and because when there is only one dominant center of power, that center’s motive to use AI to gain an advantage over rival centers is greatly diminished relative to the current situation. Parenthetically, this is why I consider the United States the greatest source of AI extinction risk: Russia and China are arranged so as to make it easy, or at least possible, for a few people in the central government to shut down things going on in the country, whereas the United States is arranged to keep power as dispersed as practical, a la “the government that governs least governs best”.
Another tiny shred of hope is our making contact with an alien civilization and asking them to help us out of the situation. This need not entail the arrival of an alien ship or probe, because a message from the aliens can contain a computer program, and that computer program might be (and AFAICT probably would be) an AGI which, after we receive it, we can run or emulate on our computers. Yes, doing that definitely does give an alien civilization we know almost nothing about complete power over us, but the situation around human-created AI is so dire that I’m tentatively, tepidly in favor of putting our hope in the possibility that the alien civilization that sent the message will turn out to be nice, or, if it is not intrinsically nice, at least has some extrinsic motive to treat us well. (One possible extrinsic motive would be protecting its reputation among other civilizations that might receive the message, ones that aren’t as helpless as we are and consequently might be worthwhile to trade with. I can expand on this if there is interest.)
IIUC, the Vera Rubin telescope coming online in a few years will increase our civilization’s capacity to search for messages in the form of laser beams by at least 3 orders of magnitude.
I can imagine other tiny shreds of hope, but they have the property that if the people (i.e., most people with power or influence) who don’t understand the severity of AI extinction risk knew about them, they’d probably try to block them, and blocking them would probably be fairly easy for them to do, so it doesn’t make any sense to talk about them on a public forum.
Unlike most of the technical alignment/safety research going on these days, pursuing these tiny shreds of hope at least doesn’t make the problem worse by providing (unintended) assistance to the AI project.
Is this an accurate summary of your suggestions?
Realistic actions an AI Safety researcher can take to save the world:
✅ Pray for a global revolution
✅ Pray for an alien invasion
❌ Talk to your representative
Good point: I was planning to amend my comment to say that I also support efforts to stop or hinder the AI project through ordinary political processes and that the “revolution in political affairs” is interesting to think about mainly because it might become apparent (years from now) that working within the political system has failed.
I also regret the choice of phrase, “tiny shred of hope”. I have a regrettable (mostly unconscious) motivation to direct people’s attention to the harsher aspects of the human condition, and I think I let some of that motivation creep into my previous comment. Nevertheless I really do think the global situation is quite alarming because of the AI project.
I’m very supportive of efforts to postpone the day when the AI project kills us all (or deprives us of the ability to influence our future), because that allows more time for us to be saved by some means that seems very unlikely to us now, or that we are unable to even imagine now. I’m very skeptical of the policy of looking forward to, or being neutral about, the arrival of human-level AI on the grounds that (according to proponents) we can then start more effective efforts at aligning it; I think that approach has much less hope than a lot of people here think it does, which is why I wanted to describe some alternative veins of hope.