You clearly have some sort of grudge against, or dislike of, China. In the face of a pandemic, they want basically what we want: to stop it spreading, and someone else to blame it on. Chinese people are not inherently evil.
I certainly don’t think the Chinese are inherently evil. Rather I think that, from the view of an American in the 1990s, a world dominated by a totalitarian China which engages in routine genocide and bans freedom of expression would be a “negative outcome to the rise of China”.
This is a description of Nash equilibria in human society. Their stability depends on humans having human values and capabilities.
Yes. Exactly. We should be trying to find a Nash equilibrium in which humans are still alive (and ideally relatively free to pursue their values) after the singularity. I suspect such a Nash equilibrium involves multiple AIs competing with strong norms against violence and focus on positive-sum trades.
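To make this concrete, here is a minimal sketch (in Python, with payoff numbers invented purely for illustration) of what “a Nash equilibrium with norms against violence and positive-sum trade” means: a toy two-AI game in which, under these assumed payoffs, mutual trade is the only outcome from which neither side wants to deviate.

```python
from itertools import product

# Toy two-player game. Each AI picks "trade" (positive-sum exchange)
# or "attack". The payoffs below are invented for illustration: they
# assume norms that make a unilateral attack barely profitable and
# mutual attack ruinous.
PAYOFFS = {
    ("trade", "trade"):   (10, 10),  # positive-sum cooperation
    ("trade", "attack"):  (-5, 2),   # victim loses badly, attacker gains little
    ("attack", "trade"):  (2, -5),
    ("attack", "attack"): (-8, -8),  # mutual destruction
}
STRATEGIES = ("trade", "attack")

def pure_nash_equilibria(payoffs, strategies):
    """Return every strategy pair from which neither player can
    gain by unilaterally deviating."""
    equilibria = []
    for s1, s2 in product(strategies, repeat=2):
        u1, u2 = payoffs[(s1, s2)]
        if all(payoffs[(d, s2)][0] <= u1 for d in strategies) and \
           all(payoffs[(s1, d)][1] <= u2 for d in strategies):
            equilibria.append((s1, s2))
    return equilibria

print(pure_nash_equilibria(PAYOFFS, STRATEGIES))
# -> [('trade', 'trade')]: with these numbers, peace is the only
#    pure-strategy equilibrium.
```

The hard part, of course, is not solving the toy matrix but making the real post-singularity payoffs look anything like it.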
But I don’t see why any of the Nash equilibria between superintelligences will be friendly to humans.
This is precisely what we need to engineer! Unless your claim is that there is no Nash equilibrium in which humanity survives, which seems like a fairly hopeless standpoint to assume. If you are correct, we all die regardless. If you are wrong, assuming so means we abandon our only hope of survival.
Why would one AI start shooting because the other AI took an action that benefited both equally?
Consider deep seabed mining. I would estimate the percentage of humans who seriously care about (or are even aware of the existence of) the sponges living at the bottom of the deep ocean at <1%. Moreover, there are substantial positive economic gains that could potentially be split among multiple nations from mining deep-sea nodules. Nonetheless, every attempt to legalize deep-sea mining has run into a hopeless tangle of legal restrictions, because most countries view blocking their rivals as more useful than actually mining the deep sea.
If you have several AIs and one of them cares about humans, it might bargain for human survival with the others. But that implies some human managed to do some amount of alignment.
I would hope that some AIs have an interest in preserving humans for the same reason some humans care about protecting life on the deep seabed, but I don’t think this is a necessary condition for ensuring humanity’s survival in a post-singularity world. We should be trying to establish a Nash equilibrium in which even insignificant actors have their values and existence preserved.
My point is, I’m not sure that aligned AI (in the narrow technical sense of coherently extrapolated values) is even a well-defined term. Nor do I think it is an outcome to the singularity we can easily engineer, since it requires us to both engineer such an AI and to make sure that it is the dominant AI in the post-singularity world.
This is precisely what we need to engineer! Unless your claim is that there is no Nash equilibrium in which humanity survives, which seems like a fairly hopeless standpoint to assume. If you are correct, we all die regardless. If you are wrong, assuming so means we abandon our only hope of survival.
What I am saying is that if you roll a bunch of random superintelligences, superintelligences that don’t care in the slightest about humanity in their utility function, then I don’t think selection of a Nash equilibrium is enough to get a nice future. It certainly isn’t enough if humans are doing the selection and we don’t know what the AIs want or what technologies they will have. Will one superintelligence be sufficiently transparent to another superintelligence that they will be able to provide logical proofs of their future behaviour to each other? Where does the arms race of stealth and detection end up? What about
If at least some of the AIs have been deliberately designed to care about us, then we might get a nice future.
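The transparency question above has at least a toy answer in the “program equilibrium” literature (Tennenholtz 2004), where agents submit source code and can condition their behaviour on each other’s programs. A minimal sketch, assuming nothing more sophisticated than string equality as the “proof” of future behaviour:

```python
import inspect

def mirror_bot(opponent_source: str) -> str:
    """Cooperate only with an exact copy of this program.

    String equality is a stand-in for the real problem: verifying an
    opponent's future behaviour from arbitrary code runs into Rice's
    theorem, so genuine mutual transparency between superintelligences
    would need far stronger machinery than this.
    """
    my_source = inspect.getsource(mirror_bot)
    return "cooperate" if opponent_source == my_source else "defect"

me = inspect.getsource(mirror_bot)
print(mirror_bot(me))                    # two identical copies cooperate
print(mirror_bot("def rival(): pass"))   # anything else gets "defect"
```

Whether anything like this scales past toy settings is exactly the open question.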
From the article you link to:
After the initial euphoria of the 1970s, a collapse in world metal prices, combined with relatively easy access to minerals in the developing world, dampened interest in seabed mining.
On the other hand, people do drill for oil in the ocean. It sounds to me like deep seabed mining is unprofitable or not that profitable, given current tech and metal prices.
I suspect such a Nash equilibrium involves multiple AIs competing with strong norms against violence and focus on positive-sum trades.
If you have a tribe of humans, and the tribe has norms, then everyone is expected to be able to understand the norms. The norms have to be fairly straightforward for humans. “Don’t do X except for [100 subtle special cases]” gets simplified to “don’t do X”. This happens even when everyone would be better off with the special cases. When you have big corporations with legal teams, the agreements can be more complicated. When you have superintelligences, the agreements can be far more complicated. Humans and human organisations are reluctant to agree to a complicated deal that only benefits them slightly, because of the overhead cost of reading and thinking about the deal.
What’s more, the Nash equilibria that humanity has been in have changed with technology and society. If a Nash equilibrium is all that protects humanity, and an AI comes up with a way to kill off all humans and distribute their resources equally, without any AI being able to figure out who killed the humans, then the AI will kill all humans. Nash equilibria are fragile to details of situation and technology. If one AI can build a spacecraft and escape to a distant galaxy, which will be over the cosmic event horizon before the other AIs can do anything, that changes the equilibrium. In a Dyson swarm, one AI deliberately letting debris fly about might be able to Kessler-syndrome the whole swarm (mutually assured destruction), but the debris-deflection tech might improve and change the Nash equilibrium.
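That fragility can be shown with the same toy game from earlier, now with a single “technology” parameter. Everything here is an invented illustration: the parameter is the probability that an attacker is identified and punished, and the point is only that a modest change in one capability deletes the peaceful equilibrium entirely.

```python
from itertools import product

def pure_nash(payoffs, strategies):
    """All strategy pairs from which neither player gains by deviating."""
    return [(s1, s2) for s1, s2 in product(strategies, repeat=2)
            if all(payoffs[(d, s2)][0] <= payoffs[(s1, s2)][0] for d in strategies)
            and all(payoffs[(s1, d)][1] <= payoffs[(s1, s2)][1] for d in strategies)]

def game(p_detect):
    # Invented payoffs: a successful, unattributed attack yields 15;
    # a detected attacker is punished at -20. Better stealth tech
    # (lower p_detect) raises the expected value of attacking.
    attack_gain = 15 * (1 - p_detect) - 20 * p_detect
    return {
        ("trade", "trade"):   (10, 10),
        ("trade", "attack"):  (-5, attack_gain),
        ("attack", "trade"):  (attack_gain, -5),
        ("attack", "attack"): (-8, -8),
    }

for p in (0.9, 0.1):  # strong vs. weak detection technology
    print(p, pure_nash(game(p), ("trade", "attack")))
# 0.9 -> [('trade', 'trade')]                 peace is stable
# 0.1 -> [('trade', 'attack'), ('attack', 'trade')]
#        mutual trade is no longer an equilibrium at all
```

The game never changed; one piece of technology did, and the equilibrium that kept everyone alive vanished.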
My point is, I’m not sure that aligned AI (in the narrow technical sense of coherently extrapolated values) is even a well-defined term. Nor do I think it is an outcome to the singularity we can easily engineer, since it requires us to both engineer such an AI and to make sure that it is the dominant AI in the post-singularity world.
We need an AI that in some sense wants the world to be a nice place to live. If we were able to give a fully formal, exact definition of this, we would be much further along in AI alignment. Saying that you want an image that is “beautiful and contains trees” is not a formal specification of the RGB values of each pixel. However, there are images that are beautiful and contain trees. Likewise, saying you want an “aligned AI” is not a formal description of every byte of source code, but there are still patterns of source code that are aligned AIs.
Scenario 1. Suppose someone figured out alignment and shared the result widely. Making your AI aligned is straightforward. Almost all the serious AI experts agree that AI risks are real and alignment is a good idea. All the serious AI research teams are racing to build an aligned AI.
Scenario 2. Aligned AI is a bit harder than unaligned AI. However, all the world’s competent AI experts realise that aligned AI would benefit all, and that it is harder to align an AI when you are in a race. They come together into a single worldwide project to build aligned AI. They take their time to do things right. Any competing group is tiny and hopeless, partly because they make an effort to reach out to and work with anyone competent in the field.