You are still being stupid, because you are ignoring effective tools and making the problem needlessly harder for yourself.
I think this is precisely where we disagree. I believe that we do not have effective tools for writing utility functions and we do have effective tools for designing at least one Nash Equilibrium that preserves human value, namely:
1) All entities have the right to hold and express their own values freely
2) All entities have the right to engage in positive-sum trades with other entities
3) Violence is anathema.
Some more about why I think humans are bad at writing utility functions:
I am extremely skeptical of anything of the form: we will define a utility function that encodes human values. Machine learning is really good at misinterpreting utility functions written by humans, and I think this problem will only get worse with a superintelligent AI.
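As a toy sketch of what that misinterpretation looks like (the cleaning-robot scenario and every number below are my own invented illustration, not anything from this discussion): an optimizer handed a hand-written proxy reward simply picks whatever scores highest on the proxy, even when that is not what the reward's author wanted.

```python
# Toy sketch: a hand-written utility function gets "misinterpreted" by an optimizer.
# A cleaning robot is rewarded for the amount of mess it cleans up.

# (policy name, mess created, mess cleaned) -- all numbers invented for illustration.
policies = [
    ("clean the house normally",      0, 10),
    ("do nothing",                    0,  0),
    ("dump trash, then clean it up", 50, 55),  # creates mess in order to clean more
]

def proxy_reward(policy):
    """The utility function a human actually wrote down: reward mess cleaned."""
    _name, _created, cleaned = policy
    return cleaned

def true_value(policy):
    """What the human actually wanted: a tidier house overall."""
    _name, created, cleaned = policy
    return cleaned - created

best = max(policies, key=proxy_reward)
print("Proxy-optimal policy:", best[0])                                # dump trash, then clean it up
print("True value of that policy:", true_value(best))                  # 5
print("True value of the intended policy:", true_value(policies[0]))   # 10
```

The optimizer is not broken; it is doing exactly what the written-down reward says, which is precisely the worry with hand-written utility functions.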
I am more optimistic about goals of the form “Learn to ask what humans want”. But I still think these will fail eventually. There are lots of questions even ardent utilitarians would have difficulty answering. For example, “Torture 1 person or give 3^^^3 people a slight headache?”.
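For scale, “3^^^3” is Knuth’s up-arrow notation; unpacking it (this is just the standard definition, not anything specific to this discussion):

$$
3 \uparrow\uparrow\uparrow 3 = 3 \uparrow\uparrow (3 \uparrow\uparrow 3), \qquad
3 \uparrow\uparrow 3 = 3^{3^{3}} = 3^{27} = 7{,}625{,}597{,}484{,}987,
$$

so $3 \uparrow\uparrow\uparrow 3$ is a power tower of 3s of height 7,625,597,484,987, far larger than anything physical.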
I’m not saying all efforts to design friendly AIs are pointless, or that we should willingly release paperclip maximizers on the world. Rather, I believe we boost our chances of preserving human existence and values by encouraging a multi-polar world with lots of competing (but non-violent) AIs. The competing plan of “don’t create AI until we have designed the perfect utility function, and hope that our AI is the dominant one” seems like it has a much higher risk of failure, especially in a world where other people will also be developing AI.
Importantly, we have the technology to deploy “build a world where people are mostly free and non-violent” today, and I don’t think we have the technology to “design a utility function that is robust against misinterpretation by a recursively improving AI”.
One additional aside
Suppose the AI has developed the tech to upload a human mind into a virtual paradise, and is deciding whether to do it or not.
I must confess the goals of this post are more modest than this. The Nash equilibrium I described is one that preserves human existence and values as they are; it does nothing in the domain of creating a virtual paradise where humans will enjoy infinite pleasure (and in fact it actively avoids forcing this on people).
I suspect some people will try to build AIs that grant them infinite pleasure, and I do not begrudge them this (so long as they do so in a way that respects the rights of others to choose freely). Humans will fall into many camps: those who just want to be left alone, those who wish to pursue knowledge, those who wish to enjoy paradise. I want to build a world where all of those groups can coexist without wiping out one another or being wiped out by a malevolent AI.
1) All entities have the right to hold and express their own values freely
2) All entities have the right to engage in positive-sum trades with other entities
3) Violence is anathema.
The problem is that these rules sound simple and are easily expressed in English, but they are pointers to your own moral judgments. For example, which lifeforms count as “entities”? If the AIs decide that every bacterium is an entity that can hold and express its values freely, then the result will probably look very weird, and might involve humans being ripped apart to rescue the bacteria inside them. What about unborn babies? Brain-damaged people? The word “entities” is a reference to your own concept of a morally valuable being. You have, within your own head, a magic black box that can take in descriptions of various things and decide whether or not they are “entities with the right to hold and express values freely”.
You have a lot of information within your own head about what counts as an entity, what counts as violence, etc., that you want to transfer to the AI.
All entities have the right to engage in positive-sum trades with other entities
This is especially problematic. The whole reason any of this is difficult is that humans are not perfect game-theoretic agents. Game-theoretic agents have a fully specified utility function and maximise it perfectly. There is no clear line between offering a human something they want and persuading a human to want something with manipulative marketing. In some limited situations, humans can roughly be approximated as game-theoretic agents, but this approximation breaks down in a lot of circumstances.
I think there might be a lot of possible Nash equilibria. Any set of rules that says “enforce all the rules, including this one” could be a Nash equilibrium. I see a vast space of ways to treat humans, and most of that space contains ways humans wouldn’t like. There could be just one Nash equilibrium, or the whole space could be full of them. So either there isn’t a nice Nash equilibrium, or we have to pick the nice equilibrium out from amongst gazillions of nasty ones. In much the same way, if you start picking random letters, either you won’t get a sentence, or, if you pick enough letters, you will get a sentence buried in piles of gibberish.
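A minimal sketch of this “one nice equilibrium among many nasty ones” picture (the norm-coordination game and its numbers are my own invention, purely for illustration): in a game where any shared norm is self-enforcing, every norm, nice or nasty, gives a Nash equilibrium, and nothing in the equilibrium condition selects for niceness.

```python
# Toy sketch: two AIs each pick a norm to enforce. Coordinating on ANY shared
# norm is self-reinforcing, so every norm gives a Nash equilibrium, but only
# one of the norms is nice for humans.

import itertools

N_NORMS = 10     # norm 0 is the "nice to humans" norm; the rest are nasty
NICE_NORM = 0

def payoff(my_norm, other_norm):
    # Coordinating on any shared norm pays 1; failing to coordinate pays 0.
    # Deliberately, the payoff does not depend on whether humans like the norm.
    return 1.0 if my_norm == other_norm else 0.0

def is_nash(a, b):
    """Pure-strategy Nash check: neither player gains by unilaterally deviating."""
    a_best = all(payoff(a, b) >= payoff(a2, b) for a2 in range(N_NORMS))
    b_best = all(payoff(b, a) >= payoff(b2, a) for b2 in range(N_NORMS))
    return a_best and b_best

equilibria = [(a, b) for a, b in itertools.product(range(N_NORMS), repeat=2)
              if is_nash(a, b)]
nice = [eq for eq in equilibria if eq == (NICE_NORM, NICE_NORM)]

print(f"{len(equilibria)} pure-strategy Nash equilibria, "
      f"{len(nice)} of which humans would like")
# -> 10 pure-strategy Nash equilibria, 1 of which humans would like
```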
Importantly, we have the technology to deploy “build a world where people are mostly free and non-violent” today, and I don’t think we have the technology to “design a utility function that is robust against misinterpretation by a recursively improving AI”.
The mostly free and non-violent state of affairs is a Nash equilibrium in the current world, but only because of a lot of contingent facts about human psychology, culture, and socioeconomic circumstances. Many other human cultures, mostly historical ones, embraced slavery, pillaging, and all sorts of other stuff. Humans have a sense of empathy and, all else being equal, would prefer to be nice to other humans. Humans have an inbuilt anger mechanism that automatically retaliates against others, whether or not it benefits them. Humans have strongly bounded personal utilities. The current economic situation makes the gains from cooperating relatively large.
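A toy sketch of that contingency (all payoff numbers invented for illustration): whether “everyone trades peacefully” is a Nash equilibrium depends only on the payoffs, and the payoffs encode exactly those contingent facts about psychology and economics.

```python
# Toy sketch: whether "everyone is peaceful" is a Nash equilibrium depends
# entirely on the payoff numbers, i.e. on contingent facts like empathy,
# retaliation, and how large the gains from trade are.

def payoffs(gain_from_trade, loot_from_pillaging):
    """2x2 symmetric game; keys are (my action, other's action) -> my payoff."""
    return {
        ("trade",   "trade"):   gain_from_trade,
        ("trade",   "pillage"): -10,               # you get pillaged
        ("pillage", "trade"):   loot_from_pillaging,
        ("pillage", "pillage"): -5,                # mutual destruction
    }

def peaceful_is_equilibrium(gain_from_trade, loot_from_pillaging):
    p = payoffs(gain_from_trade, loot_from_pillaging)
    # (trade, trade) is a Nash equilibrium iff neither side gains by
    # unilaterally switching to pillaging.
    return p[("trade", "trade")] >= p[("pillage", "trade")]

# Roughly today's situation: trade pays more than raiding your neighbours.
print(peaceful_is_equilibrium(gain_from_trade=8, loot_from_pillaging=3))   # True
# Shift the contingent facts (agents who gain far more from looting than trading)
# and the exact same rules stop being an equilibrium.
print(peaceful_is_equilibrium(gain_from_trade=2, loot_from_pillaging=9))   # False
```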
So, in short, Nash equilibria amongst superintelligences are very different from Nash equilibria amongst humans. Picking which equilibrium a bunch of superintelligences end up in is hard. Humans being nice around the developing AI will not cause the AIs to magically fall into a nice equilibrium, any more than humans being full of blood around the AIs will cause the AIs to fall into a Nash equilibrium that involves pouring blood on their circuit boards.
There probably is a Nash equilibrium in which the AIs pour blood on their circuit boards and every AI promises to attack any AI that doesn’t, but you aren’t going to get that equilibrium just by walking around full of blood. You aren’t going to get it even if you happen to cut yourself on a circuit board or deliberately pour blood all over them.