Your criticisms of my extreme pacifism example aren’t what I was thinking of at all. I was thinking of something more like this:
Scene: three days pre-singularity. Place: the OpenAI office. Person: a senior research engineer. “Hey, I’m setting some parameters on our new AI, and one of those is the badness of violence. How bad should I say violence is? 100? Eh, whatever, better make it 500 just to be on the safe side.”
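To make the arbitrariness concrete, here’s a toy sketch (all action names and numbers are invented for illustration) of how a casually chosen weight can flip which action a maximizer picks:

```python
# Toy sketch: a hypothetical planner scores each action as
# benefit - violence_weight * violence. Numbers are invented.
actions = {
    "do nothing":             {"benefit": 0.0,  "violence": 0.0},
    "ordinary policing":      {"benefit": 10.0, "violence": 0.01},
    "mass mind modification": {"benefit": 8.0,  "violence": 0.0},
}

def best_action(violence_weight):
    return max(actions, key=lambda name: actions[name]["benefit"]
               - violence_weight * actions[name]["violence"])

print(best_action(100))  # ordinary policing: 10 - 1 = 9 beats 8
print(best_action(500))  # mass mind modification: 10 - 5 = 5 loses to 8
```

The offhand jump from 100 to 500 is exactly the kind of choice that silently moves the zero-violence option, however weird, from dispreferred to optimal.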
Soon the AI invents nanotech and sends out brain-modifying nanobots. The nanobots have simple instructions: upregulate brain region X, downregulate hormone Y. The effect is not that different from some recreational drugs, but a bit more controlled, and applied to all humans. All across the world, the sections of the brain that think “get rid of the hated outgroup because …” just shut off. The AI helps this along by removing all the guns, but that isn’t the main reason things are so peaceful.
In this scenario, there is nothing totalitarian (you can argue it’s bad for other reasons, but it sure isn’t totalitarian), and there is nothing for bad actors to exploit. It’s just everyone in the world suddenly feeling their hate melt away and deciding that the outgroup aren’t so bad after all.
I don’t think this is so strict as to basically be human extinction. Arguably some humans are already in this mind space, or close to it (sure, maybe Buddhist hippies or something, but still humans).
Not everyone is cosmopolitan. But to make your S-risk arguments work, you either need someone actively sadistic in a position of power (you can argue that Putin is actively sadistic; Zuckerberg, maybe not so much), or you need to explain why bad outcomes happen when a businessman who doesn’t think about ethics much gets to the AI.
By bargaining process, are we talking about humans doing politics in the real world, or about the AI running an “assume all humans had equal weight at the hypothetical platonic negotiating table” algorithm? I was thinking of the latter.
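For concreteness, a minimal sketch of that second option, with invented names, outcomes, and utility numbers: every person’s utility function counts equally, regardless of real-world power, and the AI picks the outcome with the highest equally weighted sum:

```python
# Minimal sketch of the "equal weight at the hypothetical platonic
# negotiating table" aggregation. All names and utilities are invented.
utilities = {
    "alice": {"outcome_a": 1.0, "outcome_b": 0.2},
    "bob":   {"outcome_a": 0.1, "outcome_b": 0.9},
    "carol": {"outcome_a": 0.8, "outcome_b": 0.3},
}

def equal_weight_choice(utilities):
    outcomes = next(iter(utilities.values()))
    # Each person contributes with weight 1, however powerful they are.
    return max(outcomes, key=lambda o: sum(u[o] for u in utilities.values()))

print(equal_weight_choice(utilities))  # outcome_a: 1.9 vs outcome_b: 1.4
```

The real-world-politics version would correspond to replacing the implicit weight of 1 with each person’s actual bargaining power.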
Most people haven’t really considered future nonhuman minds. If given more details and asked whether they were totally fine with torturing such minds, they would probably say no.
How much are we assuming that the whole future is set in stone by the average human’s first flinch response? And how much of a “if we were wiser and thought more” adjustment is the AI applying? (Or will the AI update its actions to match once we actually do think more?)
Re extreme pacifism:
I do think non-consensual mind modification is a pretty authoritarian measure. The MIRI guy is going to have a lot more parameters to set than just “violence bad = 500”, and if the AI is willing to modify people’s minds to satisfy that value, why not do that for everything else it believes in? Bad actors can absolutely exploit this capability: if they have a hand in the development of the relevant AI, they can just mind-control people into believing their ideology.
Or you need to explain why bad outcomes happen when a businessman who doesn’t think about ethics much gets to the AI.
Sure. Long story short, even though the businessman doesn’t care that much, other people do, and will pick up any slack left behind by the businessman or his AI.
Some business guy who doesn’t care much about ethics but doesn’t actively hate anybody gets his values implanted into the AI. He is immediately whisked off to a volcano island with genetically engineered catgirls looking after his every whim or whatever the hell. Now the AI has to figure out what to do with the rest of the world.
It doesn’t just kill everybody else and convert all spare matter into defenses set up around the volcano lair, because the businessman guy is chill and wouldn’t want that. He’s a libertarian and just sorta vaguely figures that everyone else can do their thing as long as it doesn’t interfere with him. The AI quickly destroys all other AI research so that nobody can challenge its power and potentially mess with its master. Now that its primary goal is done with, it has to decide what to do with everything else.
It doesn’t just stop interfering altogether, since then AI research could recover. Plus, it figures the business guy has a weak preference for having a big human society around with cool tech and diverse, rich culture, plus lots of nice beautiful ecosystems so that he can go exploring if he ever gets tired of hanging out in his volcano lair all day.
So the AI gives the rest of society a shit ton of advanced technology, including mind uploading and genetic engineering, and becomes largely hands-off other than making sure nobody threatens its power, destroys society, or makes something which would be discomforting to its businessman master, who doesn’t really care that much about ethics anyway. Essentially, it keeps things interesting.
What is this new society like? It probably has pretty much every problem the old society had that doesn’t stem from limited resources or information. Maybe everybody gets a generous UBI and nobody has to work. Of course, nature is still as nasty and brutish as ever, and the factory farms keep chugging along, since people have decided they don’t want to eat frankenmeat. There are still lots of psychopaths and fanatics around, both powerless and powerful. Some people decide to use the new tech to spin up simulations in VR to lord over in every awful way you can think of. Victims of crimes upload the perpetrators into hell; religious people upload people they consider fanatics into hell; assholes do it to people they just don’t like. The businessman doesn’t care, or he doesn’t believe in sentient digital minds, or something else, and it doesn’t disrupt society. Encryption algorithms can hide all this activity, so nobody can stop it except the AI, which doesn’t really care.
Meanwhile, since the AI doesn’t care all that much about what happens and is fine with a wide range of possible outcomes, political squabbling between all the usual factions, some of them quite distasteful, over which outcomes within this acceptable range should come about continues as usual. People of course debate all the nasty stuff being done with the new technology, and in the end society decides that technology in the hands of man is bad and should only be used in pursuit of goodness in the eyes of the One True God. The One True God’s identity is decided upon after extensive fighting, which probably causes quite a lot of suffering itself, but is very interesting from the perspective of someone watching from the outside, not from too close up, like our businessman.
The new theocrats decide they’re going to negotiate with the AI to build the most powerful system for controlling the populace that the AI will allow. The AI decides this is fine as long as they leave behind a small haven with all the old interesting stuff from the interim period. The theocrats begrudgingly agree, and now most of the minds in hell are religious dissidents, just as the One True God says it should be, with a few of the old slaves left over in the new haven. The wilderness and the farms, of course, remain untouched. Wait a few billion years, and this shit has spread to every corner of the universe.
Is this particular scenario likely? Of course not; it’s far too specific. I’m just using it as a concrete illustration. The main points are:
1. Humanity has lots of moral pitfalls, any of which will lead to disaster when universally applied and locked in, and we are unlikely to avoid all of them.
2. Not locking in values immediately, or only locking them in partially, is just a temporary solution: there will always be actors who seek to lock in whatever the current system leaves unspecified, and by definition this cannot be prevented without locking in the values.
By bargaining process, are we talking about humans doing politics in the real world, or about the AI running an “assume all humans had equal weight at the hypothetical platonic negotiating table” algorithm? I was thinking of the latter.
The latter algorithm doesn’t get run unless the people who want it to be run win the real-world political battle over AI takeoff, so I was thinking of the former.
And how much of a “if we were wiser and thought more” adjustment is the AI applying?
I’m not sure it matters. First of all, “wiser” is somewhat of a value judgement anyway, so it can’t be used to avoid making value judgements up front. What is “wisdom” when it comes to determining your morality? It depends on what the “correct” morality is.
And thinking more doesn’t necessarily change anything either. If somebody has an internally consistent value system where they value or don’t care about certain others, they’re not going to change that simply because they think more, any more than a paperclip maximizer will decide to make a utopia instead because it thinks more. The utility function is not up for grabs.
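The paperclip analogy can be sketched as a toy search (objective and numbers invented for illustration): “thinking more” just means evaluating more candidate plans, which improves performance against the fixed objective but never rewrites the objective itself:

```python
import random

def utility(plan):
    # Fixed toy objective; under it, the best plan is always 42.
    return -(plan - 42) ** 2

def think(n_samples, seed=0):
    # "Thinking more" = sampling and scoring more candidate plans against
    # the SAME utility function; the function itself is never revised.
    rng = random.Random(seed)
    candidates = [rng.randint(0, 1000) for _ in range(n_samples)]
    return max(candidates, key=utility)

# More deliberation finds plans the fixed objective scores at least as well;
# it never decides to maximize something else instead.
print(utility(think(10_000)) >= utility(think(10)))  # True
```

The same holds for a consistent human value system in this view: extra reflection sharpens the pursuit of the existing values rather than replacing them.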