So, let’s say a solution to alignment is found. It is highly technical. Most of MIRI understands it, as do a few people at OpenAI and a handful of people doing PhDs in the appropriate subfield. If you pick a random bunch of nerds from an AI conference, chances are that none of them are evil. I don’t have an “evil outgroup I really hate”, and neither do you, from the sound of it. It is still tricky, and will need a bunch of people working together. Sure, evil people exist, but they aren’t working to align AI to their evil ends, like at all. Thinking deeply about metaethics and being evil seem opposed to each other. There are no effective sadists trying to make a suffering-maximizing AI.
So the question is: how likely is it that the likes of Putin end up with their utility function in the AI, despite not understanding the first thing about how the AI works? I would say pretty tiny. They live in basically a parallel universe.
First of all, I don’t trust MIRI nerds, or myself, with this kind of absolute power. We may not be as susceptible to the ‘hated outgroup’ pitfall, but that’s not the only pitfall. For one thing, presumably we’d want to include other people’s values in the equation to avoid being a tyrant, and you’d have to decide exactly when those values are too evil to include. Err on either side of that line and you get awful results. You also have to decide exactly which beings you consider sentient, in a high-tech universe. Any mistakes there will result in a horrific future, since there will be at least some sadists actively trying to circumvent your definition of sentience, exploiting the freedom you give them to live as they see fit, which you must give them to avoid a dystopia. The problem of value choice in a world with such extreme potential is not something I trust anybody with, noble as they may be compared to the average person on today’s Earth.
Second, I’m not sure about the scenario you describe, where AI is developed by a handful of MIRI nerds without anybody else in broader society or government noticing the potential of the technology and acting to insert their values into it before takeoff. It’s not as if the rationalist community are the only people in the world concerned about the potential of AI, especially since AI capabilities will continue to improve and show their potential as we get closer to the critical point. As for powerful people like Putin: they may not understand how AI works, but people in their orbit eventually will, the smarter ones will listen, and they will use their immense resources to act on it. Besides, people like Putin only exist because there is at least some contingent of people who support them. If AI values are decided by some complex social bargaining process including all the powers that be, which seems likely, the values of those people will be represented, and even representing evil values can lead to horrific consequences down the line.
I am not totally on board with the “any slight mistake leads to a doom worse than paperclips” claim.
Any mistakes there will result in a horrific future, since there will be at least some sadists actively trying to circumvent your definition of sentience, exploiting the freedom you give them to live as they see fit, which you must give them to avoid a dystopia.
Suppose we have a wide swath of “I don’t know, maybe kind of sentient”. Let’s say we kill all the sadists. (Maybe not the best decision; I would prefer to modify their minds so they are no longer sadists, but at least we should be able to agree that killing some people is better than killing all people.) We don’t let any more sadists be created. Let’s say we go too far on this axis. We get some world full of total pacifists. The typical modern cartoon or computer game would be utterly horrifying to all of them. The utterly pacifist humans have tea parties and grow flowers and draw pretty pictures of flowers and do maths and philosophy and make technology, all while being massively excessive in avoiding anything that resembles violence or any story featuring it. Has this universe lost something of significant value? Yes. But it is still way better than paperclips.
I think Putin is surrounded by yes-men who haven’t even told him that his special military operation isn’t going well.
One thing all governments seem good at doing is nothing, or some symbolic action that does little.
Putin is putting most of his attention into Ukraine, not ChatGPT. Sure, ChatGPT is a fair distance from AGI (probably), but it all looks much the same from Putin’s epistemic vantage point. The inferential distance from Putin to anything needed to have a clue about AI (beyond a hype-meter; counting the amount of hype is trivial) is large. Putin’s resources are large, but they are resources tied to the Russian deep state. Suppose there were some exciting papers, and stuff was happening in Bay Area rationalist circles. Putin doesn’t have his spies in Bay Area rationalist circles. He doesn’t even have any agent who knows the jargon. He isn’t on the mailing list. He could probably assassinate someone, but he would have little idea who he had assassinated, or whether their death made any difference. What is he going to do, send goons with guns to break into some research place and say “align AI to our boss, or else”? That will just end up as some sort of classic hostage or murder scenario.
I mean, partly I expect things to move fast. If it would take Putin 5 years to position his resources, it’s too late. And I expect the sort of people at the forefront of AI not to be suddenly overtaken by evil people.
Putin can’t hire a top AI expert. Partly because, in the last few years before takeoff, the top AI experts will be flooded with job offers from people who aren’t evil. And partly because he will get some fast-talking suit instead.
I think the “law of continued failure” applies here. They largely ignore AI, and when they think about it, they think nonsense. And they will continue doing that.
If we do have some complex bargaining process, there are some people who want bad things, but generally more people who want good things. Sure, occasionally person A wants person B dead. But person B doesn’t want to be dead, and you don’t want person B dead either. So, 2 against 1, person B lives.
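To make the “2 against 1” arithmetic concrete, here is a toy sketch in Python. The names, weights, and the idea of one scalar vote per issue are all my own simplifications for this comment, not a real aggregation scheme anyone has proposed:

```python
# Toy illustration only: equal-weight preference aggregation over one
# binary question, "does B stay alive?". All names and weights are made up.

preferences = {
    "A": {"B_lives": -1},   # A wants B dead
    "B": {"B_lives": +1},   # B would rather live
    "C": {"B_lives": +1},   # a bystander who also doesn't want B dead
}

def aggregate(prefs, issue):
    """Sum everyone's (equally weighted) preference on a single issue."""
    return sum(person.get(issue, 0) for person in prefs.values())

if __name__ == "__main__":
    score = aggregate(preferences, "B_lives")
    print("net support for B living:", score)  # +1, so B lives
```

The point is just that, with equal weights, the murderous preference gets outvoted by the victim plus almost everyone else.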
I’ll have to think more about your “extremely pacifist” example. My intuition says that something like this is very unlikely, as the amount of killing, indoctrination, and general societal change required to get there would seem far worse to almost anybody in the current world than more abstract concepts like suffering subroutines, exploited uploads, or designer minds. It seems like achieving a society like you describe would require some seriously totalitarian behavior, and while that may be justified to avoid the nightmare scenarios, it comes with its own serious and historically attested risk of corruption. Any attempt at this would either leave some serious bad tendencies behind, be co-opted into a “get rid of the hated outgroup because they’re the real sadists” deal by bad actors, or be so strict that it’s basically human extinction anyway, leaving humans unrecognizable. And it doesn’t seem likely that society will go for this route even if it would work. But that’s the part of my argument I’m probably least confident in at the moment.
I think Putin is kind of a weak man here. There are other actors which are competent: if not from the top down, then at least some segments of the people near power in many of the powers that be are somewhat competent. Some level of competence is required to even remain in power. I think it’s likely that Putin is more incompetent than the average head of state, and he will fall from power at some point before things really start heating up with AI, probably due to the current fiasco. But whether or not that happens doesn’t really matter, because I’m focused more generally on the somewhat competent actors that will exist around the time of takeoff, not individual imbeciles like Putin. People like him are not the root of the rot, but a symptom.
Or perhaps corporate actors are a better example than state actors, since they can act faster to take advantage of trends. This is why the people offering AI researchers jobs may not be so non-evil after all. If the world 50 years from now is owned by some currently unknown enterprising psychopathic CEO, or by the likes of Zuckerberg, that’s not really much better than any of the current powers that be. I apologize for being too focused on tyrannical governments; it was simply because you provided the initial example of Putin. He’s not the only type of evil person in this world; there are others who are more competent and better equipped to take advantage of a looming AI takeoff.
Also, the whole “break into some research place with guns and demand they do your research for you” example is silly; that’s not how power operates. People with that much power would set up and operate their own research organizations, along with systems for ensuring those orgs do what the boss wants. Large companies in the tech sector would be particularly well equipped to do this, and I don’t think their leaders are the type of cosmopolitan that MIRI types are. Very few people outside the rationalist community itself are, in fact, and I think you’re putting too much stock in the idea that the rationalist community will be the only ones to have any say in AI, even aside from issues of trusting them.
As for the bargaining process, how confident are you that more people want good things than bad things as it relates to the far future? For one thing, the bargaining process is not guaranteed to be fair, and almost certainly won’t be. It will greatly favor people with influence over those without, just like every other social bargaining process. There could be minority groups, or groups that end up with minority power in the bargain, whom others generally hate; there are certainly large political movements going in this direction as we speak. And most people don’t care at all about animals, or whatever other kinds of nonhuman consciousness may be created in the future, and it’s very doubtful any such entities will get any say at all in whatever bargaining process takes place.
Your criticisms of my extreme pacifism example aren’t what I was thinking of at all. I was more thinking:
Scene: 3 days pre-singularity. Place: an OpenAI office. Person: a senior research engineer. “Hey, I’m setting some parameters on our new AI, and one of those is the badness of violence. How bad should I say violence is? 100? Eh, whatever, better make it 500 just to be on the safe side.”
Soon the AI invents nanotech and sends out brain-modifying nanobots. The nanobots have simple instructions: upregulate brain region X, downregulate hormone Y. The effect is not that different from some recreational drugs, but a bit more controlled, and applied to all humans. All across the world, the sections of the brain that think “get rid of the hated outgroup because…” just shut off. The AI helps this along by removing all the guns, but this isn’t the main reason things are so peaceful.
In this scenario, there is nothing totalitarian (you can argue it’s bad for other reasons, but it sure isn’t totalitarian), and there is nothing for bad actors to exploit. It’s just everyone in the world suddenly feeling their hate melt away and deciding that the outgroup aren’t so bad after all.
I don’t think this is so strict as to basically be human extinction. Arguably there are some humans basically already in this mind-space, or close to it (sure, maybe Buddhist hippies or something, but still humans).
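To be concrete about how little deliberation I’m imagining in that office scene, here is the rough level of “value specification” I have in mind, as a hypothetical sketch. Every identifier and number is invented for this thought experiment; it is not any real system’s API or anyone’s actual proposal:

```python
# Hypothetical sketch of the office scene above; all names and numbers are
# invented for the thought experiment.

value_weights = {
    "violence": -100.0,        # the engineer's first guess at "badness of violence"
    "mental_autonomy": +50.0,  # something nobody thought hard about either
}

# "Eh whatever, better make it 500 just to be on the safe side."
value_weights["violence"] = -500.0

def score_world(features, weights):
    """Crude linear scoring of a described world-state. The arbitrariness of
    the weights is the whole point, not the scoring rule itself."""
    return sum(weights.get(name, 0.0) * amount for name, amount in features.items())

# A world with a little residual violence and full mental autonomy...
print(score_world({"violence": 1.0, "mental_autonomy": 1.0}, value_weights))  # -450.0
# ...now scores worse than one where the AI edits the violence out of everyone.
print(score_world({"violence": 0.0, "mental_autonomy": 0.0}, value_weights))  # 0.0
```

A hastily quintupled penalty like this is the kind of “slight mistake” that gets you extreme pacifism rather than paperclips.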
Not everyone is cosmopolitan. But to make your S-risk arguments work, you either need someone who is actively sadistic in a position of power (you can argue that Putin is actively sadistic; Zuckerberg, maybe not so much), or you need to explain why bad outcomes happen when a businessman who doesn’t think about ethics much gets to the AI.
By bargaining process, are we talking about humans doing politics in the real world, or about the AI running an “assume all humans had equal weight at the hypothetical platonic negotiating table” algorithm? I was thinking of the latter.
Most people haven’t really considered future nonhuman minds. If given more details, and asked whether they were totally fine with torturing such minds, they would probably say no.
How much are we assuming that the whole future is set in stone by the average human’s first flinch response? And how much of a “if we were wiser and thought more” adjustment is the AI applying? (Or will the AI update its actions to match once we actually do think more?)
Re extreme pacifism:
I do think non-consensual mind modification is a pretty authoritarian measure. The MIRI guy is going to have a lot more parameters to set than just “violence bad = 500”, and if the AI is willing to modify people’s minds to satisfy that value, why not do that for everything else it believes in? Bad actors can absolutely exploit this capability: if they have a hand in the development of the relevant AI, they can just mind-control people to believe in their ideology.
Or you need to explain why bad outcomes happen when a businessman who doesn’t think about ethics much gets to the AI.
Sure. Long story short, even though the businessman doesn’t care that much, other people do, and will pick up any slack left behind by the businessman or his AI.
Some business guy who doesn’t care much about ethics but doesn’t actively hate anybody gets his values implanted into the AI. He is immediately whisked off to a volcano island with genetically engineered catgirls looking after his every whim or whatever the hell. Now the AI has to figure out what to do with the rest of the world.
It doesn’t just kill everybody else and convert all spare matter into defenses set up around the volcano lair, because the businessman guy is chill and wouldn’t want that. He’s a libertarian and just sorta vaguely figures that everyone else can do their thing as long as it doesn’t interfere with him. The AI quickly destroys all other AI research so that nobody can challenge its power and potentially mess with its master. Now that its primary goal is done with, it has to decide what to do with everything else.
It doesn’t just stop interfering altogether, since then AI research could recover. Plus, it figures the business guy has a weak preference for having a big human society around with cool tech and diverse, rich culture, plus lots of nice beautiful ecosystems so that he can go exploring if he ever gets tired of hanging out in his volcano lair all day.
So the AI gives the rest of society a shit ton of advanced technology, including mind uploading and genetic engineering, and becomes largely hands-off other than making sure nobody threatens its power, destroys society, or makes something which would be discomforting to its businessman master, who doesn’t really care that much about ethics anyway. Essentially, it keeps things interesting.
What is this new society like? It probably has pretty much every problem the old society has that doesn’t stem from limited resources or information. Maybe everybody gets a generous UBI and nobody has to work. Of course, nature is still as nasty and brutish as ever, and factory farms keep chugging along, since people have decided they don’t want to eat frankenmeat. There are still lots of psychopaths and fanatics around, both powerless and powerful. Some people decide to use the new tech to spin up simulations in VR to lord over in every awful way you can think of. Victims of crimes upload the perpetrators into hell, and religious people upload people they consider fanatics into hell, assholes do it to people they just don’t like. The businessman doesn’t care, or he doesn’t believe in sentient digital minds, or something else, and it doesn’t disrupt society. Encryption algorithms can hide all this activity, so nobody can stop it except for the AI, which doesn’t really care.
Meanwhile, since the AI doesn’t care all that much about what happens, and is fine with a wide range of possible outcomes, political squabbling between all the usual factions (some of which are quite distasteful) about which outcomes within this acceptable range should come about continues as usual. People of course debate all the nasty stuff being done with the new technology, and in the end society decides that technology in the hands of man is bad and should only be used in pursuit of goodness in the eyes of the One True God, whose identity is decided after extensive fighting. That fighting probably causes quite a lot of suffering itself, but it is very interesting from the perspective of someone looking at it from the outside, not from too close up, like our businessman.
The new theocrats decide they’re going to negotiate with the AI to build the most powerful system for controlling the populace that the AI will let them. The AI decides this is fine as long as they leave a small haven behind with all the old interesting stuff from the interim period. The theocrats begrudgingly agree, and now most of the minds in hell are religious dissidents, just like the One True God says it should be, and a few of the old slaves are left over in the new haven. The wilderness and the farms, of course, remain untouched. Wait a few billion years, and this shit is spread to every corner of the universe.
Is this particular scenario likely? Of course not; it’s far too specific. I’m just using it as a more concrete example to illustrate my points. The main points are:
Humanity has lots of moral pitfalls, any of which will lead to disaster when universally applied and locked in, and we are unlikely to avoid all of them.
Not locking in values immediately, or only locking them in partially, is only a temporary solution, as there will always be actors which seek to lock in whatever is left unspecified by the current system, and by definition this cannot be prevented without locking in the values yourself.
By bargaining process, are we talking about humans doing politics in the real world, or about the AI running an “assume all humans had equal weight at the hypothetical platonic negotiating table” algorithm? I was thinking of the latter.
The latter algorithm doesn’t get run unless the people who want it to be run win the real-world political battle over AI takeoff, so I was thinking of the former.
And how much of a “if we were wiser and thought more” adjustment is the AI applying?
I’m not sure it matters. First of all, “wiser” is somewhat of a value judgement anyway, so it can’t be used to avoid making value judgements up front. What is “wisdom” when it comes to determining your morality? It depends on what the “correct” morality is.
And thinking more doesn’t necessarily change anything either. If somebody has an internally consistent value system where they value or don’t care about certain others, they’re not going to change that simply because they think more, any more than a paperclip maximizer will decide to make a utopia instead because it thinks more. The utility function is not up for grabs.