Kaj_Sotala comments on We’re already in AI takeoff

Kaj_Sotala Mar 10, 2022, 1:04 PM
14 points
The part about hypercreatures preventing coordination sounds very true to me, but I’m much less certain about this part:
Who is aligning the AGI? And to what is it aligning?
This isn’t just a cute philosophy problem.
A common result of egregoric stupefaction is identity fuckery. We get this image of ourselves in our minds, and then we look at that image and agree “Yep, that’s me.” Then we rearrange our minds so that all those survival instincts of the body get aimed at protecting the image in our minds.
How did you decide which bits are “you”? Or what can threaten “you”?
I’ll hop past the deluge of opinions and just tell you: It’s these superintelligences. They shaped your culture’s messages, probably shoved you through public school, gripped your parents to scar you in predictable ways, etc.
It’s like installing a memetic operating system.
If you don’t sort that out, then that OS will drive how you orient to AI alignment.
It seems to me that you can think about questions of alignment from a purely technical mindset, e.g. “what kind of a value system does the brain have and would the AI need to be like in order to understand that”, and that this kind of technical thinking is much less affected by hypercreatures than other things are. Of course it can be affected, there are plenty of instances of cases where technical questions have gotten politicized and people mind-killed as a result, but… it’s not something that happens automatically, and even when technical questions do get politicized, it often affects the non-experts much more than the experts. (E.g. climate researchers have much more of a consensus on climate change than the general public does.)
And because this allows you to reason about alignment one step removed—instead of thinking about the object-level values, you are reasoning about the system (either in the human brain or the AI) that extracts the object-level values—it may let you avoid ever triggering most of the hypercreatures sleeping in your brain.
You may even reason about the hypercreatures abstractly enough not to trigger them, and design the AI with a hypercreature-elimination system which is good enough to detect and neutralize any irrational hypercreatures. This would, of course, be a threat to the hypercreatures possessing you, but part of their nature involves telling you that they are perfectly rational and justified and there is nothing irrational about them. At the same time, an “AI removing hypercreatures” isn’t really the kind of a threat they would have evolved to recognize or try to attack, so they feel safe just telling you that of course there will be no danger and that the creation of the AI will just lead to the AI implementing the_objectively_best_ideology everywhere. So you believe them, feel unconcerned as you design your AI to detect and neutralize irrational hypercreatures, and then suddenly oh what happened, how could you ever have believed that crazy old thing before.
I don’t feel certain that it would go this way, to be clear. I could see lots of ways in which it would go differently. But it also doesn’t feel obviously implausible to imagine that it could go this way.
- jimmy Mar 11, 2022, 7:49 AM
  9 points
  0
  Parent
  It seems to me that you can think about questions of alignment from a purely technical mindset, [..] and that this kind of technical thinking is much less affected by hypercreatures than other things are.[...] And because this allows you to reason about alignment one step removed
  I agree that thinking about alignment from a purely technical mindset provides a dissociative barrier that helps to keep the hypercreatures at bay. However, I disagree with the implication that this is “all good”. When you’re “removed” like that, you don’t just cut the flow of bad influences. You cut everything which comes from a tighter connection to what you’re studying.
  If you’re a doctor treating someone close to you, this “tighter connection” might bring in emotions that overwhelm your rational thinking. Maybe you think you “have to do something” so you do the something that your rational brain knows to have been performing worse than “doing nothing” in all the scientific studies.
  Or… maybe you have yourself under control, and your intuitive knowledge of the patient gives you a feel of how vigilant they would be with physical therapy, and maybe this leads to different and better decisions than going on science alone when it comes to “PT or surgery?”. Maybe your caring leads you to look through the political biases because you care more about getting it right than you do about the social stigma of wearing masks “too early” into the pandemic.
  So if you want to be a good doctor to those you really care about, what do you do?
  In the short term, if you can’t handle your emotions, clearly you pass the job off to someone else. Or if you must, you do it yourself while “dissociated”. You “Follow accepted practice”, and view your emotions as “false temptations”. In the long term though, you want to get to the place where such inputs are assets rather than threats, and that requires working on your own inner alignment.
  In the example I gave, the some of the success of being “more connected” came from being more connected to your patient than you are to the judgment of the world at large. Maybe cutting off twitter would be a good start, since that’s where these hypercreatures live, breed, and prey on minds. I think “How active are the leading scientists on twitter?” probably correlates pretty well for how much I’d distrust the consensus within that field.
  As a thing that I fairly strongly suspect but cannot prove, maybe “remove yourself from twitter” is the scaled up equivalent of “remove yourself from your personal relationship with your patient”—something that is prudent up until the point where you are able to use the input as an asset rather than getting bowled over and controlled it. In this view, it might be better to think of the problem as *growing* alignment. You start by nurturing your own independence and setting personal boundaries so you don’t get sucked into codepentent relationships which end up mutually abusive—even in subtle ways like “doing something” counterproductive as an emotionally overwhelmed doctor. Then once external influence on this scale can’t kick you out of the driver seat, and you have things organized well enough that you can both influence and be influenced in a way that works better than independence, then you move to increasingly larger scales. In this view, it’s not necessarily “hypercreatures bad”, but “hypercreatures bigger than I currently am, and not yet tamed”.
  - instead of thinking about the object-level values, you are reasoning about the system [...] that extracts the object-level values -
  I strongly recommend doing inner alignment work this way too. A huge part of what people “care” about is founded on incomplete and/or distorted perspectives, and if you’re not examining the bedrock your values are built on, then you’re missing the most important part.
  
  You may even reason about the hypercreatures abstractly enough not to trigger them, and design the AI with a hypercreature-elimination system which is good enough to detect and neutralize any irrational hypercreatures. This would, of course, be a threat to the hypercreatures possessing you, but part of their nature involves telling you that they are perfectly rational and justified and there is nothing irrational about them.
  At the same time, an “AI removing hypercreatures” isn’t really the kind of a threat they would have evolved to recognize or try to attack, so they feel safe just telling you that of course there will be no danger and that the creation of the AI will just lead to the AI implementing the_objectively_best_ideology everywhere. So you believe them, feel unconcerned as you design your AI to detect and neutralize irrational hypercreatures, and then suddenly oh what happened, how could you ever have believed that crazy old thing before.
  I’m pretty pessimistic on the chances of that one. You’re banking on what Val is describing as “superintelligences” being dumber than you are, despite the fact that it has recruited your brain to work for its goals. You’re clearly smart enough to make the connection between “If I design an AI this way, I might not get what my hypercreature wants”, since you just stated it. That means you’re smart enough to anticipate it happening, and that’s going to activate any defenses you have. There’s no magic barrier that allows “debate partner is going to take my belief if I give them the chance” to be thought while preventing “AI partner is going to steal my belief if I give it the chance”.
  The framing of some unfriendly hypercreature taking over your brain against your values is an evocative one and not without use, but I think it runs out of usefulness here.
  From the initial quote on Gwern’s post on cults:
  ““Brainwashing”, as popularly understood, does not exist or is of almost zero effectiveness. [...] Typically, a conversion sticks because an organization provides value to its members.
  This “enslavement to hypercreatures” typically happens because the person it “taken over” perceives it to have value. Sorting out the perceptions to match reality is the hard part, and “yer brain got eaten by hypercreature!” presupposes it away. At first glance the whole “anti-epistemology” thing doesn’t seem to fit with this interpretation but we don’t actually need “taken over by hostile forces” to explain it; watch regular people debate flat earthers and you’ll see the same motivated and bad reasoning that shows that they’re actually just taking things on faith. Faith in the consensus that the world is round is actually a really good way of dealing with the problem for people who can’t reliably solve this kind of problem for themselves on the object level. So much of what we need to know we need to take “on faith”, and figuring out which hypercreatues we can trust how far is the hard problem. Follow any of them too far and too rigidly and problems start showing up; that’s what the “twelfth virtue” is about.
  Trying to hide from the hypercreatures in your mind and design an AI to rid you of them is doomed to failure, I predict. The only way that seems to me to have any chance is to not have unhealthy relationships with hypercreatures as you bootstrap your own intelligence into something smarter, so that you can propogate alignment instead of misalignment.
  - Kaj_Sotala Mar 11, 2022, 10:18 AM
    4 points
    Parent
    I’m pretty pessimistic on the chances of that one. You’re banking on what Val is describing as “superintelligences” being dumber than you are, despite the fact that it has recruited your brain to work for its goals. You’re clearly smart enough to make the connection between “If I design an AI this way, I might not get what my hypercreature wants”, since you just stated it. That means you’re smart enough to anticipate it happening, and that’s going to activate any defenses you have.
    If this was true, then any attempt to improve your rationality or reduce the impact of hypercreatures on your mind would be doomed, since they would realize what you’re doing and prevent you from doing it.
    In my model, “hypercreatures” are something like self-replicating emotional strategies for meeting specific needs, that undergo selection to evolve something like defensive strategies as they emerge. I believe Val’s model of them is similar because I got much of it from him. :)
    But there’s a sense in which the emotional strategies have to be dumber than the entire person. The continued existence of the strategies requires systematically failing to notice information that’s often already present in other parts of the person’s brain and which would contradict the underlying assumptions of the strategies (Val talks a bit about how hypercreatures rely on systematically cutting off curiosity, at 3:34 − 9:22 of this video).
    And people already do the equivalent of “doing a thing which might lead to the removal of the hypercreature”. For instance, someone may do meditation/therapy on an emotional issue, heal an emotional wound which happens to also have been the need fueling the hypercreature, and then find themselves being unexpectedly more calm and open-minded around political discussions that were previously mind-killing to them. And rather than this being something that causes the hypercreatures in their mind to make them avoid any therapy in the future, they might find this a very positive thing that encourages them to do even more therapy/meditation in the hopes of (among other things) feeling even calmer in future political discussions. (Speaking from personal experience here.)
    This “enslavement to hypercreatures” typically happens because the person it “taken over” perceives it to have value.
    I agree, in part. Hypercreatures are instantiated as emotional strategies that fulfill some kind of a need. Though “the person perceives it to have value” suggests that it’s a conscious evaluation, whereas my model is that the evaluation is a subconscious one. Which makes something like “possession” a somewhat apt (even if imperfect) description, given that the person isn’t consciously aware of the real causes of why they act or believe the way they do, and may often be quite mistaken about them.
    - jimmy Mar 13, 2022, 7:10 AM
      6 points
      Parent
      I’m in agreement with a lot of what you’re saying.
      I agree that people’s “perceptions of value”, as it pertains to what influences them, are primarily unconscious.
      I agree that “possession” can be a usefully accurate description, from the outside.
      I agree that people can do “things which might lead to the removal of the hypercreature”, like meditation/therapy, and that not only will it sometimes remove that hypercreature but also that the person will sometimes be conditioned towards rather than away from repeating such things.
      I agree that curiosity getting killed is an important part of their stability, that this means that they don’t update on information that’s available, and that this makes them dumb.
      I agree that *sometimes* people can be “smarter than their hypercreature” in that they can be aware of and reason about things about which their hypercreatures cannot due to said dumbness.
      
      I disagree about the mechanisms of these things. This leads me to prefer different framings, which make different predictions and suggest different actions.
      I think I have about three distinct points.
      1) When things work out nicely, hypercreatures don’t mount defenses, and the whole thing get conditioned towards rather than away from, it’s not so much “hypercreatures too dumb because they didn’t evolve to notice this threat”, it’s that you don’t give them the authority to stop you.
      From the inside, it feels more like “I’m not willing to [just] give up X, because I strongly feel that it’s right, but I *am* willing to do process Y knowing that I will likely feel different afterwards. I know that my beliefs/priorities/attachments/etc will likely change, and in ways that I cannot predict, but I anticipate that these changes will be good and that I won’t lose anything not worth losing. And then when you go through the process and give up on having the entirety of X, it feels like “This is super interesting because I couldn’t see it coming, but this is *better* than X in every way according to every value X was serving for me”. It will not feel like “I must do this without thinking about it too much, so that I don’t awaken the hypercreatures!” and it will not feel like “Heck yeah! I freed myself from my ideological captor by pulling a fast one it couldnt see coming! I win you lose!”
      Does your experience differ?
      2) When those defenses *do* come out, it’s because people don’t trust the process which aims to rid them of hypercreatures more than they trust the hypercreatures
      It may look super irrational when, say, Christians do all sorts of mental gymnastics when debating atheists. However, “regular people” do the same thing when debating flat earthers. A whole lot of people can’t actually figure things out on the object level and so they default to faith in society to have come to the correct consensus. This refusal to follow their own reasoning (as informed by their debate partner) when it conflicts with their faith in society is actually valid here, and leads to the correct conclusion. Similar things can hold when the Christian refuses to honestly look at the atheist arguments, knowing that they might find themselves losing their faith if they did. Maybe that faith is actually a good thing for them, or at least that losing the faith *in that way* would be bad for them. If you take a preacher’s religion from him, then what is he? From an inside perspective, it’s not so much that he’s “possessed” as it is his only way to protect his ability to keep a coherent and functioning life. It appears to be a much more mutually symbiotic relationship from the inside, even if it sometimes looks like a bad deal from the outside when you have access to a broader set of perspectives.
      The prediction here is that if you keep the focus on helping the individual and are careful enough not to do anything that seems bad in expectation from the inside (e.g. prioritizing your own perspective on what’s “true” more than they subconsciously trust your perspective on truth to be beneficial to them), you can preempt any hypercreature defenses and not have to worry about whether it’s the kind of thing it could have evolved a defense against.
      
      3) When people don’t have that trust in the process, hypercreatures will notice anything that the person notices, because the person is running on hypercreature logic.
      When you trust your hypercreatures more than your own reasoning or the influence of those attempting to influence you, you *want* to protect them to the full extent of your abilities. To the extent that you notice “I might lose my hypercreature”, this is bad and will panic you because regardless of what you tell yourself and how happy you are about depending on such things, you actually want to keep it (for now, at least). This means that if your hypercreature is threatened by certain information, *you* are threatened by that information. So you refuse to update on it, and you as a whole person are now dumber for it.
      
      Putting these together, reasoning purely in the abstract about FAI won’t save you by avoiding triggering any hypercreatures that have power over you. If they have power over you, it’s because rightly or wrongly, you (unconsciously) decided that it was in your best interest to give it to them, and you are using your whole brain to watch out for them. If you *can* act against their interests, it’s because you haven’t yet fully conceded yourself to them, and you don’t have to keep things abstract because you are able to recognize their problems and limitations, and keep them in place.
      Thinking about FAI in the abstract can still help, if it helps you find a process that you trust more than your hypercreatures, but in that case too, you can follow that process yourself rather than waiting to build the AI and press “go”.
      EDIT: and working on implementing that aligning process on yourself gives you hands on experience and allows you to test things on a smaller scale before committing to the whole thing. It’s like building a limited complexity scale model of a new helicopter type before committing to an 8 seater. To the extent that this perspective is right, trying to do it in the abstract only will make things much harder.
- Valentine Mar 11, 2022, 4:45 PM
  4 points
  Parent
  …I’m much less certain about this part…
  You might like to know: I debated erasing that part and the one that followed, thinking of you replying to it! :-D
  But I figured hey, let’s have the chat. :-)
  It seems to me that you can think about questions of alignment from a purely technical mindset…
  Yep, I know it seems that way.
  And I disagree. I think it maintains a confusion about what “alignment” is.
  However, I’m less certain of this detail than I am about the overall picture. The part that has me say “We’re already in AI takeoff.” Which is why I debated erasing all the stuff about identity and first-person. It’s a subtle point that probably deserves its own separate post, if I ever care to write it. The rest stands on its own I think.
  But! Setting that aside for a second:
  To think of “questions of alignment from a purely technical mindset”, you need to call up an image of each of:
  - the AI
  - human values
  - some process by which these connect
  But when you do this, you’re viewing them in third person. You have to call these images (visual or not) in your mind, and then you’re looking at them.
  What the hell is this “human values” thing that’s separable from the “you” who’s looking?
  The illusion that this is possible creates a gap that summons Goodhart. The distance between your subjective first-person experience and whatever concept of “human values” you see in third person is precisely what summons horror.
  That’s the same gap that unFriendly egregores use to stupefy minds.
  You can’t get around this by taking “subjectivity” or “consciousness” or whatever as yet another object that “humans care about”.
  The only way I see to get around this is to recognize in immediate direct experience how your subjectivity — not as a concept, but as a direct experience — is in fact deeply inextricable from what you care about.
  And that this is the foundation for all care.
  When you tune a mind to correctly reflect this, you aren’t asking how this external AI aligns with “human values”. You’re asking how it synchs up with your subjective experience.
  (Here minds get super squirrely. It’s way, way too easy to see “subjective experience” in third person.)
  As you solve that, it becomes super transparent that sorting out that question is actually the same damn thing as asking how to be free of stupefaction, and how to be in clear and authentic connection with other human beings.
  So, no. I don’t think you can solve this with a purely technical mindset. I think that perpetuates exactly the problem that this mindset would be trying to solve.
  …but I could be wrong. I’m quite open to that.
  So you believe them, feel unconcerned as you design your AI to detect and neutralize irrational hypercreatures, and then suddenly oh what happened, how could you ever have believed that crazy old thing before.
  This part made me chuckle. :-)
  I do think this is roughly how it works. It’s just that it happens via memetics first.
  But overall, I agree in principle. If I’m wrong and it’s possible to orient to AI alignment as a purely technical problem, then yes, it’s possible to sidestep hypercreature hostility by kind of hitting them in an evolutionary blindspot.
  What links here?
  - Valentine's comment on We’re already in AI takeoff by Valentine (Mar 11, 2022, 5:52 PM; -4 points)
  - TekhneMakre Mar 11, 2022, 7:15 PM
    2 points
    Parent
    When you tune a mind to correctly reflect this, you aren’t asking how this external AI aligns with “human values”. You’re asking how it synchs up with your subjective experience.
    Any further detail you’d like to give on what constitutes “synching up with your subjective experience” (in the sense relevant to making an intelligence that produces plans that transform the world, without killing everyone)? :)
    - Valentine Mar 11, 2022, 9:31 PM
      1 point
      Parent
      Not at the moment. I might at some other time.
      This is a koan-type meditative puzzle FWIW. A hint:
      When you look outside and see a beautiful sky, you can admire it and think “Wow, that’s a beautiful sky.” But the knowing had to happen before the thought. What do you see when you attend to the level of knowing that comes before all thought?
      That’s not a question to answer. It’s an invitation to look.
      Not meaning to be obtuse here. This is the most direct I know how to be right now.
      - TekhneMakre Mar 11, 2022, 11:05 PM
        2 points
        Parent
        Ok thanks.