When you read the Sequences, was your visualization of a Friendly AI going to let the governments of North Korea or Saudi Arabia persist? Would it allow parents to abuse their children in ways that are currently allowed by the law (and indeed enshrined by the law, in that the law gives parents authority over their children)? Does it allow the factory farms to continue to run? How about the (then contemporaneous) US occupations of Iraq and Afghanistan?
(This is a non-rhetorical question. I wonder if we were visualizing different things.)
Speaking for myself, I would say:
It’s a superintelligence, and so it can probably figure out effective peaceful ways to accomplish its goals. But among its goals will be the dismantling of many and likely all of the world’s major governments, not to mention a bunch of other existing power structures. A government being dismantled by a superhuman persuader is, in many but not all ways, as unsettling as it being destroyed by military force.
Perhaps humanity as a whole, and every individual human, would be made better off by a CEV-aligned friendly singleton, but I think the US government, as an entity, would be rightly threatened.
Doesn’t this very answer show that an AI such as you describe would not be reasonably describable as “Friendly”, and that consequently any AI worthy of the term “Friendly” would not do any of the things you describe? (This is certainly my answer to your question!)
No. “Friendly” was a semi-technical term of art, at the time. It may turn out that a Friendly AI (in the technical sense) is not, or even can’t be, “friendly” in a more conventional sense.
Er… yes, I am indeed familiar with that usage of the term “Friendly”. (I’ve been reading Less Wrong since before it was Less Wrong, you know; I read the Sequences as they were being posted.) My comment was intended precisely to invoke that “semi-technical term of art”; I was not referring to “friendliness” in the colloquial sense. (That is, in fact, why I used the capitalized term.)
Please consider the grandparent comment in light of the above.
In that case, I answer flatly “no”. I don’t expect many existing governmental institutions to be ethical or legitimate in the eyes of CEV, if CEV converges at all. Factory Farming is right out.
You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
Whether most existing humans would be opposed is not a criterion of Friendliness.
I think if you described what was going to happen, many and maybe most humans would say they prefer the status quo to a positive CEV-directed singularity. Perhaps it depends on which parts of “what’s going to happen” you focus on; some are more obviously good or exciting than others. Curing cancer is socially regarded as 👍 while curing death and dismantling governments are typically (though not universally) regarded as 👎.
I don’t think they will actually provide much opposition, because a superhuman persuader will be steering the trajectory of events. (Ostensibly, by using only truth-tracking arguments and inputs that allow us to converge on the states of belief that we would reflectively prefer, but we mere humans won’t be able to distinguish that from malicious superhuman manipulation.)
But again, how humans would react is neither here nor there for what a Friendly AI does. The AI does what the CEV of humans would want, not what the humans want.
And… you claim that the CEV of existing humans will want those things?
Part of the whole point of CEV is to discover at least some things that current humanity is confused about but would want if fully informed, with time to think. It’d be surprising to me if CEV-existing-humanity didn’t turn out to want some things that many current humans are opposed to.
Sure. Now, as far as I understand it, whether the extrapolated volition of humanity will even cohere is an open question (on any given extrapolation method; we set aside the technical question of selecting or constructing such a method).
So Eli Tyre’s claim seems to be something like: on [ all relevant / the most likely / otherwise appropriately selected ] extrapolation methods, (a) humanity’s EV will cohere, (b) it will turn out to endorse the specific things described (dismantling of all governments, removing the supply of factory farmed meat, dictating how people should raise their children).
Right?
I’m much more doubtful than most people around here about whether CEV coheres: I guess, for instance, that the CEV of some humans has them wirehead and the CEV of other humans doesn’t.
But I’m bracketing that concern for this discussion. Assuming CEV coheres, then yes, I predict that it will have radical (in the sense of a political radical whose beliefs are extremely outside of the Overton window, such that they are disturbing to the median voter) views about all of those things.
But more confidently, I predict that it will have radical views about a very long list of things that are commonplace in 2024, even if it turns out that I’m wrong about this specific set.
CEV asks what we would want if we knew everything the AI knows. There are dozens of things that I think I know that, if the average person knew them to be true, would invalidate a lot of their ideology.
If the average person knew everything that an AGI knows (which includes potentially millions of subjective years of human science, whole new fields as foundational to one’s worldview as economics and probability theory are to my current worldview), and they had hundreds of subjective years to internalize those facts and domains, in a social context that was conducive to that, with (potentially) large increases in their intelligence, I expect their views would be basically unrecognizable after a process like that.
As a case in point, most people consider it catastrophically bad to have their body destroyed (duh). And if you asked them if they would prefer, given their body being destroyed, to have their brain-state recorded, uploaded, and run on a computer, many would say “no”, because it seems horrifying to them.
Most LessWrongers embrace computationalism: they think that living as an upload is about as good as living as a squishy biological robot (and indeed, better in many respects). They would of course choose to be uploaded if their body was being destroyed. Many would elect to have their body destroyed specifically because they would prefer to be uploaded!
That is, most LessWrongers think they know something which most people don’t know, but which, if they did know it, would radically alter their preferences and behavior.
I think a mature AGI knows at least thousands of things like that.
So one of the things about CEV that I’m most confident about (again, granting that it coheres at all) is that CEV has extremely radical views, with conclusions that are horrifying to most people, probably including myself.
If by ‘cohere’ you mean ‘the CEVs of all individual humans match’, then my belief (>99%) is that it is not the case that the CEVs of all individual humans will (precisely) match. I also believe there would be significant overlap between the CEVs of 90+% of humans[1], and that this overlap would include disvaluing two of the three[2] things you asked about (present factory farming and child abuse; more generally, animal and child suffering).
(This felt mostly obvious to me, but you did ask about it a few times, in a way that suggested you expect something different; if so, you’re welcome to pinpoint where you disagree.)
For instance, even if one human wants to create a lot of hedonium, and another human wants to create a lot of individuals living fun and interesting lives, it will remain the case that they both disvalue things like extreme suffering. Also, the former human will probably still find at least some value in what the latter human seeks.
For the part of your question about whether their CEVs would endorse dismantling governments: note that ‘governments’ is a relevantly broad category, when considering that most configurations which are infeasible now will be feasible in the (superintelligence-governed) future. I think these statements capture most of my belief about how most humans’ CEVs would regard things in this broad category.
Most human CEVs would be permissive of those who terminally-wish[3] to live in contexts that have some form of harmless government structure.
The category of ‘government’ also includes, e.g., dystopias that create suffering minds and don’t let them leave; most human CEVs would seek to prevent this kind of government from existing.
(None of that implies any government would be present everywhere, nor that anyone would be in such a context against their will; rather, I’m imagining that a great diversity of contexts and minds will exist. I less confidently predict that most will choose to live in contexts without a government structure, considering it unnecessary given the presence of a benevolent ASI.)
(wished for not because it is necessary, for it would not be under a benevolent ASI, but simply because it’s their vision for the context in which they want to live)
I do.
I mean, it depends on the exact CEV procedure. But yes.
I think the majority of nations would support dismantling their governments in favor of a benevolent superintelligence, especially given the correct framework. And an ASI can simply solve the problem of meat by growing brainless bodies.
Edit: Whoever mega-downvoted this, I’m interested to see you explain why.
Meta: You may wish to know[1] that seeing these terms replaced with the ones you used can induce stress/dissociation in the relevant groups (people disturbed by factory farming and child abuse survivors). I am both and this was my experience. I don’t know how common it would be among LW readers of those demographics specifically, though.
The one you responded to:
Would it allow parents to abuse their children in ways that are currently allowed by the law [...]? Does it allow the factory farms to continue to run?
Your response:
You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
I’m framing this as sharing info you (or a generalized altruistic person placed in your position) may care about rather than as arguing for a further conclusion.
I’m confused about your question. I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
Right, but I’m asking about your visualization of a Friendly AI as described in the Sequences, not a limited AGI for a pivotal act.
I’m confused by your confusion! Are you saying that that’s a non-sequitur, because I’m asking about a CEV-sovereign instead of a corrigible, limited genie or oracle?
It seems relevant to me, because both of those were strategic goals for MIRI at various points in its history, and at least one of them seems well characterized as “taking over the world” (or at least something very nearby to that). Which seems germane to the discussion at hand.
I would be surprised if a Friendly AI resulted in those things being left untouched.
I think that is germane, but maybe it needed some bridging/connecting work, since this thread so far was about MIRI-as-having-pivotal-act-goal. While I was less sure than Habryka about whether MIRI itself would enact a pivotal act if they could, my understanding was that they had no plan to create a sovereign for most of their history (like after 2004), so that doesn’t seem like a candidate for them having a plan to take over the world.
my understanding was they had no plan to create a sovereign for most of their history (like after 2004)
Yeah, I think that’s false.
The plan was “Figure out how to build a friendly AI, and then build one”. (As Eliezer stated in the video that I linked somewhere else in this comment thread).
But also, I got that impression from the Sequences? Like Eliezer talks about actually building an AGI, not just figuring out the theory of how to build one. You didn’t get that impression?
I don’t remember what exactly I thought in 2012 when I was reading the Sequences. I do recall, sometime later after deep learning was in full swing, it seeming like MIRI wasn’t in any position to be building AGI before others (it had neither the compute nor the engineering prowess), and someone (not necessarily at MIRI) confirmed that wasn’t the plan. Now and at the time, I don’t know how much that was principle vs. ability.
My feeling of the plan in the pre-pivotal-act era was “figure out the theory of how to build a safe AI at all, and try to get whoever is building it to adopt that approach”, and that MIRI wasn’t taking any steps to be the ones building it. I also had the model that, due to the psychological unity of mankind, anyone building an aligned (with them) AGI was a good outcome compared to someone building an unaligned one. Like even if it was Xi Jinping, a sovereign aligned with him would be okay (and not obviously that dramatically different from anyone else?). I’m not sure how much this was MIRI positions vs. fragments that I combined in my own head that came from assorted places and were never policy.
Well, I can tell you that they definitely planned to build the Friendly AI, after figuring out how.
See this other comment.
Pretty solid evidence.