Whether most existing humans would be opposed is not a criterion of Friendliness.
I think if you described what was going to happen, many (and maybe most) humans would say they prefer the status quo to a positive CEV-directed singularity. Perhaps it depends on which parts of "what's going to happen" you focus on; some are more obviously good or exciting than others. Curing cancer is socially regarded as good, while curing death and dismantling governments are typically (though not universally) regarded as bad.
I don't think they will actually provide much opposition, because a superhuman persuader will be steering the trajectory of events. (Ostensibly by using only truth-tracking arguments and inputs that allow us to converge on the states of belief that we would reflectively prefer, but we mere humans won't be able to distinguish that from malicious superhuman manipulation.)
But again, how humans would react is neither here nor there for what a Friendly AI does. The AI does what the CEV of humans would want, not what the humans want.
And… you claim that the CEV of existing humans will want those things?
Part of the whole point of CEV is to discover at least some things that current humanity is confused about but would want if fully informed, with time to think. It'd be surprising to me if CEV-existing-humanity didn't turn out to want some things that many current humans are opposed to.
Sure. Now, as far as I understand it, whether the extrapolated volition of humanity will even cohere is an open question (on any given extrapolation method; we set aside the technical question of selecting or constructing such a method).
So Eli Tyre's claim seems to be something like: on [ all relevant / the most likely / otherwise appropriately selected ] extrapolation methods, (a) humanity's EV will cohere, (b) it will turn out to endorse the specific things described (dismantling of all governments, removing the supply of factory farmed meat, dictating how people should raise their children).
Right?
I'm much more doubtful than most people around here about whether CEV coheres: I guess, for instance, that the CEV of some humans is to wirehead themselves while the CEV of other humans is not.
But I'm bracketing that concern for this discussion. Assuming CEV coheres, then yes, I predict that it will have radical views about all of those things (radical in the sense of a political radical whose beliefs are far enough outside the Overton window that they are disturbing to the median voter).
But more confidently, I predict that it will have radical views about a very long list of things that are commonplace in 2024, even if it turns out that I'm wrong about this specific set.
CEV asks what we would want if we knew everything the AI knows. There are dozens of things that I think I know which, if the average person knew them to be true, would invalidate a lot of their ideology.
If the average person knew everything that an AGI knows (which includes potentially millions of subjective years of human science: whole new fields, as foundational to one's worldview as economics and probability theory are to my current worldview), and they had hundreds of subjective years to internalize those facts and domains, in a social context conducive to that, with (potentially) large increases in their intelligence, I expect their views would be basically unrecognizable after a process like that.
As a case in point, most people consider it catastrophically bad to have their body destroyed (duh). And if you asked them if they would prefer, given their body being destroyed, to have their brain-state recorded, uploaded, and run on a computer, many would say "no", because it seems horrifying to them.
Most LessWrongers embrace computationalism: they think that living as an upload is about as good as living as a squishy biological robot (and indeed, better in many respects). They would of course choose to be uploaded if their body was being destroyed. Many would elect to have their body destroyed specifically because they would prefer to be uploaded!
That is, most LessWrongers think they know something which most people don't know, but which, if they did know it, would radically alter their preferences and behavior.
I think a mature AGI knows at least thousands of things like that.
So one of the things about CEV that I'm most confident about (again, granting that it coheres at all) is that CEV will have extremely radical views, with conclusions that are horrifying to most people, probably including myself.
If by "cohere" you mean "the CEVs of all individual humans match", then my belief (>99%) is that it is not the case that the CEVs of all individual humans will (precisely) match. I also believe there would be significant overlap between the CEVs of 90+% of humans[1], and that this overlap would include disvaluing two of the three[2] things you asked about (present factory farming and child abuse; more generally, animal and child suffering).
(This felt mostly obvious to me, but you did ask about it a few times, in a way that suggested you expect something different; if so, you're welcome to pinpoint where you disagree.)
For instance, even if one human wants to create a lot of hedonium, and another human wants to create a lot of individuals living fun and interesting lives, it will remain the case that they both disvalue things like extreme suffering. Also, the former human will probably still find at least some value in what the latter human seeks.
For the part of your question about whether their CEVs would endorse dismantling governments: note that "governments" is a relevantly broad category, when considering that most configurations which are infeasible now will be feasible in the (superintelligence-governed) future. I think the following two statements capture most of my belief about how most humans' CEVs would regard things in this broad category.
Most human CEVs would be permissive of those who terminally-wish[3] to live in contexts that have some form of harmless government structure.
The category of "government" also includes, e.g., dystopias that create suffering minds and don't let them leave; most human CEVs would seek to prevent this kind of government from existing.
(None of that implies any government would be present everywhere, nor that anyone would be in such a context against their will; rather, I'm imagining that a great diversity of contexts and minds will exist. I less confidently predict that most will choose to live in contexts without a government structure, considering it unnecessary given the presence of a benevolent ASI.)
[3] (Wished for not because it is necessary, for it would not be under a benevolent ASI, but simply because it's their vision for the context in which they want to live.)
I do.
I mean, it depends on the exact CEV procedure. But yes.