paulfchristiano comments on Cosmopolitan values don’t come free

paulfchristiano 2 Jun 2023 18:21 UTC
16 points
I think a closer summary is:
Humans and AI systems probably want different things. From the human perspective, it would be better if the universe were determined by what humans wanted. But we shouldn’t be willing to pay huge costs, and shouldn’t attempt to create a slave society where AI systems do humans’ bidding forever, just to ensure that human values win out. After all, we really wouldn’t want that outcome if our situations had been reversed. And indeed we are the beneficiaries of similar values turnover in the past, as our ancestors have been open (perhaps by necessity rather than choice) to values changes that they would sometimes prefer hadn’t happened.
We can imagine really sterile outcomes, like replicators colonizing space with an identical pattern repeated endlessly, or AI systems that want to maximize the number of paperclips. And considering those outcomes can help undermine the cosmopolitan intuition that we should respect the AI we build. But in fact that intuition pump relies crucially on its wildly unrealistic premise that the kind of thing brought about by AI systems will be sterile and uninteresting. If we instead treat “paperclip” as an analog for some crazy weird shit that is alien and valence-less to humans, drawn from the same barrel of arbitrary and diverse desires that can be produced by selection processes, then the intuition pump loses all force. I’m back to feeling like our situations could have been reversed, and we shouldn’t be total assholes to the AI.
I don’t think that requires anything at all about AI systems converging to cosmopolitan values in the sense you are discussing here. I do think it is much more compelling if you accept some kind of analogy between the sorts of processes shaping human values and the processes shaping AI values. But this post (and the references you cite and other discussions you’ve had) doesn’t actually engage with the substance of that analogy, and the kinds of issues raised in my comment are much closer to getting at the meat of the issue.
I also think the “not for free” part doesn’t contradict the views of Rich Sutton. I asked him this question and he agrees that, all else equal, it would be better if we handed off to human uploads instead of powerful AI. I think his view is that the course of action proposed by the alignment community is morally horrifying, since in practice he thinks the alternative is “attempt to have a slave society,” not “slow down AI progress for decades.” (I think he might also believe that stagnation is much worse than a handoff, but I haven’t heard his view on this specifically.) On that view, even if you are losing something in expectation by handing the universe off to AI systems, it’s not as bad as the alternative.
What links here?
- Does AI risk “other” the AIs? by Joe Carlsmith (9 Jan 2024 17:51 UTC; 59 points)
- Does AI risk “other” the AIs? by Joe_Carlsmith (EA Forum; 9 Jan 2024 17:51 UTC; 22 points)
- So8res 2 Jun 2023 18:29 UTC
  2 points
  Thanks! Seems like a fine summary to me, and likely better than I would have done, and it includes a piece or two that I didn’t have (such as an argument from symmetry if the situations were reversed). I do think I knew a bunch of it, though. And e.g., my second parable was intended to be a pretty direct response to something like
  
  If we instead treat “paperclip” as an analog for some crazy weird shit that is alien and valence-less to humans, drawn from the same barrel of arbitrary and diverse desires that can be produced by selection processes, then the intuition pump loses all force.
  
  where it’s essentially trying to argue that this intuition pump still has force in precisely this case.
  - paulfchristiano 2 Jun 2023 18:41 UTC
    4 points
    To the extent the second parable has this kind of intuitive force, I think it comes from: (i) the fact that the resulting values still sound really silly and simple (which I think is mostly deliberate hyperbole), and (ii) the fact that the AI kills everyone along the way.