Hypothesis 1 is closer to the mark, though I’d highlight that it’s actually fairly unclear what you mean by “cosmopolitan values” or exactly what claim you are making (and that ambiguity is hiding most of the substance of disagreements).
I’m raising the issue of pico-pseudokindness here because I perceive it as (i) an important undercurrent in this post, (ii) an important part of the actual disagreements you are trying to address. (I tried to flag this at the start.)
More broadly, I don’t really think you are engaging productively with people who disagree with you. I suspect that if you showed this post to someone you perceive yourself to be arguing with, they would say that you seem not to understand the position—the words aren’t really engaging with their view, and the stories aren’t plausible on their models of the world but in ways that go beyond the literal claim in the post.
I think that would hold in particular for Robin Hanson or Rich Sutton. I don’t think they are accessing a pre-theoretic intuition that you are discarding by premature theorizing. I think the better summary is that you don’t understand their position very well or are choosing not to engage with the important parts of it. (Just as Robin doesn’t seem to understand your position ~at all.)
I don’t think the point about pico-pseudokindness is central for either Robin Hanson or Rich Sutton. I think it is more obviously relevant to a bunch of recent arguments Eliezer has gotten into on Twitter.
Thanks! I’m curious for your paraphrase of the opposing view that you think I’m failing to understand.
(I put >50% probability that I could paraphrase a version of “if the AIs decide to kill us, that’s fine” that Sutton would basically endorse (in the right social context), and that would basically route through a version of “broad cosmopolitan value is universally compelling”, but perhaps when you give a paraphrase it will sound like an obviously-better explanation of the opposing view and I’ll update.)
I think a closer summary is:

> Humans and AI systems probably want different things. From the human perspective, it would be better if the universe were determined by what the humans wanted. But we shouldn’t be willing to pay huge costs, and shouldn’t attempt to create a slave society where AI systems do humans’ bidding forever, just to ensure that human values win out. After all, we really wouldn’t want that outcome if our situations had been reversed. And indeed we are the beneficiaries of similar values-turnover in the past, as our ancestors have been open (perhaps by necessity rather than choice) to values changes that they would sometimes prefer hadn’t happened.
>
> We can imagine really sterile outcomes, like replicators colonizing space with an identical pattern repeated endlessly, or AI systems that want to maximize the number of paperclips. And considering those outcomes can help undermine the cosmopolitan intuition that we should respect the AI we build. But in fact that intuition pump relies crucially on its wildly unrealistic premise: that the kind of thing brought about by AI systems will be sterile and uninteresting. If we instead treat “paperclip” as an analog for some crazy weird shit that is alien and valence-less to humans, drawn from the same barrel of arbitrary and diverse desires that can be produced by selection processes, then the intuition pump loses all force. I’m back to feeling like our situations could have been reversed, and we shouldn’t be total assholes to the AI.
I don’t think that requires anything at all about AI systems converging to cosmopolitan values in the sense you are discussing here. I do think it is much more compelling if you accept some kind of analogy between the sorts of processes shaping human values and the processes shaping AI values, but this post (and the references you cite, and other discussions you’ve had) doesn’t actually engage with the substance of that analogy, and the kinds of issues raised in my comment are much closer to getting at the meat of the issue.
I also think the “not for free” part doesn’t contradict the views of Rich Sutton. I asked him this question and he agrees that all else equal it would be better if we handed off to human uploads instead of powerful AI. I think his view is that the proposed course of action from the alignment community is morally horrifying (since in practice he thinks the alternative is “attempt to have a slave society,” not “slow down AI progress for decades”—I think he might also believe that stagnation is much worse than a handoff but haven’t heard his view on this specifically) and that even if you are losing something in expectation by handing the universe off to AI systems it’s not as bad as the alternative.
Thanks! Seems like a fine summary to me, and likely better than I would have done, and it includes a piece or two that I didn’t have (such as an argument from symmetry if the situations were reversed). I do think I knew a bunch of it, though. And e.g., my second parable was intended to be a pretty direct response to something like:
> If we instead treat “paperclip” as an analog for some crazy weird shit that is alien and valence-less to humans, drawn from the same barrel of arbitrary and diverse desires that can be produced by selection processes, then the intuition pump loses all force.
where it’s essentially trying to argue that this intuition pump still has force in precisely this case.
To the extent the second parable has this kind of intuitive force I think it comes from: (i) the fact that the resulting values still sound really silly and simple (which I think is mostly deliberate hyperbole), (ii) the fact that the AI kills everyone along the way.