Thinking about it more, my brain generates the following argument for the perspective I think you’re advocating:
“I think a lot of human values look like pointers to real-world phenomena, rather than encodings of real-world phenomena.”
I’m not actually sure if that’s the exact argument I had in mind while writing the part about kludges, but I do find it fairly compelling, especially the way you had written it. Thanks.
I apologize that this isn’t a complete response, but if I were to summarize a few lingering general disagreements, I would say:
“Human values” don’t seem to be primarily what I care about. I care about “my values” and I’m skeptical that “human values” will converge onto what I care about.
I have intuitions that ethics is a lot more arbitrary than you seem to think it is. Your argument is peppered with statements to the effect of “what would our CEV endorse?”. I agree that some degree of self-reflection is good, but I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
You appear to have interpreted my intuitions about the arbitrariness of concepts as being about the complexity and fragility of concepts, which you expressed confusion about. I think this reflects a miscommunication on my part, not yours: I do have some intuitions about complexity, and fewer about fragility, but my statements above were (supposed to be) about arbitrariness.
“I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.”
I think there’s more or less a ‘best way’ to extrapolate a human’s preferences (a way, or meta-way, that we would and should endorse the most, after considering tons of different ways to extrapolate). This will get different answers depending on who you extrapolate from, but for most people (partly because almost everyone cares a lot about everyone else’s preferences), you get the same answer on all the high-stakes easy questions.
Where by ‘easy questions’ I mean the kinds of things we care about today—very simple, close-to-the-joints-of-nature questions like ‘shall we avoid causing serious physical damage to chickens?’ that aren’t about entities that have been pushed into weird extreme states by superintelligent optimization. :)
I think ethics is totally arbitrary in the sense that it’s just ‘what people happened to evolve’, but I don’t think it’s that complex or heterogeneous from the perspective of a superintelligence. There’s a limit to how much load-bearing complexity a human brain can even fit.