Reflective stability is a huge component of why value identification is hard, and why it’s hard to get feedback on whether your AI actually understands human values before it reaches quite high levels of intelligence.
I don’t understand this argument. I don’t mean that I disagree; I just mean that I don’t understand it. Reflective stability seems hard no matter what values we’re talking about, right? What about human values being complex makes it any harder? And if the problem is independent of the complexity of value, then why did people talk about complexity of value to begin with?
(Separately, I don’t think current human efforts to “figure out” human values have been anywhere near adequate, though I think this is mostly a function of philosophy being what it is. People with better epistemology seem to make wildly more progress in figuring out human values compared to their contemporaries.)
Complexity of value is part of why value is fragile.
I thought complexity of value was a separate thesis from the idea that value is fragile. For example, they’re listed as separate theses in this post. It’s possible that complexity of value was always merely a sub-thesis of fragility of value, but I don’t think that’s a natural interpretation of the facts. I think the simplest explanation, consistent with my experience reading MIRI blog posts from before 2018, is that MIRI people just genuinely thought it would be hard to learn and reflect back the human utility function at the level that GPT-4 can right now. (And again, I’m not claiming they thought that was the whole problem. My thesis is quite narrow and subtle here.)