I thought complexity of value was a separate thesis from the idea that value is fragile. For example, they’re listed as separate theses in this post. It’s possible that complexity of value was always merely a sub-thesis of fragility of value, but I don’t think that’s a natural interpretation of the facts. I think the simplest explanation, consistent with my experience reading MIRI blog posts from before 2018, is that MIRI people just genuinely thought it would be hard to learn and reflect back the human utility function at the level that GPT-4 can right now. (And again, I’m not claiming they thought that was the whole problem. My thesis is quite narrow and subtle here.)