I broadly agree with this perspective (and in fact it’s one of my reasons for optimism about AI alignment).
But usually when LessWrongers argue against “good enough” alignment, they’re arguing against alignment methods, saying that “nothing except proofs” will work, because only proofs give near-100% confidence. (I might be strawmanning this argument; I don’t really understand it.)
You’re talking about the internal structure of the AI system (is the AI system in fact optimizing for “human values”, or for something else), where I do expect a sharper, qualitative distinction. I’m claiming that our ability to get on the right side of that distinction varies relatively smoothly across the methods that we could use.
Part of my optimism about AI alignment (relative to LW) comes from thinking that since there (probably) is a relatively sharp qualitative divide between “aligned computation” and “unaligned computation”, the “engineering approach” has more of a shot at working. (This isn’t a big factor in my optimism though.)