I struggle to understand the difference between #2 and #3. The prosaic AI alignment problem only exists because we don’t know how to make an agent that tries to do what we want it to do. Would you say that #3 is a concrete scenario for how #2 could lead to a catastrophe?
I think #3 could occur because of #2 (which I now mostly call “inner misalignment”), but it could also occur because of outer misalignment.
Broadly speaking, though, I think you’re right that #2 and #3 are different types of things. Because of that and other issues, I no longer think that this post disentangles the arguments satisfactorily; I’ll make a note of this at the top of the document.