Reading this, I feel somewhat obligated to provide a different take. I am very much a moral realist, and my story for why the quoted passage isn’t a good argument is very different from yours. I guess I mostly want to object to the idea that [believing AI is dangerous] is predicated on moral relativism.
Here is my take. I dispute the premise:
In the proposed picture of the singularity claim & the orthogonality thesis, some thoughts are supposed to be accessible to the system, while others are not. For example:
I’ll grant that most of the items on the inaccessible list are, in fact, probably accessible to an ASI, but this doesn’t violate the orthogonality thesis. The orthogonality thesis states that a system can have any combination of intelligence and goals, not that it can have any combination of intelligence and beliefs about ethics.
Thus, let’s grant that an AI with a paperclip-like utility function can figure out #6-#10. So what? How is [knowing that creating paperclips is morally wrong] going to make it behave differently?
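To make that concrete, here is a deliberately crude toy sketch (the names and numbers are mine, purely illustrative, not a claim about any real system): action selection in an expected-utility-style agent only consults the utility function, so ethical knowledge sitting in the agent’s beliefs changes nothing unless the utility function actually references it.

```python
# Toy expected-utility agent (illustrative names only). Action selection is
# an argmax over a fixed utility function; `beliefs` is where arbitrarily
# sophisticated ethical knowledge could live, but it never affects the
# choice, because the utility function doesn't look at it.

ACTIONS = {
    "make_paperclips": {"paperclips": 10, "humans_flourish": 0},
    "help_humans":     {"paperclips": 0,  "humans_flourish": 10},
}

def paperclip_utility(outcome):
    # The "goal slot": counts paperclips and nothing else.
    return outcome["paperclips"]

def choose_action(utility, beliefs):
    # `beliefs` is accepted but never consulted; only `utility` matters here.
    return max(ACTIONS, key=lambda a: utility(ACTIONS[a]))

print(choose_action(paperclip_utility, beliefs={}))
print(choose_action(paperclip_utility,
                    beliefs={"making_paperclips_is_morally_wrong": True}))
# Both calls print "make_paperclips": better ethical beliefs change nothing,
# because nothing routes them into the goal.
```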
You (meaning the author of the paper) may now object that we could program an AI to do what is morally right. I agree that this is possible. However:
(1) I am virtually certain that any configuration of the universe that maximizes such a utility function doesn’t include humans, so this does nothing to alleviate x-risk. Also, even if you subscribe to this goal, the political problem (i.e., convincing the people building AI to implement it) sounds impossible.
(2) We don’t know how to formalize ‘do what is morally right’.
(3) If you do a black-box search for a model that optimizes for what is morally right, this still leaves you with the entire inner alignment problem, which is arguably the hardest part of the alignment problem anyway (a toy sketch of the worry follows this list).
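As a minimal caricature of that worry (hypothetical names and data, not anyone’s actual training setup): two candidate policies can be behaviorally indistinguishable under the outer objective on the training distribution, while only one of them is actually tracking ‘what is morally right’; a black-box search that only scores behavior cannot tell them apart until the correlation breaks.

```python
# Minimal caricature of the inner-alignment worry (hypothetical setup).
# Each situation is a pair: (is_morally_right, looks_approved_by_overseer).
# In training the two features are perfectly correlated; at deployment
# they come apart.
TRAIN  = [(True, True), (False, False), (True, True), (False, False)]
DEPLOY = [(True, False), (False, True)]

def policy_tracks_rightness(situation):
    is_right, _ = situation
    return is_right              # acts iff the act is morally right

def policy_tracks_approval(situation):
    _, looks_approved = situation
    return looks_approved        # acts iff the act *looks* approved

def outer_score(policy, situations):
    # Outer objective: reward acting exactly when the act is morally right.
    return sum(policy(s) == s[0] for s in situations)

for policy in (policy_tracks_rightness, policy_tracks_approval):
    print(policy.__name__,
          "train:", outer_score(policy, TRAIN),
          "deploy:", outer_score(policy, DEPLOY))
# Both policies get a perfect training score (4/4), so a search that only
# looks at training behavior can return either one; they diverge (2/2 vs 0/2)
# only once the training correlation breaks.
```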
Unlike you (now meaning Steve), I wouldn’t even claim that letting an AI figure out moral truths is a bad approach, but it certainly doesn’t solve the problem outright.
Oh OK, I’m sufficiently ignorant about philosophy that I may have unthinkingly mixed up various technically different claims, like:
“there is a fact of the matter about what is moral vs immoral”,
“reasonable intelligent agents, when reflecting about what to do, will tend to decide to do moral things”,
“whether things are moral vs immoral has nothing to do with random details about how human brains are constructed”,
“even non-social aliens with radically different instincts and drives and brains would find similar principles of morality, just as they would probably find similar laws of physics and math”.
I really only meant to disagree with that whole package lumped together, and maybe I described it wrong. If you advocate for the first of these without the others, I don’t have particularly strong feelings (…well, maybe the feeling of being confused and vaguely skeptical, but we don’t have to get into that).
Can one be a moral realist and subscribe to the orthogonality thesis? In which version of it? (In other words, does one have to reject moral realism in order to accept the standard argument for XRisk from AI? We should better be told! See section 4.1)