I may be missing the point here, so please don’t be offended. Isn’t this confusing “does the AI have (roughly) human values?” with “was the AI deliberately, rigorously designed to do so?” Obviously, our perception of the moral worth of an agent doesn’t require them to have values identical to ours. We can value another’s pleasure, even if we would not derive pleasure from the things they’re experiencing. We can value another’s love, even if we do not feel as affectionate towards their loved ones. But do we value an agent whose goal is to suffer as much as possible? Do we value an agent motivated purely by hatred?
Our values are our values; they determine our perception of moral worth. And while many people might be happy about a strange and wonderful AI civilization, even if it was very different from what we might choose to build, very few would want a boring one. That’s a values question, or a meta-values question; there’s no way to posit a worthwhile AI civilization without assuming that on some level our values align.
The example given for a “good successor albeit unaligned” AI is a simulated civilization that eventually learns about the real world and figures out how to make AI work here. Certainly this isn’t an AI with deliberate, rigorous Friendliness programming, but if you’d prefer handing the universe off to it rather than taking a 10% extinction risk, isn’t that because you’re hoping it will be more or less Friendly anyway? And at that point, isn’t the answer to “when is unaligned AI morally valuable?” simply: when it is, in fact, aligned, regardless of whether that alignment was due to a simulated civilization having somewhat similar values to our own, or any other reason?
Upvoted, this was exactly my reaction to this post. However, you may want to look at the link to alignment in the OP. Christiano is using “alignment” in a very narrow sense. For example, from the linked post:
The definition is intended de dicto rather than de re. An aligned A is trying to “do what H wants it to do.” Suppose A thinks that H likes apples, and so goes to the store to buy some apples, but H really prefers oranges. I’d call this behavior aligned because A is trying to do what H wants, even though the thing it is trying to do (“buy apples”) turns out not to be what H wants: the de re interpretation is false but the de dicto interpretation is true.
… which rings at least slightly uncomfortable to my ears.
Well then, isn’t the answer that we care about de re alignment, and whether or not an AI is de dicto aligned is relevant only insofar as it predicts de re alignment? We might expect that the two would converge in the limit of superintelligence, and perhaps that aiming for de dicto alignment might be the easier immediate target, but the moral worth would be a function of what the AI actually did.
That does clear up the seeming confusion behind the OP, though, so thanks!