My point is that specific behaviors are not the kind of thing we can make decisions about when programming an FAI, so I don’t see how “iff we program it to” applies to a question about the plausibility of a specific behavior.
There is more than one way to program an FAI; see, for example, CEV, which is currently ambiguous. There are also different individuals or groups of individuals to whom an AI could be friendly while still qualifying as “Friendly Enough” to warrant the label. It is likely that the actual (and coherently extrapolatable) preferences of humans differ on whether rewarding AI-encouragers is a good thing.
“Things that are mostly Friendly” is a huge class in which humanly constructible FAIs are a tiny dot (I expect we can either do a perfect job or none at all, while it’s theoretically but not humanly possible to create almost-perfect-but-not-quite FAIs). I’m talking about that dot, and I expect within that dot, the answer to this question is determined one way or the other, and we don’t know which.
I’m pleasantly surprised. It seems we disagree about actual predictions about the universe, rather than in the expected and more common “just miscommunication/responding to a straw man” sense. Within that dot, the answer is not determined!
Whether the programmers want it to be done doesn’t plausibly influence whether it’s the correct thing to do, and an FAI does the correct thing, or it’s not an FAI.
I’m familiar with the point, and make it myself rather frequently. It does not apply here, due to my aforementioned rejection of the “determined within the dot” premise.