While most of this seems sensible, I don't understand how your last sentence follows. I have heard similar strategies suggested to reduce the probability of paperclipping, but it seems that if we actually succeed in producing a truly friendly AI, the quantity it tries to maximize (expected winning, P(winning), or something else) will depend on how we evaluate outcomes.