Superhuman AI will be running into unknown situations all the time, simply because its capabilities differ so much from ours.
It might have some sort of perverse incentive to get into those situations, but unless it has already extrapolated its values enough to make them robust there, it really shouldn’t. What’s unclear is how, specifically, it should refrain — what a principled way of making such decisions would even look like.
I’d argue that’s how humanity has had to deal with value extrapolation ever since the start of the Industrial Revolution. It happens every time we gain new technological capabilities. So there isn’t a perverse incentive here; it’s just how technological advancement works.
Now from an EA/LW perspective this seems like unprincipled extrapolation of values, and it is, but what else do we have, really?
Patience. And cosmic wealth, some of which can be used to ponder these questions for however many tredecillions of years it takes.