That depends on how much measure human-compatible values hold in the system’s initial distribution over values. A paperclip maximizer might do “moral philosophy” over what, exactly, counts as the optimal paperclip, but that will not somehow lead it to value humans. Its distribution over values is concentrated almost entirely on paperclips.
Then again, I suspect that human-compatible values don’t need much measure in the system’s distribution for the outcome you’re describing to occur. If the system distributes resources in rough proportion to the measure each value holds, then even very low-measure values receive a large absolute quantity of resources. The universe is quite large, and sustaining some humans is cheap in absolute terms.
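To make that concrete, here is a rough back-of-the-envelope sketch. The star count, the 10⁻⁹ measure on human-compatible values, and the one-star cost of sustaining humans are all made-up illustrative numbers, not claims about any actual system; the point is only that a tiny fraction of a vast pool is still enormous in absolute terms.

```python
# Back-of-the-envelope illustration with made-up numbers: even if
# human-compatible values hold a tiny share of the system's measure,
# proportional allocation over a very large resource pool still yields
# an enormous absolute budget, dwarfing the cost of sustaining humans.

total_stars = 1e22            # rough order of magnitude for stars in the observable universe (assumed)
human_value_measure = 1e-9    # hypothetical measure on human-compatible values
stars_needed_for_humans = 1.0 # assume one star system comfortably sustains some humans

allocated_stars = total_stars * human_value_measure

print(f"Stars allocated to human-compatible values: {allocated_stars:.0e}")
print(f"Surplus factor over what's needed: {allocated_stars / stars_needed_for_humans:.0e}")
```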