If X is “number of paperclips” and Y is something arbitrary that nobody optimizes, such as the ratio of bicycles on the moon to flying horses, then optimizing X should in expectation be equally likely to increase or decrease Y. Otherwise the utility “1 − Y” would tend to move in the opposite direction, which can’t be true by symmetry, since “1 − Y” is just as arbitrary as Y. But if Y is something like “number of happy people”, Y will probably decrease: the world is already set up to keep Y high, and a misaligned agent optimizing X is likely to disturb that state.
That makes sense, thanks. I agree, then, that Y doesn’t always actively decrease, but it should generally become harder for us to optimize. This is the difference between a decrease in utility and a decrease in attainable utility.
I don’t immediately see why this wouldn’t be true as well as the “intermediate version”. Can you expand?
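The two cases above can be illustrated with a toy Monte Carlo sketch. Everything here is my own assumption for illustration: a world state of ten features, an “X-optimizer” that pushes one feature up while jostling the rest, a random linear Y standing in for an arbitrary utility, and a quadratic Y kept near its maximum standing in for a maintained one.

```python
import random

random.seed(0)
N_FEATURES = 10

def perturb(state):
    """X-optimizer: maximizes feature 0 and jostles everything else as a side effect."""
    return [state[0] + 1.0] + [s + random.uniform(-0.5, 0.5) for s in state[1:]]

def arbitrary_delta():
    """Change in a random linear Y that nobody optimizes or maintains."""
    state = [random.uniform(-1, 1) for _ in range(N_FEATURES)]
    w = [random.uniform(-1, 1) for _ in range(N_FEATURES)]
    y = lambda s: sum(wi * si for wi, si in zip(w, s))
    return y(perturb(state)) - y(state)

def maintained_delta():
    """Change in a Y the world already keeps near its maximum (here, at state = 0)."""
    state = [random.uniform(-0.05, 0.05) for _ in range(N_FEATURES)]
    y = lambda s: -sum(si * si for si in s)
    return y(perturb(state)) - y(state)

trials = 20000
mean_arb = sum(arbitrary_delta() for _ in range(trials)) / trials
mean_maint = sum(maintained_delta() for _ in range(trials)) / trials
print(f"arbitrary Y:  mean delta = {mean_arb:+.3f}")   # hovers near zero
print(f"maintained Y: mean delta = {mean_maint:+.3f}")  # clearly negative
```

With a random Y, the sign of each weight is symmetric, so the optimizer’s side effects cancel in expectation, just as the “1 − Y” argument predicts. With the maintained Y, the state starts near the optimum, so any perturbation can only push it downhill on average.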