I am not so convinced that penalizing more stuff will make these arguments weak enough that we don’t have to worry about them. For an example of why I think this, see Are minimal circuits deceptive?. Also, adding execution/memory constraints penalizes all hypotheses, and I don’t think universes containing consequentialists are asymmetrically penalized.
I agree about this being a special case of mesa-optimization.
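To make the "penalizes all hypotheses" point concrete, here is a toy sketch (my own illustration, not from the original discussion) of a Levin-style speed prior, weight(h) ∝ 2^−(|h| + log₂ t(h)). The hypothesis names, description lengths, and runtimes are all invented numbers; the point is only that the runtime penalty is logarithmic, so even a 100× slower hypothesis loses under 7 bits, and a consequentialist-style hypothesis with a shorter description can still dominate.

```python
import math

# Toy sketch of a Levin-style speed prior: weight(h) ∝ 2^-(|h| + log2 t(h)).
# All names and numbers below are invented for illustration only.

hypotheses = {
    # name: (description_length_bits, runtime_steps)
    "direct_physics": (400, 10**6),             # longer program, fast
    "consequentialist_universe": (350, 10**8),  # shorter program, 100x slower
}

def simplicity_weight(length_bits, steps):
    # Pure simplicity prior: runtime is ignored entirely.
    return 2.0 ** (-length_bits)

def speed_prior_weight(length_bits, steps):
    # Runtime penalty is only logarithmic: log2(10**8 / 10**6) ~= 6.6 bits.
    return 2.0 ** (-(length_bits + math.log2(steps)))

for prior in (simplicity_weight, speed_prior_weight):
    total = sum(prior(l, t) for l, t in hypotheses.values())
    posterior = {name: prior(l, t) / total
                 for name, (l, t) in hypotheses.items()}
    print(prior.__name__, posterior)
```

Under both priors the shorter consequentialist hypothesis keeps essentially all of the posterior mass: the runtime penalty does apply to it, but it applies to every hypothesis, and a ~6.6-bit penalty is swamped by a 50-bit description-length advantage.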
adding execution/memory constraints penalizes all hypotheses
In reality these constraints do exist, so the question “what happens if you don’t care about efficiency at all?” is not the important one. In practice, efficiency is critical, and everything that happens in AI is dominated by efficiency considerations.
I think that mesa-optimization will be a problem. It probably won’t look like aliens living in the Game of Life though.
It’ll look like an internal optimizer that just “decides” that the minds of the humans who created it are another part of the environment to be optimized in pursuit of its misaligned goal.