Whether it is necessary to simulate the past to figure out the cost of deviating from the present state, I am not sure.
You seem to be proposing low-impact AI / impact regularization methods. As I mentioned in the post:
we are gaining significantly on the “do what we want” desideratum: the point of inferring preferences is that we do not also penalize positive impacts that we want to happen.
Almost everything we want to do is irreversible / impactful / entropy-increasing, and many things that we don’t care about are also irreversible / impactful / entropy-increasing. If you penalize irreversibility / impact / entropy, then you will prevent your AI system from executing strategies that would be perfectly fine and even desirable. My intuition is that typically this would prevent your AI system from doing anything interesting (e.g. replacing CEOs).
Simulating the past is one way that you can infer preferences from the state of the world; it’s probably not the best way and I’m not tied to that particular strategy. The important bit is that the state contains preference information and it is possible in theory to extract it.
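To make "the state contains preference information" concrete, here is a toy sketch. It is not the algorithm from the post, just a minimal illustration under made-up assumptions: a single fragile vase, two hypothetical preference hypotheses (`cares_about_vase`, `indifferent`), and invented per-step break probabilities. We imagine many possible pasts under each hypothesis and ask which hypothesis better explains the vase being intact now.

```python
import random

# Hypothetical per-step chance that the human breaks the vase under each
# candidate preference. These numbers are illustrative assumptions only.
BREAK_PROB = {"cares_about_vase": 0.01, "indifferent": 0.30}


def simulate_past(hypothesis, steps, rng):
    """Roll out one imagined past; return True if the vase is still intact."""
    p_break = BREAK_PROB[hypothesis]
    for _ in range(steps):
        if rng.random() < p_break:
            return False  # breaking the vase is irreversible
    return True


def posterior_given_intact(steps=10, rollouts=20000, seed=0):
    """Estimate P(hypothesis | vase observed intact) under a uniform prior."""
    rng = random.Random(seed)
    likelihood = {}
    for hypothesis in BREAK_PROB:
        intact = sum(simulate_past(hypothesis, steps, rng) for _ in range(rollouts))
        likelihood[hypothesis] = intact / rollouts
    total = sum(likelihood.values())
    return {h: value / total for h, value in likelihood.items()}


posterior = posterior_given_intact()
print(posterior)
```

With these assumed numbers, an intact vase puts most of the posterior mass on `cares_about_vase`: the observed state alone is evidence about what the human wanted, without penalizing impact in general.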