The world state is “surprisingly” ordered and low-entropy. Anywhere you see such order, you can bet that a human was responsible for it, and that the human cared about it.
Indeed, the state of the world optimized by humans for humans tends to be rather ordered, with low entropy. It is an unstable equilibrium, which means that small random deviations from a given human-optimized environment are nearly universally exothermic and entropy-increasing. A non-intrusive AI trying to follow its reward function, e.g. serve people at the dining table efficiently, would consider multiple ways to achieve its goal and evaluate:
The change in entropy of the environment after the task is accomplished.
The extra effort/energy/cost required to restore the state of the world to the previous one.
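As a rough sketch of what such an evaluation could look like (the Plan record, the field names and the weights below are my own illustrative inventions, not anything from this discussion), the robot might score each candidate plan by its task reward minus penalties for the two quantities above:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    task_reward: float       # how well the plan serves people at the table
    entropy_change: float    # estimated increase in the environment's entropy
    restoration_cost: float  # estimated effort to restore the prior world state

def plan_score(plan: Plan, entropy_weight: float = 1.0, restore_weight: float = 1.0) -> float:
    """Task reward minus penalties for the two quantities listed above."""
    return (plan.task_reward
            - entropy_weight * plan.entropy_change
            - restore_weight * plan.restoration_cost)

def choose_plan(plans: list[Plan]) -> Plan:
    # The non-intrusive AI picks the plan with the best trade-off.
    return max(plans, key=plan_score)
```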
In your examples, breaking the vase is very costly, first because it increases the entropy and releases energy, and second because restoring the state of the world means reassembling the vase from the shards, a very costly undertaking in general. So a non-intrusive robot might prefer to go around the vase, or maybe pick it up, move it out of the way, then put it back where it was, rather than break it. But if the vase is a cheap plastic one, and it knows that it is replaceable by an identical item from the store room, the robot might not care as much and allow for the possibility of knocking it over.
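Plugging the vase scenarios into that sketch, with made-up numbers purely for illustration:

```python
# With a ceramic vase, the restoration term (reassembling the shards) dominates.
ceramic = [
    Plan("walk around the vase", task_reward=0.9, entropy_change=0.0, restoration_cost=0.0),
    Plan("knock the vase over",  task_reward=1.0, entropy_change=5.0, restoration_cost=50.0),
]
print(choose_plan(ceramic).name)   # -> "walk around the vase"

# A cheap, replaceable plastic vase barely changes the score,
# so the robot might tolerate knocking it over.
plastic = [
    Plan("walk around the vase", task_reward=0.9, entropy_change=0.0,  restoration_cost=0.0),
    Plan("knock the vase over",  task_reward=1.0, entropy_change=0.02, restoration_cost=0.03),
]
print(choose_plan(plastic).name)   # -> "knock the vase over"
```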
Whether it is necessary to simulate the past to figure out the cost of deviating from the present state, I am not sure. Entropy is often in the eye of the beholder (a good example is the clutter on someone’s desk: it might seem like a mess to you, but they know exactly where everything is, and any externally imposed change, like arranging everything neatly, actually decreases the order for the desk’s inhabitant), so an AI might have trouble figuring out the cost of restoration in many cases. Maybe swapping chairs is OK, maybe not. But at least it is not likely to go around breaking things. Unless it has an unlimited supply of replacements, in which case that might be acceptable. Unless a particular broken item has sentimental value to a particular human, which would require digging quite far into the past.
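One way to picture the “entropy is in the eye of the beholder” problem is to treat the restoration cost as uncertain and average it over hypotheses about how the affected human values the current arrangement. The hypotheses, probabilities and costs below are invented purely for illustration:

```python
# Hypotheses about how the desk's owner values its current arrangement,
# with the robot's subjective probability and the restoration cost if disturbed.
hypotheses = {
    "owner relies on the exact current layout": (0.5, 20.0),
    "owner would not mind a tidied desk":       (0.5, 0.5),
}

def expected_restoration_cost(hypotheses: dict[str, tuple[float, float]]) -> float:
    """Average the cost of undoing a change over the robot's beliefs about
    the owner's preferences. Uncertainty about a possibly high-cost outcome
    keeps the expected penalty high, so the robot leaves the desk alone."""
    return sum(p * cost for p, cost in hypotheses.values())

print(expected_restoration_cost(hypotheses))  # 10.25 -> "tidying up" looks risky
```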
Whether it is necessary to simulate the past to figure out the cost of deviating from the present state, I am not sure.
You seem to be proposing low-impact AI / impact regularization methods. As I mentioned in the post:
we are gaining significantly on the “do what we want” desideratum: the point of inferring preferences is that we do not also penalize positive impacts that we want to happen.
Almost everything we want to do is irreversible / impactful / entropy-increasing, and many things that we don’t care about are also irreversible / impactful / entropy-increasing. If you penalize irreversibility / impact / entropy, then you will prevent your AI system from executing strategies that would be perfectly fine and even desirable. My intuition is that typically this would prevent your AI system from doing anything interesting (e.g. replacing CEOs).
Simulating the past is one way that you can infer preferences from the state of the world; it’s probably not the best way and I’m not tied to that particular strategy. The important bit is that the state contains preference information and it is possible in theory to extract it.
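A toy sketch of that last point, in the spirit of inferring preferences from the observed state (this is my own illustrative Bayesian toy model, not the method from the post): if careless behaviour would very likely have broken the vase by now, then observing an intact vase is evidence that people care about it.

```python
# Prior over whether the human cares about the vase.
prior_cares = 0.5

# Likelihood of observing an *intact* vase after a long history, under each hypothesis.
# If nobody cared, random bumps would probably have broken it by now.
p_intact_given_cares = 0.95
p_intact_given_indifferent = 0.10

# Bayes' rule: posterior probability that the human cares, given the intact vase.
posterior_cares = (p_intact_given_cares * prior_cares) / (
    p_intact_given_cares * prior_cares
    + p_intact_given_indifferent * (1 - prior_cares)
)

print(round(posterior_cares, 3))  # ~0.905: the state alone carries preference information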
But couldn’t “ordered and low-entropy” objects also be natural? E.g. spherical planets, ice crystals, anthills?
Yes. Best not break those, either, unless explicitly instructed.
How would it clear the streets after snow? Each snowflake is unique and will be destroyed.
That is indeed an interesting question: what constitutes uniqueness? Maybe simulating the past gives a hint about the circumstances in which the differences between snowflakes matter. Snow shoveling might be different from snowflake photography.