The world state is “surprisingly” ordered and low-entropy. Anywhere you see such order, you can bet that a human was responsible for it, and that the human cared about it.
Indeed, the state of the world optimized by humans for humans tends to be rather ordered, with low entropy. It is an unstable equilibrium, which means that small random deviations from a given human-optimized environment are nearly universally exothermic and entropy-increasing. A non-intrusive AI trying to follow its reward function, e.g. serve people at the dining table efficiently, would consider multiple ways to achieve its goal and evaluate:
The change in entropy of the environment after the task is accomplished.
The extra effort/energy/cost required to restore the state of the world to the previous one.
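As a rough sketch of what such an evaluation could look like (the Plan record, the field names and the weights below are my own illustrative inventions, not anything from this discussion), the robot might score each candidate plan by its task reward minus penalties for the two quantities above:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    task_reward: float       # how well the plan serves people at the table
    entropy_change: float    # estimated increase in the environment's entropy
    restoration_cost: float  # estimated effort to restore the prior world state

def plan_score(plan: Plan, entropy_weight: float = 1.0, restore_weight: float = 1.0) -> float:
    """Task reward minus penalties for the two quantities listed above."""
    return (plan.task_reward
            - entropy_weight * plan.entropy_change
            - restore_weight * plan.restoration_cost)

def choose_plan(plans: list[Plan]) -> Plan:
    # The non-intrusive AI picks the plan with the best trade-off.
    return max(plans, key=plan_score)
```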
In your examples, breaking the vase is very costly, first because it increases the entropy and releases energy, and second because restoring the state of the world means reassembling the vase from the shards, a very costly undertaking in general. So a non-intrusive robot might prefer to go around the vase, or maybe pick it up, move it out of the way, then put it back where it was, rather than break it. But if the vase is a cheap plastic one, and it knows that it is replaceable by an identical item from the store room, the robot might not care as much and allow for the possibility of knocking it over.
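Plugging the vase scenarios into that sketch, with made-up numbers purely for illustration:

```python
# With a ceramic vase, the restoration term (reassembling the shards) dominates.
ceramic = [
    Plan("walk around the vase", task_reward=0.9, entropy_change=0.0, restoration_cost=0.0),
    Plan("knock the vase over",  task_reward=1.0, entropy_change=5.0, restoration_cost=50.0),
]
print(choose_plan(ceramic).name)   # -> "walk around the vase"

# A cheap, replaceable plastic vase barely changes the score,
# so the robot might tolerate knocking it over.
plastic = [
    Plan("walk around the vase", task_reward=0.9, entropy_change=0.0,  restoration_cost=0.0),
    Plan("knock the vase over",  task_reward=1.0, entropy_change=0.02, restoration_cost=0.03),
]
print(choose_plan(plastic).name)   # -> "knock the vase over"
```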
Whether it is necessary to simulate the past to figure out the cost of deviating from the present state, I am not sure. Entropy is often in the eye of the beholder (a good example is the clutter on someone’s desk: it might seem like a mess to you, but they know exactly where everything is, and any externally imposed change, like arranging everything neatly, actually decreases the order for the desk’s inhabitant), so an AI might have trouble figuring out the cost of restoration in many cases. Maybe swapping chairs is OK, maybe not. But at least it is not likely to go around breaking things. Unless it has an unlimited supply of replacements, in which case that might be acceptable. Unless a particular broken item has sentimental value to a particular human, which would require digging quite far into the past.
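One way to picture the “entropy is in the eye of the beholder” problem is to treat the restoration cost as uncertain and average it over hypotheses about how the affected human values the current arrangement. The hypotheses, probabilities and costs below are invented purely for illustration:

```python
# Hypotheses about how the desk's owner values its current arrangement,
# with the robot's subjective probability and the restoration cost if disturbed.
hypotheses = {
    "owner relies on the exact current layout": (0.5, 20.0),
    "owner would not mind a tidied desk":       (0.5, 0.5),
}

def expected_restoration_cost(hypotheses: dict[str, tuple[float, float]]) -> float:
    """Average the cost of undoing a change over the robot's beliefs about
    the owner's preferences. Uncertainty about a possibly high-cost outcome
    keeps the expected penalty high, so the robot leaves the desk alone."""
    return sum(p * cost for p, cost in hypotheses.values())

print(expected_restoration_cost(hypotheses))  # 10.25 -> "tidying up" looks risky
```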
Whether it is necessary to simulate the past to figure out the cost of deviating from the present state, I am not sure.
You seem to be proposing low-impact AI / impact regularization methods. As I mentioned in the post:
we are gaining significantly on the “do what we want” desideratum: the point of inferring preferences is that we do not also penalize positive impacts that we want to happen.
Almost everything we want to do is irreversible / impactful / entropy-increasing, and many things that we don’t care about are also irreversible / impactful / entropy-increasing. If you penalize irreversibility / impact / entropy, then you will prevent your AI system from executing strategies that would be perfectly fine and even desirable. My intuition is that typically this would prevent your AI system from doing anything interesting (e.g. replacing CEOs).
Simulating the past is one way that you can infer preferences from the state of the world; it’s probably not the best way and I’m not tied to that particular strategy. The important bit is that the state contains preference information and it is possible in theory to extract it.
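A toy sketch of that last point, in the spirit of inferring preferences from the observed state (this is my own illustrative Bayesian toy model, not the method from the post): if careless behaviour would very likely have broken the vase by now, then observing an intact vase is evidence that people care about it.

```python
# Prior over whether the human cares about the vase.
prior_cares = 0.5

# Likelihood of observing an *intact* vase after a long history, under each hypothesis.
# If nobody cared, random bumps would probably have broken it by now.
p_intact_given_cares = 0.95
p_intact_given_indifferent = 0.10

# Bayes' rule: posterior probability that the human cares, given the intact vase.
posterior_cares = (p_intact_given_cares * prior_cares) / (
    p_intact_given_cares * prior_cares
    + p_intact_given_indifferent * (1 - prior_cares)
)

print(round(posterior_cares, 3))  # ~0.905: the state alone carries preference information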
But couldn’t “ordered and low-entropy” objects also be natural? E.g. spherical planets, ice crystals, anthills?
Yes. Best not break those, either, unless explicitly instructed.
How would it clear the streets after snow? Each snowflake is unique and will be destroyed.
That is indeed an interesting question: what constitutes uniqueness? Maybe simulating the past gives a hint about the circumstances in which the differences between snowflakes matter. Snow shoveling might be different from snowflake photography.