jessicata comments on The Learning-Theoretic AI Alignment Research Agenda

jessicata 4 Jul 2018 21:43 UTC
0 points
AF
Ok, I am confused by what you mean by “trap”. I thought “trap” meant a set of states you can’t get out of. And if the second law of thermodynamics is true, you can’t get from a high-entropy state to a low-entropy state. What do you mean by “trap”?
- Vanessa Kosoy 5 Jul 2018 14:01 UTC
  0 points
  AF Parent
  To first approximation, a “trap” is an action such that taking it loses long-term value in expectation, i.e., an action which is outside the set $A_{M}^{0}$ that I defined here (see the end of Definition 1). This set is always non-empty, since it at least has to contain the optimal action. However, this definition is not very useful when, for example, your environment contains a state that you can neither escape nor avoid (the heat death of the universe might be such a state), since, in this case, nothing is a trap. To be more precise, we need to go from an analysis which is asymptotic in the time discount parameter to an analysis with a fixed, finite time discount parameter (similarly to how, with time complexity, we usually start by analyzing the asymptotic complexity of an algorithm, but ultimately we are interested in particular inputs of finite size). For a fixed time discount parameter, the concept of a trap becomes “fuzzy”: a trap is an action which loses a substantial fraction of the value.
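
A minimal sketch of the “fuzzy” notion at a fixed discount parameter may help. Everything in it is an illustrative assumption rather than the definition referenced above: the three-state MDP, the names `q_values` and `loss_fraction`, the normalization of value by $(1-\gamma)$, and the reading of “fraction of the value lost” as $(V^*(s) - Q^*(s,a)) / V^*(s)$.

```python
def q_values(P, R, gamma, iters=2000):
    """Normalized optimal Q-values via value iteration.

    P[s][a] is a list of transition probabilities over next states,
    R[s][a] is the immediate reward. Values are scaled by (1 - gamma)
    so they stay in the same range as the rewards.
    """
    n_states = len(R)
    Q = [[0.0] * len(R[s]) for s in range(n_states)]
    for _ in range(iters):
        V = [max(q) for q in Q]
        Q = [
            [
                (1 - gamma) * R[s][a]
                + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                for a in range(len(R[s]))
            ]
            for s in range(n_states)
        ]
    return Q


def loss_fraction(Q, s):
    """Fraction of the optimal value at state s lost by each action (assumes V > 0)."""
    V = max(Q[s])
    return [(V - q) / V for q in Q[s]]


# Hypothetical toy MDP (an assumption for illustration only).
# State 0: start. State 1: absorbing, reward 1 forever. State 2: absorbing, reward 0.
# Action 0 ("safe") moves 0 -> 1 with no immediate reward.
# Action 1 ("grab") pays 1 immediately but moves 0 -> 2.
P = [
    [[0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 1]],
]
R = [
    [0, 1],
    [1, 1],
    [0, 0],
]

for gamma in (0.5, 0.6, 0.9, 0.99):
    losses = loss_fraction(q_values(P, R, gamma), 0)
    print(gamma, [round(x, 3) for x in losses])
```

In this toy example the “grab” action loses a fraction $(2\gamma - 1)/\gamma$ of the value at the start state for $\gamma \ge 1/2$: nothing at $\gamma = 0.5$, about a third at $\gamma = 0.6$, and nearly all of it as $\gamma \to 1$. Whether it counts as a trap therefore depends on the discount parameter and on whatever threshold one takes “substantial” to mean.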