I think that’s a reasonable summary as written. Two minor quibbles, which you are welcome to ignore:
Selection theorems are helpful because (1) they can provide additional assumptions that can help with learning values by observing human behavior
I agree with the literal content of this sentence, but I personally don’t imagine limiting it to behavioral data. I expect embedding-relevant selection theorems, which would also open the door to using internal structure or low-level dynamics of the brain to learn values (and human models, precision of approximations, etc).
Unfortunately, many coherence arguments implicitly assume that the agent has no internal state, which is not true for humans, so this argument does not clearly work. As another example, our ML training procedures will likely also select for agents that don’t waste resources, which could allow us to conclude that the resulting agents can be represented as maximizing expected utility.
Agents selected by ML (e.g. RL training on games) also often have internal state.
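As a concrete illustration of that point (a hypothetical minimal sketch, not something from the discussion above; all names here are made up), an agent with internal state can take different actions on identical observations, because its behavior depends on accumulated history:

```python
# Minimal sketch of an agent with internal state: its action depends on an
# internal accumulator updated from past observations, not just on the
# current observation. (Toy illustration only.)

class StatefulAgent:
    def __init__(self):
        self.hidden = 0.0  # internal state that persists across timesteps

    def act(self, observation: float) -> int:
        # Update internal state from the observation (a toy "recurrence").
        self.hidden = 0.5 * self.hidden + observation
        # The action depends on the accumulated state, not just `observation`.
        return 1 if self.hidden > 1.0 else 0

agent = StatefulAgent()
# Same observation twice, but different actions, because the internal
# state differs between the two calls:
first = agent.act(0.8)   # hidden = 0.8  -> action 0
second = agent.act(0.8)  # hidden = 1.2  -> action 1
print(first, second)     # prints: 0 1
```

A coherence argument that models the agent purely as a mapping from current observation to action would miss this history-dependence, which is why the no-internal-state assumption matters.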
Edited to

Selection theorems are helpful because (1) they can provide additional assumptions that can help with learning human values
and
[...] the resulting agents can be represented as maximizing expected utility, if the agents don’t have internal state.
(For the second one, that’s one of the reasons why I had the weasel word “could”, but on reflection it’s worth calling out explicitly given I mention it in the previous sentence.)
Cool, looks good.