What advantages do you think this has compared to vanilla RL on U + AUP_Penalty?
It’s also mild on the inside of the algorithm, not just in its effects on the world, which could avert problems with inner optimizers. Beyond that, I haven’t thought enough about the agent’s behavior; I might reply with another comment.
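For concreteness, here is a minimal sketch of the penalized reward that "vanilla RL on U + AUP_Penalty" would optimize, following the standard AUP formulation (penalize changes in attainable utility for auxiliary goals relative to a no-op). The auxiliary Q-values, the no-op baseline, and the scale of 0.1 are illustrative assumptions, not the exact implementation under discussion.

```python
def aup_reward(primary_reward, aux_q_action, aux_q_noop, scale=0.1):
    """Primary reward minus a penalty for shifting attainable utility.

    aux_q_action: auxiliary Q-values Q_i(s, a) for the chosen action
    aux_q_noop:   auxiliary Q-values Q_i(s, no-op) for doing nothing
    scale:        penalty coefficient (illustrative value)
    """
    # Penalize the average absolute change in attainable utility
    # across the auxiliary goals, relative to the no-op baseline.
    penalty = sum(abs(qa - qn) for qa, qn in zip(aux_q_action, aux_q_noop))
    penalty /= len(aux_q_action)
    return primary_reward - scale * penalty

# An action that changes the agent's ability to pursue auxiliary goals
# is penalized relative to doing nothing:
print(aup_reward(1.0, [0.9, 0.5], [0.5, 0.5]))  # 1.0 - 0.1 * 0.2 = 0.98
```

A vanilla RL agent would simply maximize this combined signal end-to-end; the contrast drawn in the comment above is that this leaves the optimization process itself as aggressive as ever, only the reward being shaped.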