I’m not even sure whether you are closer to or further from understanding what I meant, now. I think you are probably closer, but stating it in a way I wouldn’t. I see that I need to do some careful disambiguation of background assumptions and language.
Instead of trying to value learn and then optimize, just go straight for the policy, which is safer than relying on accurately decomposing a human into two different things that are both difficult to learn and have weird interactions with each other.
This part, at least, is getting at the same intuition I’m coming from. However, I can only assume that you are confused why I would have set up things the way I did in the post if this was my point, since I didn’t end up talking much about directly learning the policies. (I am thinking I’ll write another post to make that connection clearer.)
I will have to think harder about the difference between how you’re framing things and how I would frame things, to try to clarify more.
I’m not even sure whether you are closer to or further from understanding what I meant, now.
:(
I can only assume that you are confused why I would have set up things the way I did in the post if this was my point, since I didn’t end up talking much about directly learning the policies.
My assumption was that you were arguing that learning policies directly (assuming we could do it) has advantages over the default approach of value learning + optimization. That framing seems to explain most of the post.
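In case it helps pin down what I mean by that contrast, here is a minimal sketch of the two pipelines as I understand them. Everything in it is a hypothetical illustration, not something from your post: the function names, the tabular majority vote standing in for a supervised learner, and the scoring rule standing in for reward inference are all just assumed placeholders.

```python
# Minimal sketch (illustrative assumptions only, not from the post):
# (1) fit a policy directly to human demonstrations, vs.
# (2) infer a reward function from the same demonstrations, then optimize it.


def fit_policy_directly(demos):
    """Direct policy learning: estimate pi(a | s) from (state, action) pairs.

    A tabular majority vote stands in for whatever supervised learner
    would actually be used.
    """
    counts = {}
    for s, a in demos:
        counts.setdefault(s, {})
        counts[s][a] = counts[s].get(a, 0) + 1
    # Act the way the human most often acted in each observed state.
    return {s: max(acts, key=acts.get) for s, acts in counts.items()}


def value_learn_then_optimize(demos, candidate_rewards, actions):
    """Value learning + optimization: first decompose the demonstrations into
    an inferred reward function, then act by maximizing that reward.

    `candidate_rewards` is a list of functions R(s, a); picking the one that
    scores the demonstrations highest stands in for real reward inference.
    """
    best_reward = max(
        candidate_rewards,
        key=lambda R: sum(R(s, a) for s, a in demos),
    )
    # The resulting agent optimizes the inferred reward rather than
    # imitating the demonstrated behaviour directly.
    return lambda state: max(actions, key=lambda a: best_reward(state, a))
```

On that reading, only the second pipeline depends on accurately decomposing the human into two separately learned pieces with awkward interactions, which is the step I was calling risky.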