I’m not even sure whether you are now closer to or further from understanding what I meant.
:(
I can only assume that you are confused about why I would have set things up the way I did in the post if this was my point, since I didn’t end up talking much about directly learning the policies.
My assumption was that you were arguing for why learning policies directly (assuming we could do it) has advantages over the default approach of value learning + optimization. That framing seems to explain most of the post.
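For concreteness, here is a minimal sketch of the contrast as I read it. Everything here (the toy world, the tables, the function names) is my own illustration, not anything from the post: the difference is whether the learned object is a value estimate that behavior is then optimized against, or the behavior itself.

```python
# Hypothetical sketch of the two approaches in a tiny 1-D world
# (states 0..4). All names and numbers here are illustrative.

ACTIONS = ["left", "right"]

def transition(state: int, action: str) -> int:
    """Deterministic dynamics: move one step left or right, clipped to [0, 4]."""
    return max(0, min(4, state + (1 if action == "right" else -1)))

# (a) Value learning + optimization: a learned value estimate per state,
#     with behavior produced by optimizing against it at decision time.
value_estimate = {0: 0.0, 1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0}

def act_via_value(state: int) -> str:
    # Pick the action whose successor state has the highest learned value.
    return max(ACTIONS, key=lambda a: value_estimate[transition(state, a)])

# (b) Direct policy learning: a learned state -> action map
#     (as in behavioral cloning), with no separate optimization step.
learned_policy = {0: "right", 1: "right", 2: "right", 3: "right", 4: "right"}

def act_via_policy(state: int) -> str:
    return learned_policy[state]

# In this toy case the two produce the same action, but only (a) involves
# an explicit optimization over the learned quantity.
assert act_via_value(2) == act_via_policy(2) == "right"
```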