>If you already know how the human is going to answer, then it’s not high value of information to ask.
That’s the entire problem: if “ask a human” is programmed as an endorsement of this being the right path to take, rather than as a genuine need for information.
>If the scorekeeper operates by conservation of expected evidence, it can never believe any sequence of questions will modify the score of any particular scenario on average.
That’s precisely my definition of “unriggable” learning processes, in the next post: https://www.lesswrong.com/posts/upLot6eG8cbXdKiFS/reward-function-learning-the-learning-process
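The quoted conservation-of-expected-evidence property can be sketched numerically. This is an illustrative toy (the world states, question, and likelihoods are all made up, not from either post): for a Bayesian scorekeeper, the posterior averaged over possible answers equals the prior, so no question can shift the expected score.

```python
# Toy illustration of conservation of expected evidence (assumed setup,
# not from the linked posts): two hypothetical world states and one
# yes/no question.
prior = {"w1": 0.7, "w2": 0.3}

# Hypothetical likelihoods P(answer | world state).
likelihood = {
    "yes": {"w1": 0.9, "w2": 0.2},
    "no":  {"w1": 0.1, "w2": 0.8},
}

def p_answer(ans):
    """Marginal probability of hearing this answer."""
    return sum(likelihood[ans][w] * prior[w] for w in prior)

def posterior(ans, w):
    """Bayes update on the world state after hearing the answer."""
    return likelihood[ans][w] * prior[w] / p_answer(ans)

# Posterior for each world state, averaged over possible answers:
expected = {
    w: sum(p_answer(a) * posterior(a, w) for a in likelihood)
    for w in prior
}
print(expected)  # equals the prior: asking changes nothing on average
```

Whatever likelihoods you plug in, the average comes back to the prior, which is why a scorekeeper with this property can never believe a sequence of questions will move the score on average.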
That’s a link to this post, right? ;)
Oops, yes! Sorry, for some reason I thought this was the post on the value function.