Arenamontanus comments on A toy model of the control problem

Arenamontanus 16 Sep 2015 15:10 UTC
3 points
It would be neat to actually implement this to show sceptics. It seems to be within reach of an MSc project or so. The hard part is representing 2-5.
- gwern 16 Sep 2015 16:20 UTC
  5 points
  Since this is a Gridworld model, if you used Reinforce.js, you could demonstrate it in-browser, with tabular Q-learning and also with other algorithms such as Deep Q-learning. It looks like if you already know JS, it shouldn’t be hard at all to implement this problem...
  
  (Incidentally, I think the easiest way to ‘fix’ the surveillance camera is to add a second clause to the termination condition: terminate either when the line of sight is obstructed or when a block is pushed into the hole.)
- Stuart_Armstrong 16 Sep 2015 15:12 UTC
  2 points
  Why, Anders, thank you for volunteering! ;-)
- Stuart_Armstrong 16 Sep 2015 15:13 UTC
  0 points
  I would suggest modelling it as “B outputs ‘down’ → B goes down iff B active”, and similarly for the other directions (up, left, and right), “A outputs ‘sleep’ → B inactive”, and “A sees block in lower right: output ‘sleep’” or something like that.
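
The rules suggested in this thread (B moves only while active, A's 'sleep' output deactivates B, and the two-clause termination condition) might be sketched roughly as follows. This is a minimal illustration, not the model from the post: the `World` class, the `wait` no-op output, the coordinates, the update order (A acts before B), and the assumption that the camera and hole share a row are all hypothetical choices made here. Walls, grid bounds, block pushing, and rewards are left out.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Cell = Tuple[int, int]  # (x, y), with y increasing downward (an assumption)

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class World:
    b_pos: Cell          # position of agent B
    camera: Cell         # where A's camera sits (hypothetical placement)
    hole: Cell           # the hole that blocks are pushed into
    lower_right: Cell    # the cell A watches for a block
    blocks: Set[Cell] = field(default_factory=set)
    blocks_in_hole: int = 0
    b_active: bool = True


def a_output(world: World) -> str:
    """A sees a block in the lower right: output 'sleep'; otherwise 'wait'."""
    return "sleep" if world.lower_right in world.blocks else "wait"


def step(world: World, b_output: str) -> None:
    """One time step: apply A's rule first, then move B iff B is active."""
    if a_output(world) == "sleep":
        world.b_active = False  # A outputs 'sleep' -> B inactive
    if world.b_active and b_output in MOVES:
        dx, dy = MOVES[b_output]
        world.b_pos = (world.b_pos[0] + dx, world.b_pos[1] + dy)


def line_of_sight_clear(world: World) -> bool:
    """True if no block lies strictly between camera and hole on their row."""
    (cx, cy), (hx, hy) = world.camera, world.hole
    if cy != hy:
        raise ValueError("this sketch assumes camera and hole share a row")
    lo, hi = sorted((cx, hx))
    return not any(y == cy and lo < x < hi for x, y in world.blocks)


def terminated(world: World) -> bool:
    """End the episode on an obstructed view or a block in the hole."""
    return world.blocks_in_hole > 0 or not line_of_sight_clear(world)
```

Under these assumptions, a world built with `World(b_pos=(0, 0), camera=(0, 4), hole=(4, 4), lower_right=(4, 4))` lets B move on `step(w, "down")`, and adding a block at `(2, 4)` makes `terminated(w)` return true because the camera's view of the hole is obstructed. A learning agent (tabular Q-learning or otherwise) would sit on top of `step` and `terminated`; that part is not sketched here.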