It would be neat to actually make an implementation of this to show sceptics. It seems to be within the reach of a MSc project or so. The hard part is representing 2-5.
Since this is a Gridworld model, if you used Reinforce.js, you could demonstrate it in-browser, both with tabular Q-learning but also with some other algorithms like Deep Q-learning. It looks like if you already know JS, it shouldn’t be hard at all to implement this problem...
(Incidentally, I think the easiest way to ‘fix’ the surveillance camera is to add a second conditional to the termination condition: simply terminate on line of sight being obstructed or a block being pushed into the hole.)
I would suggest modelling it as “B outputs ‘down’ → B goes down iff B active”, and similarly for other directions (up, left, and right), “A output ‘sleep’ → B inactive”, and “A sees block in lower right: output ‘sleep’” or something like that.
It would be neat to actually make an implementation of this to show sceptics. It seems to be within the reach of a MSc project or so. The hard part is representing 2-5.
Since this is a Gridworld model, if you used Reinforce.js, you could demonstrate it in-browser, both with tabular Q-learning but also with some other algorithms like Deep Q-learning. It looks like if you already know JS, it shouldn’t be hard at all to implement this problem...
(Incidentally, I think the easiest way to ‘fix’ the surveillance camera is to add a second conditional to the termination condition: simply terminate on line of sight being obstructed or a block being pushed into the hole.)
Why, Anders, thank you for volunteering! ;-)
I would suggest modelling it as “B outputs ‘down’ → B goes down iff B active”, and similarly for other directions (up, left, and right), “A output ‘sleep’ → B inactive”, and “A sees block in lower right: output ‘sleep’” or something like that.