gwern comments on How is reinforcement learning possible in non-sentient agents?

gwern 31 Aug 2021 17:17 UTC
3 points
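One toy model worth considering is MENACE (Donald Michie's Matchbox Educable Noughts and Crosses Engine). It is clearly a model-free RL algorithm (a kind of tabular Q-learner) which successfully learns to play Tic-Tac-Toe using nothing but matchboxes and colored beads, without even requiring a computer, yet it defeats most attempts to anthropomorphize it or attribute mental states to it.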
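
To make the "nothing but a table" point concrete, here is a minimal Python sketch of a MENACE-style learner. It is not a faithful reproduction: the names (`Menace`, `play_game`, `initial_beads`) are hypothetical, every box starts with the same number of beads, board symmetries are not merged, and an emptied box is simply refilled. The +3 / +1 / -1 bead adjustments for win / draw / loss follow the scheme usually described for the original machine. Note that the update is a plain end-of-game bead adjustment (closer to a tabular Monte Carlo update than to textbook bootstrapped Q-learning).

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


class Menace:
    """One 'matchbox' per board state; each box holds bead counts per legal move."""

    def __init__(self, initial_beads=4, rng=None):
        self.boxes = {}                  # state string -> {move index: bead count}
        self.initial_beads = initial_beads  # assumption: same count in every box
        self.rng = rng or random.Random()
        self.history = []                # (state, move) pairs used this game

    def choose(self, board):
        key = ''.join(board)
        if key not in self.boxes:
            self.boxes[key] = {i: self.initial_beads
                               for i, cell in enumerate(board) if cell == ' '}
        beads = self.boxes[key]
        if sum(beads.values()) == 0:
            # Assumption: refill an empty box rather than resigning.
            for move in beads:
                beads[move] = self.initial_beads
        moves = list(beads)
        # Draw a bead at random: moves with more beads are more likely.
        move = self.rng.choices(moves, weights=[beads[m] for m in moves])[0]
        self.history.append((key, move))
        return move

    def learn(self, reward):
        """Add (or remove) beads for every move played in the finished game."""
        for key, move in self.history:
            self.boxes[key][move] = max(0, self.boxes[key][move] + reward)
        self.history.clear()


def play_game(menace, rng):
    """MENACE plays 'X' (first) against a uniformly random 'O' opponent."""
    board = [' '] * 9
    player = 'X'
    for _ in range(9):
        if player == 'X':
            move = menace.choose(board)
        else:
            move = rng.choice([i for i, cell in enumerate(board) if cell == ' '])
        board[move] = player
        if winner(board):
            break
        player = 'O' if player == 'X' else 'X'
    result = winner(board)
    # Win: +3 beads, draw: +1, loss: -1.
    menace.learn({'X': 3, None: 1, 'O': -1}[result])
    return result
```

Calling `play_game(menace, random.Random())` in a loop a few thousand times and tallying the results is enough to watch the bead counts drift toward better moves. At no point is there anything in the system besides a dictionary of integers and a weighted random draw.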