One toy model worth considering is MENACE. It is clearly a model-free RL algorithm (a kind of tabular Q-learner) which learns to play Tic-Tac-Toe well, without even requiring a computer, yet it defeats most attempts to anthropomorphize or mentalize it.
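To make the point concrete, here is a minimal Python sketch of a MENACE-style learner: one "matchbox" per board state, holding a count of "beads" for each legal move. A move is drawn in proportion to its bead count, and after each game every box that was used has beads added or removed. The class and function names, the initial bead count, the reward values (+3 win, +1 draw, -1 loss), the refill rule for an emptied box, and the random opponent are all assumptions chosen for illustration; the original machine's exact bead schedule and its use of board symmetries are not reproduced here.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of the empty squares."""
    return [i for i, cell in enumerate(board) if cell == ' ']


class Menace:
    """Hypothetical matchbox learner: board state -> {move: bead count}."""

    def __init__(self, initial_beads=3, rng=None):
        self.rng = rng or random.Random()
        self.initial_beads = initial_beads  # assumed value, not historical
        self.boxes = {}
        self.history = []  # (state, move) pairs used in the current game

    def choose(self, board):
        key = ''.join(board)
        box = self.boxes.setdefault(
            key, {m: self.initial_beads for m in legal_moves(board)})
        if sum(box.values()) == 0:
            # Assumption: refill an emptied box rather than resigning.
            for m in box:
                box[m] = self.initial_beads
        moves = list(box)
        move = self.rng.choices(moves, weights=[box[m] for m in moves])[0]
        self.history.append((key, move))
        return move

    def learn(self, delta):
        """Add `delta` beads to every (state, move) used this game."""
        for key, move in self.history:
            self.boxes[key][move] = max(0, self.boxes[key][move] + delta)
        self.history.clear()


def play_game(menace, rng):
    """MENACE plays 'X' (first) against a uniformly random 'O'."""
    board = [' '] * 9
    player = 'X'
    while True:
        if player == 'X':
            move = menace.choose(board)
        else:
            move = rng.choice(legal_moves(board))
        board[move] = player
        w = winner(board)
        if w is not None:
            return w
        if not legal_moves(board):
            return None  # draw
        player = 'O' if player == 'X' else 'X'


# Assumed reward schedule: beads added per outcome.
REWARD = {'X': 3, None: 1, 'O': -1}


def train(games=5000, seed=0):
    """Play `games` games and return the learner plus the outcome list."""
    rng = random.Random(seed)
    menace = Menace(rng=rng)
    results = []
    for _ in range(games):
        outcome = play_game(menace, rng)
        menace.learn(REWARD[outcome])
        results.append(outcome)
    return menace, results
```

Calling `train()` and comparing the share of `'X'` outcomes early and late in `results` shows whether play improved. Note that the entire "agent" is a dictionary of integer counts plus a weighted random draw: there is no model of the game and no lookahead, and in this sketch the update is a whole-game reward applied to every box used rather than a bootstrapped value update.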