The temporal difference learning algorithm is an efficient way to do reinforcement learning, and something like it probably happens in the human brain. If you are playing a game like chess, it may take a long time to collect enough examples of wins and losses to train an algorithm to predict good moves. Say you play 128 games: that's at most 128 bits of information (one win/loss bit per game), which is almost nothing. You have no way of knowing which moves in a game were good and which were bad, so you have to assume all moves made during a losing game were bad, which throws out a lot of information.
Temporal difference learning avoids this by learning to predict the eventual outcome from every intermediate position. That means it can learn “capturing pieces is good” and start optimizing for that instead of for winning. This implies that “inner alignment failure” is a constant fact of life. There are probably players who get quite far in chess doing nothing more than optimizing for piece capture.
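A minimal sketch of that effect, assuming a made-up two-step "game" rather than real chess. The only reward is the final win or loss, yet tabular TD(0) ends up valuing the position after a capture more highly, because each state's value is nudged toward the value of the state that follows it. The state names, win probabilities, and the `play_episode` / `td0` helpers are hypothetical illustrations.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "game": from the start position the player either captures
# a piece or not, then the game ends in a win (reward 1) or a loss (reward 0).
# These win probabilities are made-up assumptions, not chess statistics.
WIN_PROB = {"up_material": 0.8, "even": 0.3}

Transition = Tuple[str, float, Optional[str]]  # (state, reward, next_state)


def play_episode(rng: random.Random) -> List[Transition]:
    """Simulate one game and return its transitions."""
    middle = "up_material" if rng.random() < 0.5 else "even"
    won = rng.random() < WIN_PROB[middle]
    return [
        ("start", 0.0, middle),               # the capture itself earns no reward
        (middle, 1.0 if won else 0.0, None),  # only the final result is rewarded
    ]


def td0(episodes: int = 20000, alpha: float = 0.02, gamma: float = 1.0,
        seed: int = 0) -> Dict[str, float]:
    """Tabular TD(0): after every step, move V(s) toward r + gamma * V(s')."""
    rng = random.Random(seed)
    values = {"start": 0.0, "up_material": 0.0, "even": 0.0}
    for _ in range(episodes):
        for state, reward, next_state in play_episode(rng):
            # Terminal states have value 0, so the target is just the reward.
            bootstrap = gamma * values[next_state] if next_state is not None else 0.0
            values[state] += alpha * (reward + bootstrap - values[state])
    return values


if __name__ == "__main__":
    v = td0()
    print({s: round(x, 2) for s, x in v.items()})
```

An agent that picks moves to maximize the learned values would now seek captures even though nothing ever rewarded a capture directly, which is the proxy objective described above.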
I used to have anxiety about the many-worlds hypothesis. It just seems kind of terrifying: constantly splitting into hell-worlds, plus the implications of quantum immortality. But it didn't take long for it to stop bothering me, and I even began to suppress thoughts about it. After all, such thoughts don't lead to any reward and they cause problems, so an RL brain should punish them.
But that's kind of terrifying itself, isn't it? I underwent a drastic change to my utility function. I even developed anti-rational heuristics for suppressing thoughts, which a rational Bayesian should never do (at least not for these reasons).
Anyway, gwern has a whole essay on multi-level optimization algorithms like this that I haven't seen linked yet: https://www.gwern.net/Backstop