act-based = based on short-term preferences-on-reflection
For others who were confused about what “short-term preferences-on-reflection” would mean, I found this comment and its reply to be helpful.
Putting it into my own words: short-term preferences-on-reflection are about what you would want to happen in the near term, if you had a long time to think about it.
By way of illustration, AlphaZero’s long-term preference is to win the chess game, its short-term preference is whatever its policy network spits out as the best move to make next, and its short-term preference-on-reflection is the move it wants to make next after doing a fuck-ton of MCTS.
Short-term preferences are the value function one or a few moves out. If the algorithm is “reasonable,” then its short-term preferences-on-reflection are the true function P(I win the game | I make this move). You could also talk about intermediate degrees of reflection.
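To make the distinction concrete, here is a toy sketch. It is not AlphaZero: the game (take 1 to 3 stones, whoever takes the last stone wins), the crude `heuristic_value` standing in for a learned network, and all function names are my own assumptions for illustration, and plain depth-limited minimax stands in for MCTS. The `depth` argument plays the role of "how long you get to think about it."

```python
MOVES = (1, 2, 3)  # hypothetical toy game: take 1-3 stones; taking the last stone wins


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def heuristic_value(stones):
    """Crude, unreflective guess at P(player to move wins).

    Stand-in for a learned value function; it just assumes that
    having more stones left is better for whoever moves next.
    """
    return stones / (stones + 1.0)


def value(stones, depth):
    """Estimated P(player to move wins) after `depth` plies of lookahead."""
    if stones == 0:
        return 0.0  # the opponent just took the last stone
    if depth == 0:
        return heuristic_value(stones)  # out of thinking time: fall back on the guess
    return max(1.0 - value(stones - m, depth - 1) for m in legal_moves(stones))


def move_value(stones, move, depth):
    """Estimated P(I win | I make this move), reflecting `depth` plies past the move."""
    return 1.0 - value(stones - move, depth)


def preferred_move(stones, depth):
    """The move preferred after `depth` plies of reflection."""
    return max(legal_moves(stones), key=lambda m: move_value(stones, m, depth))


if __name__ == "__main__":
    stones = 6
    for depth in (0, 1, 2, stones):
        print(depth, preferred_move(stones, depth))
```

With 6 stones, the unreflective preference (`depth=0`) is to grab 3 stones, while the fully reflected preference (`depth=6`, enough to search the whole game) is to take 2, which leaves the opponent on a losing position. In this deterministic game with optimal play, the fully reflected `move_value` is the true P(I win | I make this move), which is just 0 or 1. Depths in between are the "intermediate degrees of reflection."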