Hmm. It seems like if you really expected to be able to gain log-odds in expectation in repeated bets, you’d immediately update towards a high probability, due to conservation of expected evidence. But maybe a more causal/materialist model wouldn’t do this because it’s a fairly abstract consideration that doesn’t have obvious material support.
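For concreteness, the identity I have in mind is just conservation of expected evidence in its standard form (my notation, not anything from the post): the prior-weighted average of your possible posteriors is the prior,

$$\mathbb{E}_{e \sim P(E)}\big[P(H \mid E = e)\big] \;=\; \sum_e P(E = e)\,P(H \mid E = e) \;=\; P(H),$$

so if you genuinely expect your future estimate of success to be higher on average, that expectation should already be folded into today's estimate.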
I see why “improve log-odds” is a nice heuristic for iteratively optimizing a policy towards a greater chance of success. It’s similar to the WalkSAT algorithm, which attacks a constraint-satisfaction problem by flipping variable assignments so as to reduce the number of violated constraints (even though the actual desideratum is to have no violated constraints at all); this “relaxes” the problem in a way that makes iterative, hill-climbing-like approaches work more effectively.
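To make the analogy concrete, here is a minimal sketch of the WalkSAT-style loop I have in mind; this is toy code of my own, not anything from the post, and the clause encoding and parameters are arbitrary:

```python
import random

def walksat(clauses, n_vars, max_flips=10_000, p_random=0.5):
    """Local search for SAT: repeatedly flip a variable in some violated
    clause, usually choosing the flip that leaves the fewest clauses violated.
    The quantity being hill-climbed is the number of violated clauses, even
    though the real goal is to reach exactly zero."""
    assign = {v: random.choice([True, False]) for v in range(1, n_vars + 1)}

    def satisfied(clause):
        # A clause is a list of ints: +v means variable v, -v means its negation.
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def num_violated():
        return sum(not satisfied(c) for c in clauses)

    for _ in range(max_flips):
        violated = [c for c in clauses if not satisfied(c)]
        if not violated:
            return assign  # actual desideratum reached: zero violations
        clause = random.choice(violated)
        if random.random() < p_random:
            var = abs(random.choice(clause))  # random-walk step to escape local minima
        else:
            # Greedy step: pick the flip that minimizes the relaxed objective.
            def cost_after_flip(v):
                assign[v] = not assign[v]
                cost = num_violated()
                assign[v] = not assign[v]
                return cost
            var = min((abs(lit) for lit in clause), key=cost_after_flip)
        assign[var] = not assign[var]
    return None  # no satisfying assignment found within the flip budget

# Example: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(walksat([[1, 2], [-1, 3], [-2, -3]], n_vars=3))
```

The relaxation is the same move: the loop optimizes a graded proxy (the count of violations) because the binary target (fully satisfied or not) gives nothing to climb.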
Relatedly, some RL approaches give rewards for hitting non-victory targets in a game (e.g. number of levels cleared or key items gained), even when the eventual goal is to learn a policy that beats the entire game.
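This kind of milestone reward is commonly discussed under the heading of reward shaping. A toy illustration, with made-up event names and reward values:

```python
def shaped_reward(event, beat_game):
    """Dense proxy rewards for intermediate milestones, plus a sparse
    terminal reward for the outcome we actually care about."""
    reward = 0.0
    if event == "level_cleared":
        reward += 1.0       # proxy objective: visible progress
    elif event == "key_item_gained":
        reward += 0.5       # proxy objective: instrumental resource
    if beat_game:
        reward += 100.0     # the real objective, too sparse to learn from alone early on
    return reward

print(shaped_reward("level_cleared", beat_game=False))   # 1.0
print(shaped_reward("level_cleared", beat_game=True))    # 101.0
```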
I think possibly the key conceptual distinction you want to make is between short-term play and long-term play. When I deliberately adopt an emotional stance, much of the benefit is that it translates long-term correct play into myopic play for the emotional reward (assuming, of course, that the translation is correct). Long-term, you play for absolute probabilities. Short-term, you chase after “dignity”, aka stackable log-odds improvements, at least until you’re out of the curve’s basement.
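A quick numerical illustration of the “stackable” framing, with made-up numbers: log-odds gains add, but deep in the curve’s basement they barely move the absolute probability.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

p = 0.01                    # hypothetical current probability of success
gains = [0.7, 0.5, 0.3]     # hypothetical log-odds improvements ("dignity points")
p_after = inv_logit(logit(p) + sum(gains))
print(round(p_after, 3))    # ~0.043: the gains stack additively in log-odds space,
                            # yet the absolute probability stays small in the basement
```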
I feel like this comment in particular is very clarifying with regard to the motivation for this stance. The benefit is that it imports recommendations of the ideal long-run policy into the short-run frame from which you’re actually acting.
I think that should maybe be in the post somewhere.