I don’t buy the lottery example. You never encoded the fact that you know tomorrow’s numbers. Shouldn’t the prior be that you win a million guaranteed if you buy the ticket?
No! You also have to enter the right numbers.
What I’m doing is modeling “gamble with the money” as a simple action—you can imagine there’s a big red button that gives you $200 1/16th of the time and takes all your money otherwise.
And then I’m modeling “buy a lotto ticket” as a compound action consisting of entering each number individually.
“Knowing the numbers” means your world model understands that if you’ve entered the right numbers, you get the money. But it doesn’t make “enter the right numbers” probable in the prior.
Of course the conclusion is reversed if we make “enter the right numbers” into a primitive action.
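Here’s a rough numerical sketch of that comparison. The specific numbers are illustrative assumptions of mine (a 50/50 prior over the two top-level options, six digits each uniform over 0–9), not anything canonical:

```python
# Planning-by-inference scores each way of reaching the goal by
#   P(plan | win)  ∝  P(win | plan) * P(plan under the prior policy)

# "Gamble": one primitive action (the big red button).
p_gamble_prior = 0.5          # assumed prior mass on pressing the button
p_gamble_win   = 1.0 / 16     # it pays out one time in sixteen

# "Buy a lotto ticket": a compound action -- enter six digits, each drawn
# uniformly from 0-9 under the prior policy (my illustrative assumption).
p_lotto_prior   = 0.5
p_right_numbers = (1 / 10) ** 6   # prior prob. of entering the known winning numbers
p_win_if_right  = 1.0             # the world model knows the right entry wins for sure

w_gamble = p_gamble_prior * p_gamble_win
w_lotto  = p_lotto_prior * p_right_numbers * p_win_if_right

print(f"weight on the gamble : {w_gamble:.1e}")   # ~3.1e-02
print(f"weight on the lottery: {w_lotto:.1e}")    # 5.0e-07

# If "enter the right numbers" were itself a primitive action with prior
# mass comparable to the button press, the comparison flips:
w_lotto_primitive = p_lotto_prior * 1.0
print(f"lottery weight, primitive case: {w_lotto_primitive:.1e}")  # 5.0e-01
```

The point is that it’s the prior-probability term that sinks the compound plan, not the payoff term.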
I also didn’t understand that. I was thinking of it more like AlphaStar in the sense that your prior is that you’re going to continue using your current (probabilistic) policy for all the steps involved in what you’re thinking about.
(But not like AlphaStar in that the brain is more likely to do a rollout of one or a few steps using clever hierarchical abstract representations of plans, rather than a dozens-of-steps rollout in a simple one-step-at-a-time way.)
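In code, the toy version of “the prior is your current policy” that I have in mind looks something like this (the policy and environment below are placeholder stand-ins, not anything from AlphaStar or the post):

```python
import math
import random

def rollout(policy, step_fn, state, horizon):
    """Sample one trajectory from the prior, i.e. by repeatedly sampling the
    current stochastic policy, tracking the trajectory's prior log-probability."""
    actions, log_prior, total_reward = [], 0.0, 0.0
    for _ in range(horizon):
        probs = policy(state)                                    # dict: action -> prob
        action = random.choices(list(probs), weights=list(probs.values()))[0]
        log_prior += math.log(probs[action])                     # prior credit for this choice
        state, reward = step_fn(state, action)
        actions.append(action)
        total_reward += reward
    return actions, log_prior, total_reward

# Stand-in policy and world: "safe" pays a little, "risky" pays a lot, rarely.
def toy_policy(state):
    return {"safe": 0.9, "risky": 0.1}       # the current (probabilistic) policy

def toy_step(state, action):
    if action == "safe":
        return state, 1.0
    return state, 100.0 if random.random() < 0.01 else 0.0

random.seed(0)
for _ in range(5):
    actions, log_prior, reward = rollout(toy_policy, toy_step, state=None, horizon=3)
    print(actions, f"log-prior={log_prior:.2f}", f"reward={reward:.1f}")
```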
See my answer to Gurkenglas.
My understanding of planning by inference (aka active inference?) is not so much like AlphaStar. More to say here, but I’m out of time atm.