In other words: if I seem to eat the same foods quite often (despite claiming to like variety), you might conclude that I like familiarity when it’s actually just that I like what I like. I’ve found a set of foods which I particularly enjoy (which I can rotate between for the sake of variety). That doesn’t mean it is familiarity itself which I enjoy.
Agents trade off exploring and exploiting, and when they’re exploiting they look like they’re minimizing prediction error?
That’s one hypothesis in the space I was pointing at, but not particularly the thing I expect to be true. Or, maybe I think it is somewhat true as an observation about policies, but it doesn’t answer the question of how exactly variety and anti-variety are involved in our basic values.
A model which I endorse more:
We like to make progress understanding things. We don’t like chaotic stuff with no traction for learning (like TV fuzz). We like orderly stuff more, but only while learning about it; it then fades to zero, meaning we have to seek more variety for our hedonic treadmill. We really like patterns which keep establishing and then breaking expectations, especially if there is always a deeper pattern which makes sense of the exceptions (like music); these patterns maximize the feeling of learning progress.
But I think that’s just one aspect of our values, not a universal theory of human values.
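As a rough illustration of the "orderly stuff fades to zero" part of the learning-progress model above, here is a minimal sketch. Everything in it is an assumption made for illustration: the toy learner (Laplace-smoothed counts predicting the next bit from the previous one), the use of surprise in bits as the error measure, and the names `window_surprise` and `learning_progress`. It does not model the noise case or the music-like case, only a perfectly regular pattern.

```python
import math
from collections import defaultdict
from typing import List, Sequence


def window_surprise(symbols: Sequence[int], window: int) -> List[float]:
    """Mean surprise (in bits) per window of transitions for a toy online learner.

    The learner predicts the next binary symbol from the previous one using
    Laplace-smoothed counts. This predictor is an illustrative assumption.
    """
    # counts[prev] holds pseudo-counts for the next symbol being 0 or 1.
    counts = defaultdict(lambda: [1, 1])
    surprises = []
    for prev, nxt in zip(symbols, symbols[1:]):
        c = counts[prev]
        p = c[nxt] / (c[0] + c[1])       # predicted probability of what happened
        surprises.append(-math.log2(p))  # surprise before updating on it
        c[nxt] += 1
    return [
        sum(surprises[i:i + window]) / window
        for i in range(0, len(surprises) - window + 1, window)
    ]


def learning_progress(means: Sequence[float]) -> List[float]:
    """Drop in mean surprise from each window to the next."""
    return [a - b for a, b in zip(means, means[1:])]


# A perfectly orderly pattern: 0, 1, 0, 1, ...
orderly = [i % 2 for i in range(201)]
means = window_surprise(orderly, window=50)
progress = learning_progress(means)
```

On this pattern the mean surprise falls in every window, but the size of the drop shrinks each time. If "feeling of learning progress" is read as that drop, the reward from a fully learned regularity heads toward zero, which is the treadmill effect described above.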
I think this is sort of sideways. It’s true, but I think it also misses the deeper aspects of the theory I have in mind.
Yes, from easily observed behavior that’s what it looks like: exploitation is about minimizing prediction error and exploration is about, if not maximizing it, then at least not minimizing it. But the theory says that if we see exploration and the theory is correct, then exploration must somehow be built out of things that are ultimately trying to minimize prediction error.
I hope to give a more precise, mathematical explanation of this theory in the future, but for now I’ll give the best English-language explanation I can of how exploration might work (keeping in mind that, if this theory is right, sufficient brain-scanning technology should eventually let us find out exactly how it works).
I suspect exploration happens because a control system in the brain takes as input how much error minimization it observes as measured by how many good and bad signals get sent in other control systems. It then has a set point for some relatively stable and hard to update amount of bad signals it expects to see, and if it has not been seeing enough surprise/mistakes then it starts sending its own bad signals encouraging “restlessness” or “exploration”. This is similar to my explanation of creativity from another comment.
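To make the set-point idea a bit more concrete, here is a minimal sketch of such a controller. It is a toy under stated assumptions, not a claim about how the brain implements this: the class name `RestlessnessController`, the exponential running average, and all numeric values are hypothetical, and the set point is simply held fixed to stand in for "relatively stable and hard to update".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RestlessnessController:
    """Hypothetical meta-controller; names and numbers are illustrative."""
    set_point: float = 0.2      # expected fraction of "bad" signals (held fixed)
    smoothing: float = 0.1      # how fast the running estimate tracks new batches
    observed_rate: float = 0.2  # running estimate of the bad-signal fraction

    def step(self, signals: Sequence[bool]) -> float:
        """Update on one batch of signals from other control systems.

        Each entry is True for a "bad" (error) signal and False for a "good"
        one. Returns the strength of the controller's own bad signal, which
        is positive only when it has seen fewer errors than it expects.
        """
        if signals:
            batch_rate = sum(signals) / len(signals)
            self.observed_rate += self.smoothing * (batch_rate - self.observed_rate)
        return max(0.0, self.set_point - self.observed_rate)


# A long stretch with no surprises pushes the restlessness signal up.
controller = RestlessnessController()
quiet = [False] * 10
for _ in range(100):
    restlessness = controller.step(quiet)
```

The point of the sketch is that this controller is itself only minimizing an error (the gap between expected and observed bad signals), yet its output looks like a push toward exploration when everything else is going too smoothly.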