So is intrinsic curiosity a reinforcement learning approach or an unsupervised learning approach?
By comparing the claims of different models expressed in the same language, you do not need a dynamic model of inconsistency.
If you need the model to decide for itself where it is wrong, there is the possibility that the model, which can itself be dynamic, does a poor job of that.
What if you are wrong about what is wrong?
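To make the first option concrete, here is a minimal sketch (Python, with purely hypothetical model_a / model_b) of comparing the claims of two models that share an expression language: the "inconsistency detector" is nothing more than the set of inputs on which their claims differ.

```python
# Two models that "speak the same language": both map an integer to an integer.
# model_a and model_b are hypothetical stand-ins, not any particular system.

def model_a(x: int) -> int:
    return 2 * x                        # claims the quantity doubles

def model_b(x: int) -> int:
    return x + x if x < 5 else 3 * x    # agrees at first, then diverges

# No dynamic "model of inconsistency" is needed: inconsistency is just the
# set of inputs on which the two claims disagree.
disagreements = [x for x in range(10) if model_a(x) != model_b(x)]
print(disagreements)  # [5, 6, 7, 8, 9] -- the places worth investigating
```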
Suppose we are building an inconsistency detector over choices. Consider all our previous situations. For each one, generate a value representing how wrong the choice made in that situation was. Then take the situation with the highest wrongness and say "here we need to improve", whatever that means.
So in a given situation, how wrong was the choice that was actually made? Generate values for how wrong the other options would have been, and return the difference between the highest of those values and the chosen option's value. If we use the same option-evaluator as when acting then, surprise surprise, we always picked the highest-value option. If we use a different evaluator, why are we not using that one for acting too? And with every situation scoring zero wrongness, where we improve is arbitrary.
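Here is a minimal sketch of that detector (a hypothetical greedy agent with a single evaluator q; the names and toy values are mine), showing the dead end: if the wrongness of a choice is scored by the same evaluator that made it, every situation scores zero.

```python
from typing import Callable, Dict, List, Tuple

Situation, Option = str, str

def act_greedily(q: Callable[[Situation, Option], float],
                 situation: Situation, options: List[Option]) -> Option:
    # The agent picks whichever option its own evaluator rates highest.
    return max(options, key=lambda o: q(situation, o))

def wrongness(q: Callable[[Situation, Option], float],
              situation: Situation, options: List[Option], chosen: Option) -> float:
    # Difference between the best option's value and the chosen option's value.
    return max(q(situation, o) for o in options) - q(situation, chosen)

# Toy evaluator: a lookup table with purely illustrative values.
table: Dict[Tuple[Situation, Option], float] = {
    ("s1", "a"): 1.0, ("s1", "b"): 0.2,
    ("s2", "a"): 0.3, ("s2", "b"): 0.9,
}
q = lambda s, o: table[(s, o)]

history = [(s, act_greedily(q, s, ["a", "b"])) for s in ["s1", "s2"]]

# Scored with the same evaluator that acted, every wrongness is 0.0,
# so "where we need to improve" is indeed arbitrary.
print([wrongness(q, s, ["a", "b"], chosen) for s, chosen in history])  # [0.0, 0.0]
```

Swapping in a second evaluator gives non-zero scores, but then the question above stands: why not act on that evaluator directly?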
So is intrinsic curiosity a reinforcement learning approach or an unsupervised learning approach?
Intrinsic curiosity uses reinforcement learning to find places where the map is missing information, and then unsupervised learning to include that information in the map.
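A minimal sketch of that split, on a toy tabular world (hypothetical names, not any particular curiosity library): the forward model is the "map", its prediction error is the intrinsic reward the reinforcement-learning part chases, and updating the model is the unsupervised part that folds the new information back into the map.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

# The "map": a forward model P(next_state | state, action) learned from
# visited transitions (Laplace-smoothed counts, updated without any labels).
counts = np.ones((n_states, n_actions, n_states))

def forward_model(s: int, a: int) -> np.ndarray:
    return counts[s, a] / counts[s, a].sum()

def predicted_curiosity(s: int, a: int) -> float:
    # Expected prediction error under the current model = entropy of its
    # prediction: high exactly where the map is missing information.
    p = forward_model(s, a)
    return float(-(p * np.log(p)).sum())

def intrinsic_reward(s: int, a: int, s_next: int) -> float:
    # Actual prediction error (negative log-likelihood) of what really happened.
    return float(-np.log(forward_model(s, a)[s_next]))

# Hidden true dynamics the agent explores.
true_next = rng.integers(0, n_states, size=(n_states, n_actions))

s, rewards = 0, []
for _ in range(200):
    # RL part (sketched as one-step greedy): head for the transition the map
    # is least sure about.
    a = int(np.argmax([predicted_curiosity(s, act) for act in range(n_actions)]))
    s_next = int(true_next[s, a])
    rewards.append(intrinsic_reward(s, a, s_next))  # reward for finding a gap in the map
    counts[s, a, s_next] += 1                       # unsupervised part: fold the observation into the map
    s = s_next

print(round(rewards[0], 2), round(rewards[-1], 2))  # early vs late prediction error as the map fills in
```

In full-scale versions the forward model is a learned network and the policy is trained with a proper RL algorithm on the prediction-error reward, but the division of labour is the same.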