Yeah, if you train the algorithm by random sampling, the effect I described will take place. The same thing will happen if you use an RL algorithm to update the parameters instead of an unsupervised learning algorithm (though it seems willfully perverse to do so—you’re throwing away a lot of the structure of the problem, so training will be much slower).
I also just found an old comment which makes the exact same argument I made here. (Though it now seems to me that argument is not necessarily correct!)