SGD is not going to play the future forward to see the new feedback mechanism you’ve described and incorporate it into the loss function which is being minimized
My ‘new feedback mechanism’ is part of the training procedure. It’s not going to be good at that by ‘playing the future forward’, it’s going to become good at that by being trained on it.
I suspect we’re using SGD in different ways, because everything we’ve talked about seems like it could be implemented with SGD. Do you agree that letting the Predict-O-Matic predict the future and rewarding it for being right, RL-style, would lead to it finding fixed points? Because you can definitely use SGD to do RL (first google result).
I suspect we’re using SGD in different ways, because everything we’ve talked about seems like it could be implemented with SGD. Do you agree that letting the Predict-O-Matic predict the future and rewarding it for being right, RL-style, would lead to it finding fixed points? Because you can definitely use SGD to do RL (first google result).
Fair enough, I was thinking about supervised learning.
My ‘new feedback mechanism’ is part of the training procedure. It’s not going to be good at that by ‘playing the future forward’, it’s going to become good at that by being trained on it.
I suspect we’re using SGD in different ways, because everything we’ve talked about seems like it could be implemented with SGD. Do you agree that letting the Predict-O-Matic predict the future and rewarding it for being right, RL-style, would lead to it finding fixed points? Because you can definitely use SGD to do RL (first google result).
Fair enough, I was thinking about supervised learning.