Yeah. There’s no gradient descent within a single episode, but if you have a network with input (as always) and with memory (e.g. an RNN) then its behavior in any given episode can be a complicated function of its input over time in that episode, which you can describe as “it figured something out from the input and that’s now determining its further behavior”. Anyway, everything you said is right, I think.
Yeah. There’s no gradient descent within a single episode, but if you have a network with input (as always) and with memory (e.g. an RNN) then its behavior in any given episode can be a complicated function of its input over time in that episode, which you can describe as “it figured something out from the input and that’s now determining its further behavior”. Anyway, everything you said is right, I think.