“Slightly out-of-date reward prediction net” is a good way of thinking about it. It’s called the “target network” because it’s used to produce targets for the loss function—the main neural network is trying to get as close to the target prediction as possible, and the target prediction is built by the target network. Something like the “stable network” or “prior network” might have been a better term though.
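To make that concrete, here's a rough sketch of how the two networks interact during a DQN update. This is PyTorch-style pseudocode I'm adding for illustration; the network shape, names, and hyperparameters are made up, not from the original DQN paper:

```python
import copy
import torch
import torch.nn as nn

# Illustrative Q-network (sizes are arbitrary placeholders).
class QNet(nn.Module):
    def __init__(self, n_obs=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)  # one predicted Q-value per action

main_net = QNet()
target_net = copy.deepcopy(main_net)  # the "slightly out-of-date" copy

def td_loss(obs, action, reward, next_obs, done, gamma=0.99):
    # The target is built by the *target* network:
    #   r + gamma * max_a' Q_target(s', a')
    with torch.no_grad():  # no gradients flow into the target network
        next_q = target_net(next_obs).max(dim=1).values
        target = reward + gamma * (1 - done) * next_q
    # The main network tries to get as close to that target as possible.
    q = main_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    return nn.functional.mse_loss(q, target)

# Every so often (e.g. every few thousand steps), the stale copy is refreshed:
# target_net.load_state_dict(main_net.state_dict())
```

The point of keeping the target network frozen between syncs is that the targets stay stable while the main network chases them, instead of both moving at once.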
“Q” in DQN is a reference to Q-learning. I’m not 100% sure, but I believe the Q term in Q-learning is supposed to be short for “quality”—a Q-function calculates the quality of a state-action combination.
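For what it's worth, "quality" here means the expected (discounted) return from taking action $a$ in state $s$ and acting well afterwards. The standard tabular Q-learning update nudges the estimate toward the observed reward plus the best predicted follow-up:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

DQN is essentially this same update with the table replaced by a neural network (and the target network supplying the $\max_{a'} Q(s', a')$ term).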
Thanks for the feedback—glad you liked the post!