Thanks for the write-up!
I find the nomenclature “target network” confusing, but I understand that comes from the literature. I like to think of it as “slightly out-of-date reward prediction net”.
Also, it’s still not clear what “Q” stands for in DQN. Do you know?
Looking forward to your future deep learning write-ups :-)
Thanks for the feedback—glad you liked the post!
“Slightly out-of-date reward prediction net” is a good way of thinking about it. It’s called the “target network” because it’s used to produce targets for the loss function—the main neural network is trying to get as close to the target prediction as possible, and the target prediction is built by the target network. Something like the “stable network” or “prior network” might have been a better term though.
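To make the two roles concrete, here is a rough PyTorch-style sketch of how the target network feeds the loss. The network sizes and the batch are made up for illustration; the point is just which network builds the targets and which one gets trained:

```python
import torch
import torch.nn as nn

# Illustrative sizes and networks (not from the post).
obs_dim, n_actions = 4, 2
main_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net.load_state_dict(main_net.state_dict())  # start out identical

gamma = 0.99
states = torch.randn(8, obs_dim)               # fake batch of transitions
actions = torch.randint(0, n_actions, (8,))
rewards = torch.randn(8)
next_states = torch.randn(8, obs_dim)
dones = torch.zeros(8)

# The "slightly out-of-date" target network builds the targets;
# no gradients flow through it.
with torch.no_grad():
    next_q = target_net(next_states).max(dim=1).values
    targets = rewards + gamma * (1 - dones) * next_q

# The main network's predictions for the actions actually taken.
q_pred = main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

# The main network chases the (frozen) targets.
loss = nn.functional.mse_loss(q_pred, targets)
loss.backward()

# Every N steps the target network is refreshed from the main network,
# which is why it is always slightly out of date.
target_net.load_state_dict(main_net.state_dict())
```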
“Q” in DQN is a reference to Q-learning. I’m not 100% sure, but I believe the Q term in Q-learning is supposed to be short for “quality”—a Q-function calculates the quality of a state-action combination.
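For what it's worth, here is the textbook tabular Q-learning update that DQN approximates with a neural network. The state/action counts and the transition are made-up values, just to show what the Q-function is scoring:

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # Q[s, a] = estimated "quality" of action a in state s
alpha, gamma = 0.1, 0.99

# One made-up transition: in state s we took action a, got reward r, and landed in s_next.
s, a, r, s_next = 0, 1, 1.0, 2

# Classic Q-learning update: nudge Q[s, a] toward the observed reward
# plus the discounted quality of the best action from the next state.
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```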