Thanks for the write-up!
I find the nomenclature “target network” confusing, but I understand that comes from the literature. I like to think of it as “slightly out-of-date reward prediction net”.
Also, it’s still not clear what “Q” stands for in DQN. Do you know?
Looking forward to your future deep learning write-ups :-)
Thanks for the feedback—glad you liked the post!
“Slightly out-of-date reward prediction net” is a good way of thinking about it. It’s called the “target network” because it’s used to produce targets for the loss function—the main neural network is trying to get as close to the target prediction as possible, and the target prediction is built by the target network. Something like the “stable network” or “prior network” might have been a better term though.
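To make the two roles concrete, here is a rough PyTorch-style sketch of how the target network feeds the loss. The network sizes and the batch are made up for illustration; the point is just which network builds the targets and which one gets trained:

```python
import torch
import torch.nn as nn

# Illustrative sizes and networks (not from the post).
obs_dim, n_actions = 4, 2
main_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net.load_state_dict(main_net.state_dict())  # start out identical

gamma = 0.99
states = torch.randn(8, obs_dim)               # fake batch of transitions
actions = torch.randint(0, n_actions, (8,))
rewards = torch.randn(8)
next_states = torch.randn(8, obs_dim)
dones = torch.zeros(8)

# The "slightly out-of-date" target network builds the targets;
# no gradients flow through it.
with torch.no_grad():
    next_q = target_net(next_states).max(dim=1).values
    targets = rewards + gamma * (1 - dones) * next_q

# The main network's predictions for the actions actually taken.
q_pred = main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

# The main network chases the (frozen) targets.
loss = nn.functional.mse_loss(q_pred, targets)
loss.backward()

# Every N steps the target network is refreshed from the main network,
# which is why it is always slightly out of date.
target_net.load_state_dict(main_net.state_dict())
```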
“Q” in DQN is a reference to Q-learning. I’m not 100% sure, but I believe the Q term in Q-learning is supposed to be short for “quality”—a Q-function calculates the quality of a state-action combination.
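For what it's worth, here is the textbook tabular Q-learning update that DQN approximates with a neural network. The state/action counts and the transition are made-up values, just to show what the Q-function is scoring:

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # Q[s, a] = estimated "quality" of action a in state s
alpha, gamma = 0.1, 0.99

# One made-up transition: in state s we took action a, got reward r, and landed in s_next.
s, a, r, s_next = 0, 1, 1.0, 2

# Classic Q-learning update: nudge Q[s, a] toward the observed reward
# plus the discounted quality of the best action from the next state.
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```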