Great post!
Do you mind if I ask you what is the amount of free parameters and training compute of EfficientZero?
I tried scanning the paper but didn’t find them readily available.
In Appendix A.6 they state: “To train an Atari agent for 100k steps, it only needs 4 GPUs to train 7 hours.” I don’t think they provide a summary of the total number of parameters, but scanning the described architecture, it does not look like a lot: almost surely < 1B.
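For the compute side, that wall-clock figure is the only number given, so any FLOP estimate has to come from assumptions about the hardware. A minimal sketch, assuming V100-class GPUs (~15 TFLOP/s FP32 peak) and ~30% utilization, purely to get an order of magnitude:

```python
# Rough training-compute estimate from the "4 GPUs, 7 hours" figure.
# GPU model and utilization are assumptions, not stated in the paper,
# so treat the result as an order-of-magnitude guess only.
n_gpus = 4
hours = 7
peak_flops = 15e12    # ~15 TFLOP/s FP32, V100-class GPU (assumed)
utilization = 0.3     # achieved fraction of peak (assumed)

total_flops = n_gpus * hours * 3600 * peak_flops * utilization
print(f"~{total_flops:.1e} FLOPs")   # ~4.5e17 FLOPs
```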
A single ALE game is just not that complex, so ALE models are never large. (Uber once did an interesting paper quantifying how small the NNs for ALE games could be.) The parameter count will be much closer to 1M than 1B. You can also look at the layer types & sizes in Appendix A: even without calculating anything out, with a few convolution layers, a normal-sized LSTM layer, and not much else, there’s simply no way it’s anywhere near 1B.
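As a minimal sketch of that back-of-envelope arithmetic (the layer counts, channel widths, and LSTM size below are illustrative assumptions, not the exact Appendix A values):

```python
# Back-of-envelope parameter count for an EfficientZero-like network:
# a small stack of conv layers plus one LSTM. All sizes are assumed.

def conv_params(in_ch, out_ch, k=3):
    """Parameters in one k x k conv layer (weights + biases)."""
    return in_ch * out_ch * k * k + out_ch

def lstm_params(input_size, hidden_size):
    """Parameters in a single-layer LSTM (4 gates: input, recurrent, bias)."""
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + hidden_size)

total = 0
total += conv_params(4, 64)            # input conv over stacked frames (assumed 64 channels)
total += 10 * 2 * conv_params(64, 64)  # ~10 residual blocks, 2 convs each (assumed)
total += lstm_params(64 * 6 * 6, 512)  # LSTM over a flattened 6x6 feature map (assumed)

print(f"{total:,} parameters (~{total / 1e6:.1f}M)")   # ~6.5M: millions, not billions
```

Even doubling or quadrupling every size keeps the total in the single-digit or low tens of millions, nowhere near 1B.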
(As is pretty much always the case. Models in DRL are usually small, especially compared to supervised/self-supervised stuff. I’m not sure there’s even a single ‘pure DRL’ model which cracks the 1B scale. The biggest chonkers might be MetaMimic or AlphaZero or AlphaStar, which would be in the low hundreds of millions of parameters? That’s probably why DRL papers are not in the habit of reporting parameter counts, or of scaling them up/down the way you might assume these days. That will have to change as more self-supervised models are used.)