Only Anakin actually runs the environment on the TPU, and that only works for pretty simple environments (basically: can you implement it in JAX?). Sebulba runs environments on the host, which is presumably what was done for this paper too (no idea whether they used Sebulba or some other setup).
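To make the distinction concrete, here's a minimal sketch of the Anakin idea, assuming a toy environment simple enough to write as a pure JAX function (the `env_step` and `policy` here are hypothetical stand-ins, not anything from the paper): the whole act→step loop gets jitted and pmapped, so rollouts never leave the TPU.

```python
import jax
import jax.numpy as jnp

def env_step(state, action):
    # Toy dynamics: purely functional, so it can live on the accelerator.
    next_state = state + action
    reward = -jnp.abs(next_state).sum()
    return next_state, reward

def policy(params, state):
    return jnp.tanh(params @ state)

def rollout(params, state, num_steps=16):
    def body(carry, _):
        s = carry
        a = policy(params, s)
        s_next, r = env_step(s, a)
        return s_next, r
    final_state, rewards = jax.lax.scan(body, state, None, length=num_steps)
    return final_state, rewards.sum()

# pmap across TPU cores: each core runs its own copy of the environment,
# params are broadcast to all of them.
p_rollout = jax.pmap(rollout, in_axes=(None, 0))

if __name__ == "__main__":
    n = jax.local_device_count()
    params = jnp.eye(4)
    states = jnp.zeros((n, 4))
    final_states, returns = p_rollout(params, states)
    print(returns)
```

Once the env can't be expressed like that (Unity, MuJoCo binaries, anything stateful outside JAX), you're in Sebulba territory and the env has to step on a CPU somewhere.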
This doesn’t really matter though, because for these simulated environments it’s fairly simple to fully utilize the TPUs by running more (remote) environments in parallel.
Yes, I see that they used Unity, so the TPUs themselves couldn't run the env, but the TPU host VM* could potentially run a lot of copies (with the ~300GB of RAM it has access to), and that'd be a lot nicer than running remote VMs. At least in Tensorfork, when we try to use TPU pods, a lot of time goes into getting the interconnect & data traffic right, because the on-TPU ops are already so well optimized by default.
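As a hedged sketch of what that host-side setup could look like (everything here is an assumption for illustration, not their actual pipeline): fork a bunch of env copies as worker processes on the host VM, batch their observations, and make one device call per step for all of them. The `env_worker` dynamics below are a stand-in for whatever a real Unity build would do.

```python
import multiprocessing as mp
import numpy as np
import jax
import jax.numpy as jnp

def env_worker(conn, seed):
    # Stand-in for one environment instance (e.g. a Unity build) on the host.
    rng = np.random.default_rng(seed)
    obs = rng.normal(size=4).astype(np.float32)  # pretend env.reset()
    while True:
        action = conn.recv()
        if action is None:
            break
        obs = obs + action                        # pretend env.step(action)
        conn.send((obs, float(-np.abs(obs).sum())))
    conn.close()

@jax.jit
def batched_policy(params, obs_batch):
    # One jitted call computes actions for every env copy at once.
    return jnp.tanh(obs_batch @ params)

if __name__ == "__main__":
    num_envs = 8  # on a ~300GB host VM this could plausibly be hundreds
    pipes, procs = [], []
    for i in range(num_envs):
        parent, child = mp.Pipe()
        p = mp.Process(target=env_worker, args=(child, i))
        p.start()
        pipes.append(parent)
        procs.append(p)

    params = jnp.eye(4)
    obs = jnp.zeros((num_envs, 4))
    for _ in range(10):
        actions = np.asarray(batched_policy(params, obs))
        for pipe, a in zip(pipes, actions):
            pipe.send(a)
        results = [pipe.recv() for pipe in pipes]
        obs = jnp.asarray([o for o, _ in results])

    for pipe in pipes:
        pipe.send(None)
    for p in procs:
        p.join()
```

The point being: all the env stepping stays on the one host VM next to the chips, instead of streaming observations in from a fleet of remote VMs over the network.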
(And regardless of which of those tricks this open-ended paper uses, the point is well worth knowing: research could potentially get far more performance out of a TPU pod than one would expect from the TPU usage of older work like AlphaStar.)
* advertisement: access to the VM was recently unlocked for non-Google TPU users. It really changes how you treat TPU use!