Marius Hobbhahn has estimated the number of parameters here. His final estimate is 3.5e6 parameters.
Anson Ho has estimated the training compute (his reasoning at the end of this answer). His final estimate is 7.8e22 FLOPs.
Below I made a visualization of the parameters vs training compute of n=108 important ML systems, so you can see how DeepMind’s system (labelled GOAT in the graph) compares to other systems.
[Hardware]
- “Each agent is trained using 8 TPUv3s and consumes approximately 50,000 agent steps (observations) per second.”
- TPUv3 (half precision): 4.2e14 FLOP/s
- Number of TPUs: 8
- Utilisation rate: 0.1
[Timesteps]
- Figure 16 shows steps per generation and agent. In total there are 1.5e10 + 4.0e10 + 2.5e10 + 1.1e11 + 2e11 = 3.9e11 steps per agent.
  - 3.9e11 / 5e4 = 7.8e6 s → ~90 days
- “100 million steps is equivalent to 30 minutes of wall-clock time in our setup.” (pg 29, fig 27)
  - 1e8 steps → 0.5 h
  - 3.9e11 steps → 1950 h → 7.0e6 s → ~81 days
- Both of these seem like overestimates, because: “Finally, on the largest timescale (days), generational training iteratively improves population performance by bootstrapping off previous generations, whilst also iteratively updating the validation normalised percentile metric itself.” (pg 16)
  - This suggests the above overestimates the number of days needed, else they would have said (months) or (weeks)?
- Final choice (guesstimate): 85 days = 7.3e6 s
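The timestep arithmetic above can be sanity-checked in a few lines. The per-generation step counts come from Figure 16 of the paper; everything else is just the two wall-clock conversions used in the estimate:

```python
# Per-generation agent steps read off Figure 16 of the XLand paper
steps_per_generation = [1.5e10, 4.0e10, 2.5e10, 1.1e11, 2e11]
total_steps = sum(steps_per_generation)  # 3.9e11 steps per agent

# Estimate 1: divide by the reported throughput of 5e4 steps/s
seconds_v1 = total_steps / 5e4           # 7.8e6 s
days_v1 = seconds_v1 / 86400             # ~90 days

# Estimate 2: scale the reported "1e8 steps = 0.5 h of wall-clock time"
hours_v2 = total_steps / 1e8 * 0.5       # 1950 h
days_v2 = hours_v2 / 24                  # ~81 days

print(f"total steps: {total_steps:.1e}")
print(f"estimate 1: {days_v1:.0f} days, estimate 2: {days_v2:.0f} days")
```

Both routes land in the same ballpark, which is why a guesstimate of 85 days splits the difference.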
[Population size]
- 8 agents? (pg 21) → this describes the case where they’re not using PBT, so ignore this number
- The original PBT paper uses 32 agents for one task (https://arxiv.org/pdf/1711.09846.pdf); in general it uses between 10 and 80
- (Guesstimate) Average population size: 32
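Combining the three sections gives the headline figure. This is just the standard peak-FLOP/s × utilisation × training-time × chip-count × population product, with every input being one of the guesstimates above:

```python
# Inputs are the guesstimates from the Hardware, Timesteps, and
# Population size sections above.
flops_per_tpu = 4.2e14   # TPUv3, half precision, peak FLOP/s
n_tpus = 8               # TPUs per agent
utilisation = 0.1        # guesstimated utilisation rate
train_seconds = 7.3e6    # ~85 days (guesstimate)
population = 32          # guesstimated average PBT population size

compute_per_agent = flops_per_tpu * n_tpus * utilisation * train_seconds
total_compute = compute_per_agent * population
print(f"{total_compute:.1e} FLOP")  # ~7.8e22 FLOP
```

This reproduces the 7.8e22 FLOP final estimate, so the bottleneck on precision is clearly the guesstimated inputs (utilisation, days, population) rather than the arithmetic.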
Thanks so much! So, for comparison, fruit flies have more synapses than these XLAND/GOAT agents have parameters! https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons