A previous calculation on LW gave 2.4 x 10^24 FLOPs for AlphaStar (using values from the original AlphaStar blog post), which suggested that the trend was roughly on track.
The differences between the two calculations are (your values first; a rough FLOP sketch follows the list):
Agents: 12 vs 600
Days: 44 vs 14
TPUs: 32 vs 16
Utilisation: 33% vs 50% (I think this is just estimated in the other calculation)
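To make the comparison concrete, here is a back-of-the-envelope sketch of how each set of values turns into a total-FLOPs figure. The per-device throughput (I assume roughly 4.2e14 FLOP/s, the bf16 peak of a TPU v3 device) and the simple hardware-time-times-utilization formula are my own assumptions for illustration, not something either calculation states explicitly.

```python
# Back-of-the-envelope training-compute estimate from hardware count, wall-clock time
# and an assumed utilization. The TPU v3 peak throughput below is an assumption.

SECONDS_PER_DAY = 86_400
TPU_V3_PEAK_FLOPS = 4.2e14  # assumed ~420 TFLOP/s (bf16) per TPU v3 device


def training_flops(agents: int, tpus_per_agent: int, days: float, utilization: float) -> float:
    """Total FLOPs = agents * TPUs/agent * days * s/day * peak FLOP/s * utilization."""
    return agents * tpus_per_agent * days * SECONDS_PER_DAY * TPU_V3_PEAK_FLOPS * utilization


# Nature-paper values: 12 agents, 32 TPUs each, 44 days, 33% utilization assumed.
nature_estimate = training_flops(12, 32, 44, 0.33)

# Original-blog-post reading: 600 agents, 16 TPUs each, 14 days, 50% utilization.
blog_estimate = training_flops(600, 16, 14, 0.50)

print(f"Nature paper reading: {nature_estimate:.1e} FLOPs")  # ~2.0e+23
print(f"Blog post reading:    {blog_estimate:.1e} FLOPs")    # ~2.4e+24
```

Under these assumptions the blog-post reading lands near the 2.4 x 10^24 figure, while the Nature-paper reading comes out roughly an order of magnitude lower.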
Do you have a reference for the values you use?
I appreciate the questioning of my calculations, thanks for checking!
Here is what I think about the previous avturchin calculation: it may have been a misinterpretation of the DeepMind blog post. In the blog post they say “The AlphaStar league was run for 14 days, using 16 TPUs for each agent”. But I think this might not mean 16 TPUs for the full 14 days for each agent; rather, it is 16 TPUs for roughly 14/n_agents = 14/600 days per agent, and the 14 days covered the whole League training, in which agent policies were trained consecutively. Their wording is indeed not very clear, but you can look at the “Progression of Nash of AlphaStar League” figure. There you can see that, as they say, “New competitors were dynamically added to the league, by branching from existing competitors”, and that the new ones drastically outperform the older ones, meaning that the older ones were not continuously updated and were only randomly picked as static opponents.
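To illustrate the gap between the two readings of that sentence, here is a tiny sketch; the assumption that league time splits evenly across agents is my simplification:

```python
# Two readings of "The AlphaStar league was run for 14 days, using 16 TPUs for each agent".
# The even split of league time across agents is my simplifying assumption.

n_agents, tpus_per_agent, league_days = 600, 16, 14

# Reading A (the earlier calculation): every agent gets 16 TPUs for the full 14 days.
tpu_days_a = n_agents * tpus_per_agent * league_days  # 134,400 TPU-days

# Reading B (mine): agents are trained consecutively, so each agent gets 16 TPUs
# for about 14/600 days, and the league as a whole uses 16 TPUs for 14 days.
tpu_days_b = tpus_per_agent * league_days  # 224 TPU-days

print(tpu_days_a, tpu_days_b, tpu_days_a / tpu_days_b)  # 134400 224 600.0
```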
From the blog post: “A full technical description of this work is being prepared for publication in a peer-reviewed journal”. The only publication about this is their late-2019 Nature paper, linked by teradimich here, from which I have taken the values. They upgraded their algorithm and spent more compute on a single experiment by October 2019. “12 agents” refers to the number of types of agents, and 600 (900 in the newer version) refers to the number of policies. As for the 33% GPU utilization value: I think I have seen it in some ML publications and in other places for this hardware, and it seems like a reasonable estimate for all these projects, but I don’t have the sources at hand.
Probably that:

“We trained the league using three main agents (one for each StarCraft race), three main exploiter agents (one for each race), and six league exploiter agents (two for each race). Each agent was trained using 32 third-generation tensor processing units (TPUs) over 44 days.”
This can be useful:

“When we didn’t have enough information to directly count FLOPs, we looked GPU training time and total number of GPUs used and assumed a utilization efficiency (usually 0.33)”