A passage about AI safety from the blog: “We also think some of our training methods may prove useful in the study of safe and robust AI. One of the great challenges in AI is the number of ways in which systems could go wrong, and StarCraft pros have previously found it easy to beat AI systems by finding inventive ways to provoke these mistakes. AlphaStar’s innovative league-based training process finds the approaches that are most reliable and least likely to go wrong. We’re excited by the potential for this kind of approach to help improve the safety and robustness of AI systems in general, particularly in safety-critical domains like energy, where it’s essential to address complex edge cases.”
Also, they said that each agent used 16 TPUv3 devices, and the graph in the article indicates that at the end there were 600 agents. Based on the declared TPUv3 performance of 420 teraflops, the league was consuming about 4 exaFLOP/s at the end, with a median of roughly 2 exaFLOP/s over the 14 days, which comes to about 28,000 petaflop/s-days of compute. AlphaGo Zero consumed 1,800 petaflop/s-days according to OpenAI, around 13 months before AlphaStar. Thirteen months at a 3.5-month doubling time implies roughly 2^(13/3.5) ≈ 13x growth, close to the observed ~15x, so the trend of a 3.5-month doubling time in compute for the most complex experiments continues.
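For concreteness, here is that arithmetic as a short script. The inputs (600 agents, 16 TPUv3 devices per agent, 420 TFLOP/s each, median at half of peak, 14 days) are the estimates above, not official DeepMind numbers:

```python
# Back-of-the-envelope check of the AlphaStar compute estimate.
# Assumptions (mine): 600 agents at the end, 16 TPUv3 devices per
# agent, 420 TFLOP/s per device, median usage = half of peak.

agents = 600
tpus_per_agent = 16
tflops_per_tpu = 420                      # declared TPUv3 performance

peak_pflops = agents * tpus_per_agent * tflops_per_tpu / 1000
median_pflops = peak_pflops / 2           # assume median is half of peak
days = 14
pflops_days = median_pflops * days

print(f"peak: {peak_pflops / 1000:.1f} EFLOP/s")       # ~4.0 EFLOP/s
print(f"total: {pflops_days:,.0f} petaflop/s-days")    # ~28,000

# Consistency with the 3.5-month doubling trend, given AlphaGo Zero
# at ~1,800 petaflop/s-days about 13 months earlier:
observed = pflops_days / 1800
predicted = 2 ** (13 / 3.5)
print(f"observed growth: {observed:.1f}x, trend predicts: {predicted:.1f}x")
```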
“AlphaStar’s innovative league-based training process finds the approaches that are most reliable and least likely to go wrong.”
“Go wrong” is still tied to the game’s win condition. So while the league-based training process does find the set of agents whose gameplay is least exploitable (among all the agents they trained), it’s not obvious how this relates to problems in AGI safety such as goal specification or robustness to capability gains. Maybe they’re thinking of things like red teaming. But without more context I’m not sure how safety-relevant this is.
There’s also the CPU. Those <=200 years of SC2 simulations per agent aren’t free. OA5, recall, was “256 GPUs and 128,000 CPU cores”. (Sometimes computing a small NN update is cheaper than running the many games necessary to get the experience that decides what tweak to make.)
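A rough sense of why the CPU side matters, using the figures from this thread (up to 200 years of game experience per agent, accumulated over a 14-day run; the parallelism implied is my inference, not a reported number):

```python
# How much faster than real time must the games run, per agent?
years_per_agent = 200       # upper bound on experience per agent
wall_clock_days = 14        # length of the training run

speedup = years_per_agent * 365.25 / wall_clock_days
print(f"~{speedup:,.0f}x real-time per agent")   # ~5,200x

# i.e. each agent needs thousands of SC2 instances running in
# parallel (or heavily accelerated), which is pure CPU cost on
# top of the TPU training estimate above.
```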