I’d be curious to hear more thoughts on how much we could already scale it right now. It looks like data might be the bottleneck?
Some thoughts on compute:
Gato estimate: 256 TPUv3 chips for 4 days at 24 hours/day = 24,576 TPUv3-hours; at on-demand pricing of $2 per TPUv3-hour, that comes to $49,152.
In comparison, PaLM used 8,404,992 TPUv4-hours, which I estimated would cost $11M+. If we assume someone were willing to spend the same compute budget on it, we could make the model 106x bigger (assuming Chinchilla scaling laws). Also tweeted about this here.
The size of the model was apparently(?) limited only by the latency requirements of the robotics part.
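The arithmetic above can be sanity-checked with a few lines of Python. This is just a back-of-the-envelope sketch of the numbers already stated in the comment (the $2/hour TPUv3 rate and the ~$11M PaLM estimate are taken from the text, not independently verified); note it compares budgets by dollar cost, since TPUv3- and TPUv4-hours aren't directly comparable:

```python
# Gato training-cost estimate (figures from the comment above).
chips = 256
days = 4
tpu_hours = chips * days * 24       # 256 chips * 96 hours = 24,576 TPUv3-hours
gato_cost = tpu_hours * 2.0         # $2/hour on-demand per TPUv3 -> $49,152

# PaLM, for comparison (TPUv4-hours from the PaLM paper; cost is the
# rough $11M+ estimate quoted in the text).
palm_tpu_hours = 8_404_992
palm_cost = 11e6

# How many times larger PaLM's compute budget was, measured by cost.
budget_ratio = palm_cost / gato_cost
print(tpu_hours, gato_cost, round(budget_ratio))
```

By this dollar-cost proxy the PaLM budget is roughly 220x Gato's; translating that into a model-size multiplier additionally depends on which scaling-law exponent one assumes.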