Ohh OK I think since I wrote “512 TPU cores” it’s 512x512, because in Appendix C here https://arxiv.org/pdf/1809.11096.pdf they say it corresponds to 512x512.
Deep or shallow version?
“Training takes between 24 and 48 hours for most models”; I assumed both are trained within 48 hours (even though this is not precise and may be incorrect).
Ohh OK I think since I wrote “512 TPU cores” it’s 512x512, because in Appendix C here https://arxiv.org/pdf/1809.11096.pdf they say it corresponds to 512x512.
Deep or shallow version?
“Training takes between 24 and 48 hours for most models”; I assumed both are trained within 48 hours (even though this is not precise and may be incorrect).