sanxiyn comments on InternLM—China’s Best (Unverified)

sanxiyn 10 Jun 2023 1:05 UTC
5 points
1
The parallelization part (data parallelism, tensor parallelism, pipeline parallelism, ZeRO) is completely standard. See Efficient Training on Multiple GPUs by Hugging Face for a standard description. The failure recovery part is relatively unusual.
- dr_s 11 Jun 2023 8:36 UTC
  2 points
  1
  I don’t get what the parallelization strategy has to do with the chip ban. It sounds like just a basic parallelism approach.
  - Lao Mein 11 Jun 2023 9:23 UTC
    3 points
    0
    You’re right. I was pretty tired when I wrote this and am not sure where that thought came from.