Well, it will scale linearly until it hits the finite node-to-node bandwidth limit… just like all other supercomputers. If you have your model training on n different nodes, you still need to share all your weights with all the other nodes at some point, which is fundamentally an O(n²) operation; it just appears linear while you're spending more time computing your weight updates than you are communicating with other nodes. I don't see this really being a qualitative jump, but it might well be one more point to add to the graph of increasing compute power dedicated to AI.
Hmm, I see how that would happen with other architectures, but I’m a bit confused how this is O(n²) here? Andromeda has the weight updates computed by a single server (MemoryX) and then distributed to all the nodes. Wouldn’t this be a one-to-many broadcast with O(n) transmission time?
You’re completely right; I don’t know how I missed that. I must be more tired than I thought I was.
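For concreteness, here is a minimal back-of-the-envelope sketch of the two cost models being compared in this exchange: naive all-to-all weight sharing versus a single central server collecting gradients and broadcasting updated weights (as described for MemoryX above). The function names, node counts, and model size are illustrative assumptions, not Andromeda's actual figures.

```python
# Rough communication-cost sketch (illustrative assumptions only, not
# Cerebras/Andromeda's real numbers).

def all_to_all_bytes(n_nodes: int, weight_bytes: int) -> int:
    """Naive all-to-all: every node sends its full copy of the weights (or
    gradients) to every other node, so total traffic grows as O(n^2)."""
    return n_nodes * (n_nodes - 1) * weight_bytes

def central_broadcast_bytes(n_nodes: int, weight_bytes: int) -> int:
    """Central parameter server: each node sends gradients up once and
    receives updated weights back once, so total traffic grows as O(n)."""
    return 2 * n_nodes * weight_bytes

if __name__ == "__main__":
    weight_bytes = 4 * 1_000_000_000  # assume a 1B-parameter fp32 model, ~4 GB
    for n in (4, 16, 64):
        print(f"n={n:3d}  all-to-all: {all_to_all_bytes(n, weight_bytes) / 1e12:7.2f} TB"
              f"   broadcast: {central_broadcast_bytes(n, weight_bytes) / 1e12:6.2f} TB")
```

At 64 nodes the assumed all-to-all pattern moves roughly 16 TB per step versus about 0.5 TB for the broadcast pattern, which is the gap the thread converges on.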