The problem is that, due to the VN bottleneck, to reach that performance those 300 GPUs need to parallelize 1000x over some problem dimension (matrix-matrix multiplication); they can't just do the obvious thing you'd want, which is to simulate a single large brain (a sparse RNN) at high speed using vector-matrix multiplication. Trying that, you'd get at best 1 brain at real-time speed (a 1000x inefficiency/waste/slowdown). It's not really an interconnect issue per se; it's the VN bottleneck.
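A rough way to see why vector-matrix multiplication hits the VN bottleneck is arithmetic intensity (FLOPs per byte of memory traffic). A minimal sketch, with toy numbers I've picked for illustration (an n×n fp16 weight matrix, ignoring activation traffic since weights dominate):

```python
# Arithmetic intensity (FLOPs per byte moved) for multiplying an n x n
# weight matrix against a batch of b input vectors.
# Toy model: only weight traffic is counted (activations are much smaller).
def arithmetic_intensity(n, b, bytes_per_elem=2):
    flops = 2 * n * n * b                  # one multiply-accumulate per weight, per batch element
    bytes_moved = n * n * bytes_per_elem   # weights are read once and reused across the batch
    return flops / bytes_moved

print(arithmetic_intensity(4096, b=1))     # vector-matrix: 1.0 FLOP/byte -> memory-bound
print(arithmetic_intensity(4096, b=1000))  # batched matmul: 1000.0 FLOP/byte -> compute-bound
```

Since a modern GPU can perform on the order of 100–1000x more FLOPs than it can stream bytes from DRAM, the b=1 case leaves the arithmetic units mostly idle, which is the 1000x slowdown described above; reusing each fetched weight across ~1000 batch elements is what recovers full throughput.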
So you have to sort of pick your poison:
Parallelize over the spatial dimension (CNNs) - too constraining for higher brain regions
Parallelize over the batch/agent dimension - costly in RAM for agent medium-term memory, unless compressed somehow
Parallelize over time (transformers) - enables a huge speedup while being RAM-efficient, but also highly constraining by limiting recursion
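The RAM cost of the batch/agent option can be sketched with a toy accounting: shared weights are paid for once, but every agent simulated in parallel needs its own copy of the recurrent state. The numbers below (10^8 hidden units, fp16, 1000 agents) are hypothetical, just to show the scaling:

```python
# Toy accounting for the batch/agent parallelization tradeoff: recurrent
# state (the agent's medium-term memory) scales linearly with agent count,
# while the weight matrix is shared across all agents.
def batch_state_gib(hidden_size, n_agents, bytes_per_elem=2):
    return hidden_size * n_agents * bytes_per_elem / 2**30

# 10^8 hidden units in fp16, replicated for 1000 parallel agents:
print(batch_state_gib(hidden_size=10**8, n_agents=1000))  # ~186 GiB of per-agent state
```

At these assumed sizes the per-agent state alone exceeds the memory of several GPUs, which is why the text notes this route is costly in RAM unless the state is compressed somehow.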
The largest advances in DL (the CNN revolution, the transformer revolution) are actually mostly about navigating this VN bottleneck, because more efficient use of GPUs trumps other considerations.