Very interesting question! My observations:

12 OOMs is a lot. It's like comparing the computational capacity of someone who has just started learning arithmetic to that of a computer an average person could obtain.

You mainly focused on modern AI approaches and how they would scale. With 12 OOMs, it's possible that entirely different approaches would become practical; see galactic algorithms, for example.
It might be useful to look at this through the lens of computational complexity. Here's a table of how much further each complexity class stretches with 12 OOMs more compute:
| Problem complexity | Gain in feasible input size n |
|---|---|
| O(log(n)) | 10^(10^12) × |
| O(n), O(n log(n)) | 10^12 × |
| O(n^2) | 10^6 × |
| O(n^3) | 10^4 × |
| O(n^4) | 10^3 × |
| O(n^6) | 10^2 × |
| O(n^12) | 10 × |
| O(10^n) | +12 (additive) |
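The polynomial rows of the table follow from solving f(n') = 10^12 · f(n) for n'; a minimal sketch reproducing them (the loop and printout are mine, not from the original comment):

```python
# For a problem of cost O(n^k), multiplying available compute by
# 10^12 multiplies the feasible input size n by 10^(12/k).
# For O(10^n) the gain is additive: n can grow by 12.
OOMS = 12

for k in (1, 2, 3, 4, 6, 12):
    growth = 10 ** (OOMS / k)
    print(f"O(n^{k}): feasible n grows ~{growth:,.0f}x")

print(f"O(10^n): feasible n grows by +{OOMS}")
```

Note that the O(n^12) row is where the multiplicative gain shrinks to a single order of magnitude.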
Hence, we should expect even exponentially complex brute-force problems to perform roughly 10 times better: you'd need only 1 year to simulate something that would otherwise have taken 10 years. Quite nice for problems like quantum material design and computational biology.
On the other hand, neural networks are far simpler computationally. A generous upper bound is O(n^5) for training and prediction. So if you spend a little more on compute and get a 13-OOM speedup (just get 10 of those magical machines instead of 1), you should expect to train neural networks 1000 times faster. What previously took you a week now takes you a mere 10 minutes.
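The week-to-ten-minutes figure is consistent with a roughly 1000× speedup. One hedged way to make the arithmetic work, assuming the extra compute is split between a larger model and faster wall-clock time under the O(n^5) bound (this split is my assumption, not the author's stated reasoning):

```python
# Hedged sketch: 12 OOMs plus 10 machines gives 10^13 extra compute.
compute_gain = 10 ** 13

# Under an O(n^5) cost model, growing the model 100x costs 100^5 = 10^10:
model_growth_cost = 100 ** 5

# Whatever compute is left over shows up as wall-clock speedup:
speedup = compute_gain // model_growth_cost
print(speedup)                    # 1000

# Which matches "a week becomes about ten minutes":
week_in_minutes = 7 * 24 * 60     # 10080
print(week_in_minutes / speedup)  # ~10 minutes
```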
Aside: There is a problem here. O(n^5) is still slow. In part this is because biological brains work on different principles than digital computers. Neural networks are mathematically simple, but they were not designed for digital computers: running one is a simulation, and a simulation is always slower than the real thing. Hence, I'd expect TAI to happen once this architecture mismatch is fixed.
Furthermore, a lot of things are swept under the rug when you imagine that compute (and everything related to it) is simply scaled 12 OOMs. For example, not all problems are parallelizable (though intelligence is: after all, biological neurons don't wait for others to update their state before updating their own). Similarly, different problems might have different memory complexity.
This is why we have seen success with specialized hardware for neural network training and execution.
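The parallelizability caveat can be made quantitative with Amdahl's law, which bounds the speedup of a partly serial program; this sketch (my addition, not part of the original argument) shows why 10^12 more processors does not automatically mean a 10^12× speedup:

```python
def amdahl_speedup(parallel_fraction: float, processors: float) -> float:
    """Overall speedup when only parallel_fraction of the work
    can use the extra processors (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

# Even with 10^12 processors, a 99%-parallel program tops out near 100x:
print(round(amdahl_speedup(0.99, 1e12)))  # 100
```

The serial fraction, not the hardware budget, sets the ceiling; fully parallel workloads (like the neuron example above) escape this bound.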
Now, improved run times would give you more freedom to tweak and experiment with hyperparameters: adjusting network architectures, automatically categorizing all possible quantum materials and DNA sequence expressions. What would have taken you 100 years would now take only 10 years. That's quite transformative. As a bonus, such "100-year projects" would now fit into a single research career, not that different from current experiments in nuclear physics. This should open up quite a lot of possibilities that we simply don't have access to now.
Actually, this gives me an idea. The reason we can have long-running and expensive projects like, say, the LHC is that we have a clear theory of what we expect to find and what energies we need to find it. If we were close to getting TAI (or any other computationally expensive product) and knew it, we would be able to raise funds to construct a machine that would be the "LHC of TAI".