Very interesting question!
My observations:
12 OOMs is a lot
12 OOMs is like comparing the computational capacity of someone who has just started learning arithmetic to that of a computer an average person could buy
You mainly focused on modern AI approaches and how they would scale. With 12 OOMs, it's possible that entirely different approaches would be used; see galactic algorithms, for example.
It might be useful to look at this through the lens of computational complexity. Here’s a table of expected speedups:
| Problem complexity | Growth of tractable problem size |
|---|---|
| O(log n) | n → n^(10^12) |
| O(n), O(n log n) | ×10^12 |
| O(n^2) | ×10^6 |
| O(n^3) | ×10^4 |
| O(n^4) | ×10^3 |
| O(n^6) | ×10^2 |
| O(n^12) | ×10 |
| O(10^n) | n → n + 12 |
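The table can be sanity-checked with a few lines of Python (my own sketch; it just inverts each cost function):

```python
# With 10^12x more compute, the largest tractable instance of an
# O(n^k) problem grows by a factor of 10^(12/k); for an O(10^n)
# problem, n only grows additively, by 12.

def poly_gain(k, ooms=12):
    """Size factor for O(n^k) cost given `ooms` orders of magnitude."""
    return 10 ** (ooms / k)

for k in (1, 2, 3, 4, 6, 12):
    print(f"O(n^{k}): problem size x {poly_gain(k):,.0f}")

# Exponential case: cost 10^(n+12) = 10^12 * 10^n, so n -> n + 12.
assert 10 ** (5 + 12) == 10 ** 12 * 10 ** 5
```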
Hence, for brute-force problems with exponential cost, the gain in tractable instance size is modest: n only grows by 12. Any fixed instance, though, still runs 10^12 times faster: a simulation that would have taken 10^12 years (far longer than the age of the universe) now finishes in a single year. Quite nice for problems like quantum material design and computational biology.
On the other hand, neural networks are far simpler computationally; a generous upper bound is O(n^5) for training and prediction. So spend a little more on compute and get 13 OOMs (just get 10 of those magical machines instead of 1): put 10 of those OOMs into a 100× larger network (100^5 = 10^10), and the remaining 3 OOMs make training 1000 times faster. What previously took you a week now takes you a mere 10 minutes.
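Here is that budget split spelled out (a sketch; the O(n^5) cost model and the 100× scale-up are my assumptions, not established facts):

```python
# Splitting 13 OOMs of extra compute, assuming training cost ~ n^5.
extra_compute = 10 ** 13           # 12 OOMs + 10 machines
model_scale   = 100                # make the network 100x bigger
cost_of_scale = model_scale ** 5   # n^5 cost model => 10^10
speedup_left  = extra_compute // cost_of_scale
print(speedup_left)                # 1000x faster training remains

week_minutes = 7 * 24 * 60
print(week_minutes / speedup_left)  # ~10 minutes instead of a week
```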
Aside: There is a problem here. O(n^5) is still slow. In part this is because biological brains work on different principles than digital computers. Neural networks are simple mathematically, but they were not designed for digital hardware. Running them is a simulation, and a simulation is always slower than the real thing. Hence, I'd expect TAI to happen once this architecture mismatch is fixed.
Furthermore, there are a lot of things that are swept under the rug when you imagine that compute (and everything related) is simply scaled 12 OOMs. For example, not all problems are parallelizable (though intelligence is—after all, biological neurons don’t need to wait until others have updated their state). Similarly, different problems might have different memory complexity.
This is why we have seen success in using specialized hardware for neural-network training and inference.
Now, improved run times would give you more freedom to tweak and experiment with hyperparameters: adjusting network architectures, automatically cataloguing all possible quantum materials and DNA sequence expressions. Even a problem that only gains one effective OOM turns a 100-year project into a 10-year one. That's quite transformative. As a bonus, such "century projects" would now fit within a single research career, not unlike current experiments in nuclear physics. This should open up quite a lot of possibilities we simply don't have access to now.
Actually, this gives me an idea. The reason we can have long-running and expensive projects like, say, the LHC is that we have a clear theory of what we expect to find and what energies we need to find it. If we were close to getting TAI (or any other computationally expensive product) and knew it, we would be able to raise the funds to construct a machine that would be the "LHC of TAI".
Why limit ourselves to our planet? 12 OOMs is well within reach for a Type II civilization with access to all the energy of our sun (the Earth intercepts only about half a billionth of it).
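A back-of-envelope check of that fraction, using standard textbook values (my own calculation, not from the post):

```python
import math

# Fraction of the Sun's output that Earth intercepts:
# Earth's cross-sectional disc over the full sphere at 1 AU.
R_earth = 6.371e6    # m, Earth radius
d       = 1.496e11   # m, Earth-Sun distance (1 AU)

fraction = (math.pi * R_earth**2) / (4 * math.pi * d**2)
print(f"{fraction:.1e}")  # ~4.5e-10, i.e. less than a billionth
```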
Encryption wouldn't really be an issue; we can simply tune our algorithms to use slightly harder assumptions. After all, one can just pick a problem whose attack cost scales as O(10^(6n)), where n could for example be the secret key length. Against an adversary with 12 orders of magnitude more compute, you only need to increase n by 2 (since 10^(6(n+2)) = 10^12 · 10^(6n)) and you still have your cryptography.
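A minimal sketch of that trade-off, assuming the attack cost is exactly 10^(6n) (the key length 32 below is a hypothetical example):

```python
# If breaking a key of length n costs ~10^(6n) operations, an attacker
# with 10^12x more compute is neutralized by bumping n -> n + 2.
def attack_cost(n):
    """Operations needed to break a key of length n (assumed model)."""
    return 10 ** (6 * n)

n = 32  # hypothetical key length parameter
assert attack_cost(n + 2) == 10 ** 12 * attack_cost(n)
```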
I also thought about how small computers (phones etc.) would scale. Basically, with 12 OOMs every phone becomes a computer powerful enough to train something as complicated as GPT-3, so everyone could carry their own personalized GPT-3 model with them. Actually, this is another way to improve AI performance: reduce the amount of things the model needs to cover. Training a personalized model specific to one problem would be cheaper and require fewer parameters/layers to get useful results.
Basically, we would be able to put a “small” specialized model with power like that of GPT-3 on every microcontroller.
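Rough numbers to back this up (both figures are my own ballpark assumptions: ~1 TFLOP/s for a current high-end phone, and ~3×10^23 FLOPs as a commonly cited estimate for training GPT-3):

```python
# Could a 12-OOM phone train a GPT-3-class model? Ballpark only.
phone_flops     = 1e12     # ~1 TFLOP/s, assumed current phone
gpt3_train_cost = 3.1e23   # ~3x10^23 FLOPs, assumed training cost

boosted = phone_flops * 1e12       # phone after 12 OOMs
seconds = gpt3_train_cost / boosted
print(f"{seconds:.2f} s")          # training finishes in under a second
```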
You mentioned deepfakes. But with this much compute, why not "deepfake" brand-new networks from scratch? Doing such experiments right now is expensive, since this "second order" training mode roughly quadruples the computational resources needed.
Theoretically, there's nothing preventing one from constructing a network that can assemble new networks based on training parameters. This meta-network could reuse the network structure it "learned" from training smaller models for other applications in order to generate new models with an order of magnitude fewer parameters.
As an analogy, compare the amount of data a neural network needs to learn to differentiate between cats and dogs with the amount a human needs to learn the same thing. A human only needs a couple of examples, while a neural network needs many hundreds of examples just to learn the concepts of shape and topology.