What I don’t understand is why you can’t make a cortex-like GPU area by using a huge power supply over a wide area. GPUs already use some parallel processing, and this would just magnify that, no?
You certainly could spend the same power budget on an equivalent surface area of GPU circuitry. It would be about a million times slower, though, if you work out the math. (A GPU simulation of a cortex could run at maybe normal human speed.)
A GPU is still a von Neumann machine. It separates memory and computation. The memory is stored in one 2D grid and must be moved over to another 2D grid to do anything. If you work out the geometry, the maximum amount of memory you can move in one clock cycle thus scales with the square root of the total memory size. That’s the ideal case for a massively fat direct pipe.
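Here is a toy back-of-envelope sketch of that ideal-case geometry argument (my own illustration, not from the original comment; the one-byte-per-cell layout and the edge-limited pipe are simplifying assumptions):

```python
# Toy model: M bytes laid out as a 2D grid of cells is roughly sqrt(M) cells
# on a side, and even an ideal "fat pipe" to a separate compute die can only
# cross one edge of that grid, so at most ~sqrt(M) cells can move per cycle.
import math

def ideal_bytes_per_cycle(total_bytes, bytes_per_cell=1):
    cells = total_bytes / bytes_per_cell
    side = math.sqrt(cells)          # cells along one edge of the grid
    return side * bytes_per_cell     # edge-limited transfer per clock cycle

for mem in (1e9, 1e12, 1e14):        # 1 GB, 1 TB, 100 TB
    bw = ideal_bytes_per_cycle(mem)
    print(f"{mem:.0e} bytes -> ~{bw:.0e} bytes/cycle ({bw/mem:.0e} of memory)")
```

The point is only the scaling: as total memory grows, the fraction of it reachable per cycle shrinks like its square root, even before any real-world overheads.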
In reality, current high-end GPU memory systems can move/access only about 100 bytes out of a billion every clock cycle (a 10^-7 ratio), which works out to 10 megabytes per clock cycle across 100,000 GPUs (the number needed to match the brain's ~100 terabytes).
The cortex is based on a radically different architecture in which memory and computation are unified. It can thus locally compute on more or less all of its memory every clock cycle: 100 terabytes per cycle.
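To make those numbers concrete, here is a small back-of-envelope script using the figures stated in this comment (100 bytes touched per 10^9 bytes per cycle, ~100 TB total); the ~1 GB of fast memory per GPU is the implied assumption that makes the count come out to 100,000:

```python
# Back-of-envelope check of the figures above (assumptions, not measurements).
TOTAL_MEM        = 100e12        # ~100 TB, the figure used for the brain
GPU_MEM          = 1e9           # ~1 GB of fast memory per GPU (assumed)
GPU_ACCESS_RATIO = 100 / 1e9     # ~100 bytes touched per 10^9 bytes per cycle

n_gpus            = TOTAL_MEM / GPU_MEM                    # ~100,000 GPUs
cluster_per_cycle = n_gpus * GPU_MEM * GPU_ACCESS_RATIO    # bytes/cycle, whole cluster
cortex_per_cycle  = TOTAL_MEM                              # memory==compute: touch ~all of it

print(f"GPUs needed:          {n_gpus:,.0f}")
print(f"Cluster bytes/cycle:  {cluster_per_cycle:,.0f}  (~10 MB)")
print(f"Cortex bytes/cycle:   {cortex_per_cycle:,.0f}")
print(f"Per-cycle gap:        ~{cortex_per_cycle / cluster_per_cycle:,.0f}x")
```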
Some of the memory bottleneck gap could be closed by clever algorithms that prioritize the memory space and store most of it in flash memory. That might reduce the problem enough that you would need only 1,000 GPUs instead of 100,000, but it wouldn't dramatically increase the simulation speed.
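A rough illustration of that tiering idea (my sketch; the 1% "hot fraction" is a number chosen purely to reproduce the 100,000-to-1,000 reduction):

```python
# Hypothetical memory tiering: keep only the currently "hot" state in GPU RAM,
# park the rest in flash. Cost (GPU count) falls, wall-clock speed does not.
TOTAL_MEM    = 100e12     # ~100 TB of state/weights
GPU_MEM      = 1e9        # ~1 GB of fast memory per GPU (assumed)
HOT_FRACTION = 0.01       # assumed fraction that must be GPU-resident per step

gpus_naive  = TOTAL_MEM / GPU_MEM                   # everything in GPU RAM
gpus_tiered = TOTAL_MEM * HOT_FRACTION / GPU_MEM    # only the hot set in GPU RAM

print(f"GPUs, everything resident: {gpus_naive:,.0f}")
print(f"GPUs, hot set only:        {gpus_tiered:,.0f}")
# Each simulated step still moves data at the same per-cycle rate, so the
# simulation is cheaper to build but not meaningfully faster.
```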
Another consideration is that GPU circuitry is about 100 times more expensive per unit of surface area than flash memory. If memristors work out, they should have flash-memory economics.
So the organizational principles of the cortex can be applied only to exact simulations of the cortex and not to any other sort of computations?
Violating the von Neumann architecture is indeed the general direction silicon computer design is heading, yes. However, the requirements of backwards compatibility are fierce, and it may even be fundamentally harder to program multicore or “distributed” architectures.
Think of the separation of CPU and memory as a lie—“spatial geometry does not exist, everything is one pointer reference away from everything else”—and perhaps it makes sense why programmers find single-core easier to write code for. The lie is a useful abstraction.
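A toy contrast of the two mental models (my illustration, not anything from the thread): the first function leans on the flat-memory abstraction, while the second makes data placement and the communication step explicit, which is the part that gets hard at scale.

```python
# Flat-memory view: every element is "one reference away", so the programmer
# can ignore where data physically lives.
def total_flat(values):
    return sum(values)

# Distributed view: the same data is sharded across nodes, and the programmer
# must now think about which node holds what and how partial results move.
def total_sharded(shards):
    partials = [sum(shard) for shard in shards]   # local compute on each node
    return sum(partials)                          # explicit "communication" step

data = list(range(1000))
shards = [data[i::4] for i in range(4)]           # pretend 4 nodes hold slices
assert total_flat(data) == total_sharded(shards)
```

Even in this trivial example the distributed version forces decisions about partitioning and reduction; on real hardware that reasoning, plus synchronization and locality, is most of the difficulty.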
One of the major differences between evolved and human-designed implementations is that evolved implementations take an amount of time and effort roughly proportional to the size of the solution, but human-designed implementations usually take time and effort proportional to the size of the argument that the solution is correct. (Of course, it’s entirely possible to find edge cases that blur this distinction, but you understand what I’m gesturing at.)
That means that evolution (or other weak solution-space search methods) may have an advantage at finding solutions that work for reasons that are hard to explain. (At least until humans develop a well-known and general theory that they can reference in their designs.)
Depends on what organizational principles you are talking about. At the very generic level, the brain’s extremely distributed architecture is already the direction computers are steadily moving towards out of necessity (with supercomputers farther ahead).
As for memristors, they will probably have many generic uses. They just also happen to be very powerful for cortex-like AGI applications.
The cortical model for an analog AGI circuit would be incredibly fast but it would be specific for AGI applications (which of course is still quite broad in scope). For regular computation you’d still use digital programmable chips.
Have the distributed architecture trends and memristor applications followed the rough path you expected when you wrote this 12 years ago? Is this or this the sort of thing you were gesturing at? Do you have other links or keywords I could search for?
The distributed-architecture prediction, with supercomputers farther ahead, was correct: Nvidia grew from a niche gaming company to eclipse Intel and is on the road to stock market dominance, all because it puts old parallel supercomputers on single chips.
Neuromorphic computing in various forms is slowly making progress: there's IBM's TrueNorth research chip, for example, and a few others. Memristors were overhyped and crashed, but they are still in research and may yet come to be.
So instead we got big GPU clusters, which, for the reasons explained in the article, can't run large brain-like RNNs at high speeds. They can, however, run smaller transformer models (which sacrifice recurrence and thus aren't as universal, but are still pretty general) at very high speeds (perhaps 10,000x), and that is what gave us GPT-4. The other main limitation of transformers vs. brain-like RNNs is that GPUs only massively accelerate transformer training, not inference. Some combination of those two limitations seems to be the main blocker for AGI at the current training-compute regime, but it probably won't last long.
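A minimal sketch of that training-vs-inference asymmetry, using a toy causal attention-like layer (my illustration, not an actual transformer): the training-style pass processes every position of the sequence in one batched matrix multiply, which is exactly the shape of work GPUs excel at, while generation has to proceed token by token.

```python
import numpy as np

def causal_layer(X):
    """Process a whole sequence at once: every position attends to the past."""
    T = X.shape[0]
    scores = X @ X.T
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

rng = np.random.default_rng(0)
seq = rng.standard_normal((128, 16))      # 128 tokens, 16-dim embeddings

# Training-style pass: one big batched matmul over all 128 positions at once.
out_parallel = causal_layer(seq)

# Inference-style generation: tokens arrive one at a time, so the work is an
# inherently sequential loop of small steps that can't be batched over time.
outputs = []
for t in range(1, seq.shape[0] + 1):
    outputs.append(causal_layer(seq[:t])[-1])
out_sequential = np.stack(outputs)

assert np.allclose(out_parallel, out_sequential)
```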
This story did largely get one aspect of AGI correct, and for the right reasons: that its early large economic advantage would be in text generation and related fields, and that perhaps the greatest early risk is via human influence.