For example, if there were an image-processing algorithm that used many fewer operations overall, but where those operations were more serial and less parallel—e.g. it required 1000 sequential steps for each image—then I think evolution would not have found it, because brains are too slow.
So then you need a different reason to think that such an algorithm doesn’t exist.
Maybe you can say “If such an algorithm existed, AI researchers would have found it by now.” But would they really? If AI researchers hadn’t been stealing ideas from the brain, would they have even invented neural nets by now? I dunno.
Or you can say “Something about the nature of image processing is that doing 1000 sequential steps just isn’t that useful for the task.” I guess I find that claim kinda plausible, but I’m just not very confident, I don’t feel like I have such a deep grasp of the fundamental nature of image processing that I can make claims like that.
In other domains besides image processing, I’d be even less confident. For example, I can kinda imagine some slightly-alien form of “reasoning” or “planning” that was mostly like human “reasoning” or “planning” but sometimes involved fast serial operations. After all, I find it very handy to have a fast serial laptop. If access to fast serial processing is useful for “me”, maybe it would be also useful for the low-level implementation of my brain algorithms. I dunno. Again, I think it’s hard to say either way.
For example, if there were an image-processing algorithm that used many fewer operations overall, but where those operations were more serial and less parallel—e.g. it required 1000 sequential steps for each image—then I think evolution would not have found it, because brains are too slow.
EDIT: I updated the circuits section of the article with an improved model of Serial vs Parallel vs Neuromorphic (PIM) scalability, which better illustrates how serial computation doesn’t scale.
Yes, you bring up a good point, and one I should have discussed in more detail (but the article is already pretty long). However, the article does provide part of the framework for answering this question.
There definitely are serial/parallel tradeoffs where the parallel version of an algorithm tends to use marginally more compute asymptotically. However, these simple big-O asymptotic models do not account for the fundamental cost of wire energy transit for remote memory accesses, which actually scales as M^(1/2) for a 2D memory of size M. So in that sense the simple big-O models are asymptotically wrong. If you use more detailed models that account for the actual wire energy costs, everything changes: the parallel versions, which leverage distributed local memory and thus avoid wire energy transit, are generally more energy efficient, but at the cost of a more memory-heavy algorithmic approach.
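As a rough illustration (the constants and the 2x op-count penalty below are made up for the sketch, not taken from the article), here is a toy cost model where the serial version does fewer ops but pays ~sqrt(M) wire energy per remote access, while the parallel/PIM version does more ops on purely local memory:

```c
/* Toy energy model -- illustrative constants, not measured values.
 * Serial version: fewer ops, but each op fetches an operand from a remote
 * 2D memory of M cells, so wire transit costs ~sqrt(M) per access.
 * Parallel/PIM version: assumed 2x the ops, but operands sit in local
 * memory, so wire cost per access is a small constant. */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double e_op   = 1.0;    /* energy per arithmetic op (arbitrary units) */
    const double e_wire = 0.01;   /* wire energy per unit distance per access   */

    for (double M = 1e3; M <= 1e12; M *= 1e3) {        /* memory size in cells  */
        double ops_serial   = M;                       /* fewer ops overall...  */
        double ops_parallel = 2.0 * M;                 /* ...vs 2x more ops     */

        double e_serial   = ops_serial   * (e_op + e_wire * sqrt(M)); /* remote */
        double e_parallel = ops_parallel * (e_op + e_wire * 1.0);     /* local  */

        printf("M=%.0e  serial=%.3e  parallel=%.3e\n", M, e_serial, e_parallel);
    }
    return 0;
}
```

Past a modest memory size the sqrt(M) wire term dominates, and the "wasteful" parallel version wins on total energy despite doing twice the arithmetic.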
Another way of looking at it is to compare serial-optimized von Neumann (VN) processors (CPUs), parallel-optimized VN processors (GPUs), and parallel processor-in-memory machines (brains, neuromorphic hardware).
Pure serial CPUs (ignoring parallel/vector instructions) with tens of billions of transistors have only on the order of a few dozen cores, and not much higher clock rates than GPUs, despite spending all that die space on marginal serial speed increases: serial speed scales extremely poorly with transistor density (the end of Dennard scaling, etc.). A GPU with tens of billions of transistors instead has tens of thousands of ALU cores, but is still ultimately limited by the very poor scaling of off-chip RAM bandwidth, which is proportional to N^0.5 (where N is device area), and by wire energy, which doesn’t scale at all. The neuromorphic/PIM machine has perfect memory bandwidth scaling at a 1:1 ratio: it can access all of its RAM every clock cycle, pays near-zero energy to access RAM (since memory and compute are unified), and everything scales linearly with N.
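To make the scaling contrast concrete, here is a toy sketch (placeholder scaling laws, not real chip specs) of how the compute-to-bandwidth ratio evolves with device area N for an off-chip-RAM VN machine versus a PIM machine:

```c
/* Rough scaling sketch -- the constants are placeholders, not vendor specs.
 * VN/GPU-style machine: compute grows ~linearly with device area N, but
 * off-chip bandwidth grows only ~N^0.5 (pins/edge ~ perimeter), so the
 * arithmetic intensity needed to keep the ALUs busy grows as ~N^0.5.
 * PIM/neuromorphic machine: bandwidth grows ~linearly with N, because
 * every compute element sits next to its own memory. */
#include <math.h>
#include <stdio.h>

int main(void) {
    for (double N = 1.0; N <= 1e6; N *= 100.0) {   /* device area, arbitrary units */
        double compute    = N;                     /* ALUs ~ area                  */
        double bw_offchip = sqrt(N);               /* off-chip pins ~ perimeter    */
        double bw_pim     = N;                     /* memory is local to compute   */

        printf("N=%8.0f  ops/byte needed (VN): %8.1f   (PIM): %4.1f\n",
               N, compute / bw_offchip, compute / bw_pim);
    }
    return 0;
}
```

The VN column blows up as the device grows, which is exactly the pressure toward ever-larger caches and batches; the PIM column stays flat.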
Physics is fundamentally parallel, not serial, so the latter just doesn’t scale.
But of course, on top of all that there is latency/delay: the brain is also strongly optimized for minimal depth and thus minimal delay, and to some extent that may compete with optimizing for energy. Ironically, delay is also a problem in GPU ANNs (a huge problem for Tesla’s self-driving cars, for example), because GPUs need to operate on huge batches to amortize their very limited, expensive memory bandwidth.
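A toy model of that batching tradeoff (made-up numbers, not any real GPU or network): if every forward pass has to stream the weights in from off-chip RAM, the weight traffic is only amortized across the batch, so you need a large batch before the chip is compute-bound, and a large batch means waiting longer for any individual result.

```c
/* Toy batching model -- made-up numbers, not any real GPU or network.
 * Per-batch time ~ max(time to stream the weights once, time to do the math),
 * so throughput only improves with batch size until the run is compute-bound,
 * while per-batch latency keeps growing. */
#include <stdio.h>

int main(void) {
    const double weight_bytes = 1e9;   /* 1 GB of weights (hypothetical)    */
    const double mem_bw       = 1e12;  /* 1 TB/s off-chip bandwidth         */
    const double flops_item   = 2e9;   /* 2 GFLOP per input (hypothetical)  */
    const double peak_flops   = 1e14;  /* 100 TFLOP/s peak compute          */

    for (int batch = 1; batch <= 1024; batch *= 4) {
        double t_mem     = weight_bytes / mem_bw;            /* stream weights once  */
        double t_compute = batch * flops_item / peak_flops;  /* math for whole batch */
        double t_batch   = (t_mem > t_compute) ? t_mem : t_compute;

        printf("batch=%4d  latency=%5.2f ms  throughput=%8.0f items/s\n",
               batch, t_batch * 1e3, batch / t_batch);
    }
    return 0;
}
```

Small batches leave the ALUs idle waiting on memory; big batches fix throughput but push the latency up, which is exactly the problem for a real-time control loop.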
Yeah, latency / depth is the main thing I was thinking of.
If my boss says “You must calculate sin(x) in 2 clock cycles”, I would have no choice but to waste a ton of memory on a giant lookup table. (Maybe “2” is the wrong number of clock cycles here, but you get the idea.) If I’m allowed 10 clock cycles, maybe I can reduce x mod 2π first, and thus use a much smaller lookup table and waste a lot less memory. If I’m allowed 200 clock cycles to calculate sin(x), I can use C code that has no lookup table at all, and thus roughly zero memory and communications. (EDIT: Oops, LOL, the C code I linked uses a lookup table. I could have linked this one instead.)
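Here is a toy sketch of the kind of tradeoff I mean (this is not the code I linked; the table size and iteration count are made up for illustration):

```c
/* Toy illustration of the serial-depth vs memory tradeoff for sin(x).
 * Sketch only: accuracy and cycle counts are hand-waved, and x is
 * assumed to be non-negative. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define TABLE_BITS 12
#define TABLE_SIZE (1 << TABLE_BITS)
static double table[TABLE_SIZE];          /* the "wasted" memory: one entry per bin */

/* Version 1: minimal serial depth -- one multiply, one index, one load. */
double sin_lookup(double x) {
    int i = (int)(x / (2.0 * M_PI) * TABLE_SIZE) & (TABLE_SIZE - 1);
    return table[i];
}

/* Version 2 (not shown): spend a few more cycles on range reduction so the
 * table only has to cover [0, pi/2), i.e. ~4x less memory. */

/* Version 3: many dependent serial steps, essentially zero memory. */
double sin_series(double x) {
    x = fmod(x, 2.0 * M_PI);              /* range-reduce first              */
    double term = x, sum = x;
    for (int n = 1; n < 10; n++) {        /* each iteration needs the last   */
        term *= -x * x / ((2 * n) * (2 * n + 1));
        sum += term;
    }
    return sum;
}

int main(void) {
    for (int i = 0; i < TABLE_SIZE; i++)  /* precompute the lookup table     */
        table[i] = sin(2.0 * M_PI * i / TABLE_SIZE);
    printf("libm: %f  lookup: %f  series: %f\n",
           sin(1.0), sin_lookup(1.0), sin_series(1.0));
    return 0;
}
```

Same function, three points on a curve: the tighter the serial-depth budget, the more memory you end up spending.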
So I still feel like I don’t want to take it for granted that there’s a certain amount of “algorithmic work” that needs to be done for “intelligence”, and that amount of “work” is similar to what the human brain uses. I feel like there might be potential algorithmic strategies out there that are just out of the question for the human brain, because of serial depth. (Among other reasons.)
Also, it’s not all-or-nothing: I can imagine an AGI that involves a big parallel processor, and a small fast serial coprocessor. Maybe there are little pieces of the algorithm that would massively benefit from serialization, and the brain is bottlenecked in capability (or wastes memory / resources) by the need to find workarounds for those pieces. Or maybe not, who knows.
Peter Watts would like you to ponder how Portia spiders think about what they see. :)
Is that link safe to click for someone with arachnophobia?
no pictures
Yes. Photos are a lot of work to include, and anyway, jumping spiders are famously cute (as far as spiders go).
I wish the cuteness made a difference. Interesting reading though, thanks.