I’ve had similar questions about this before: how can human individual differences appear so great when the apparent differences in neurophysiology between +3 and −3 SD humans are so small? My current view on this is that:
a.) General ‘peak’ human cognition is pretty advanced, and the human brain is large even by current ML standards, so by the scaling laws we should be pretty good at general tasks compared to existing ML systems. This means that human intelligence often sits pretty ‘far out’ relative to current ML, and that scaling ML much beyond humans is often expensive unless the task is super specialised, where ML systems have a much lower constant factor due to a better-adapted architecture/algorithm. Specialised ML systems will still hit a scaling wall at some point, but it could be quite a way beyond peak human cognition.
b.) Most human variation is caused by deleterious mutations away from the ‘peak’, and because it is so much easier to destroy performance than to gain it, human performance basically ranges from 0 → human peak. The higher the human peak, the larger this range will seem. The median human is a bad benchmark because the median human operates at substantially less than our true scaling-law potential. Because of this, scaling ML systems often lie within the range of human performance for a long time as they climb up to our peak level.
c.) In some sense the weird thing is that humans are so bad, rather than forming a tight normal distribution around peak performance. This has to do with it being easier to mess up performance than to improve it. I wonder what the performance distribution of some SOTA ML architecture would be if we randomly messed with its architecture and training.
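A quick toy simulation of the “peak minus deleterious hits” picture in (b) and (c). All numbers here are invented purely for illustration (mutation counts, per-hit costs, and the peak value are assumptions, not empirical estimates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: everyone starts at the same "peak" and accumulates
# a random number of small multiplicative deleterious hits.
peak = 100.0
n_people = 100_000
n_hits = rng.poisson(lam=20, size=n_people)  # hypothetical hit count per person
effect = 0.01                                # each hit costs ~1% (assumed)
performance = peak * (1 - effect) ** n_hits

print(f"max:    {performance.max():.1f}")
print(f"median: {np.median(performance):.1f}")
print(f"min:    {performance.min():.1f}")
# The distribution is bounded above near the peak, with a long tail
# downward: it is easy to lose performance but impossible to exceed
# the peak in this model, so the spread is all on one side.
```

Even this crude model reproduces the asymmetry: nobody lands above the peak, the median sits well below it, and the worst cases are much further from the median than the best cases are.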
I think that this is correct with one caveat:
We are closing the gap between human brains and ML models, and I think this will probably happen within a decade or so from now.
I think that ML and human brains will converge to the same or similar performance this century, and the big difference is that more energy can be added pretty reliably to an ML model, while humans don’t enjoy this advantage.
Yes, definitely. Based on my own estimates of approximate brain scale, it is likely that the current largest ML projects (GPT-4) are within an OOM or so of the brain’s effective parameter count already (±1–2 OOM), and we will definitely have brain-scale ML systems being quite common within a decade, and probably less; hence short timelines. Strong agree that it is much easier to add compute/energy to ML models than to brains.
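For concreteness, a back-of-envelope version of that comparison, using commonly cited but highly uncertain round numbers: ~1e14 synapses for the brain, ~1e12 parameters for the largest current models, treating one synapse as roughly one parameter, and assuming parameter counts keep roughly doubling yearly (all four figures are assumptions, each uncertain by an OOM or more):

```python
import math

# Round figures; every number here is an assumption with large error bars.
brain_synapses = 1e14        # ~100 trillion synapses, 1 synapse ≈ 1 parameter
largest_model_params = 1e12  # rough order of magnitude for frontier models

gap_oom = math.log10(brain_synapses / largest_model_params)

# If parameter counts keep doubling roughly yearly (an extrapolation,
# not a law), the time to close the gap is:
years_to_close = gap_oom / math.log10(2)

print(f"gap: ~{gap_oom:.0f} OOM, closed in ~{years_to_close:.1f} years at 2x/year")
```

Under these assumptions the gap is about 2 OOM and closes in well under a decade, which is the shape of the “brain-scale ML within a decade” claim; shifting either input by an OOM moves the answer by only a few years.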
Have you written your estimates of brain scale up anywhere?
I’ve written up some of my preliminary thoughts and estimates here: https://www.beren.io/2022-08-06-The-scale-of-the-brain-vs-machine-learning/.
Jacob Cannell’s post on brain efficiency https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know is also very good.
I’ll check your post out.
I’ve found Cannell’s post very dense and hard to read the times I’ve attempted it. I guess there’s a large inferential distance in some aspects, so lots of it goes over my head.