So I would argue that all of the main contenders are very training-data-efficient compared to standard artificial neural nets. I’m not going to go into detail on that argument unless people let me know that it seems cruxy to them and they’d like more detail.
I’m not sure I get this enough for it to even be a crux, but what’s the intuition behind this?
My guess for your argument is that you see it as analogous to the way a CNN beats out a fully-connected net at image recognition, because it massively cuts down the space of possible models in a way that’s compatible with the known structure of the problem.
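(To make the model-space point concrete, here’s a minimal sketch of the parameter-count gap the CNN analogy rests on. The layer shapes are my own illustrative choices, not anything from this discussion: mapping a 32×32 RGB image to 16 feature channels densely versus with a shared 3×3 convolution.)

```python
import torch.nn as nn

# Dense layer: every input pixel connects to every output unit.
fully_connected = nn.Linear(3 * 32 * 32, 16 * 32 * 32)

# Conv layer: local 3x3 windows with weights shared across positions.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(fully_connected))  # 50,348,032 (3072 * 16384 weights + 16384 biases)
print(n_params(conv))             # 448 (16 * 3 * 3 * 3 weights + 16 biases)
```

Five orders of magnitude fewer parameters for the same input/output shape, purely because the architecture bakes in locality and translation invariance.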
But that raises the question, why are these biology-inspired networks more likely to be better representations of general intelligence than something like transformers? Genuinely curious what you’ll say here.
(The wisdom of evolution only carries so much weight for me, because the human brain is under physical constraints, like the colocation of neurons, that prevent evolution from building things that artificial architectures can do.)
I don’t think they are better representations of general intelligence. I’m quite confident that much better representations of general intelligence exist and just have yet to be discovered. I’m just saying that these are closer to a proven path, and although they are inefficient and unwise, somebody would likely follow these paths if suddenly given huge amounts of compute this year. And in that imaginary scenario, I predict they’d be pretty effective.
My reasoning for saying this about the Blue Brain Project is that I’ve read a lot of their research papers and understand their methodology pretty well, and I believe they’ve got really good coverage of a lot of details. I’m like 97% confident that whatever ‘special sauce’ allows the human brain to be an effective general intelligence, BBP has already captured it in their model. I think they’ve captured every detail they could justify as being possibly slightly important, so I think they’ve also captured a lot of unnecessary detail. That’s bad for interpretability and compute efficiency. I don’t recommend this path; I just believe it meets the requirements of the thought experiment about 12 OOMs of compute magically appearing.