Re 1: Yeah, so maybe I need to put more of the comparisons in an appendix or something; I’m just assuming background knowledge here that others may not have. Biological vision has been pretty extensively studied and is fairly well understood. We’ve had detailed functional computational models that can predict activations in IT since around 2016, and they are DL models. I discussed some of that in my previous brain efficiency post here. More recently the same approach was used to model linguistic cortex using LLMs, and it was just as effective or more so, as discussed a bit in my simbox post here. So I may just be assuming common background knowledge that BNNs and ANNs converge to learn similar or even equivalent circuits given similar training data.
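For readers without that background, here is roughly what “predict activations in IT” means in this literature: activations from a trained ANN are linearly regressed onto recorded neural responses for the same stimuli, and the model is scored by held-out explained variance. The sketch below is a generic toy version of that pipeline; the array names, shapes, and random data are placeholders, not any specific benchmark or dataset.

```python
# Toy sketch of the standard linear "encoding model" comparison between ANN
# features and neural recordings (the kind of analysis behind IT-predictivity
# results). All data here is random placeholder; shapes are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stimuli, n_features, n_neurons = 500, 2048, 100

ann_features = rng.normal(size=(n_stimuli, n_features))      # e.g. penultimate-layer activations per image
neural_responses = rng.normal(size=(n_stimuli, n_neurons))   # e.g. recorded IT responses per image

# Regress model features onto each neuron's response; score by cross-validated
# R^2 ("neural predictivity"), averaged over neurons.
predictivity = [
    cross_val_score(Ridge(alpha=1.0), ann_features, neural_responses[:, j],
                    cv=5, scoring="r2").mean()
    for j in range(n_neurons)
]
print(f"mean cross-validated predictivity: {np.mean(predictivity):.3f}")
```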
I guess I just assume as background that readers know:
we have superhuman machine vision, and not by using very different techniques, but by using techniques functionally equivalent to the brain’s, and P explains performance
that vision is typically ~10% of the compute of most brains, and since cortex is uniform this implies that language, motor control, navigation, etc. are all similar and can be solved using similar techniques (I did predict this in 2015). Transformer LLMs recently fulfilled this for language; rough arithmetic sketched below.
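To spell out the arithmetic implied here (a back-of-envelope sketch with a placeholder number, not a figure from the post): if vision costs ~10% of a brain’s compute and cortex is roughly uniform, then a full-brain-equivalent system should cost only ~10x a vision-equivalent one.

```python
# Back-of-envelope: full-brain-equivalent compute as ~10x a vision-equivalent
# system, given vision ~10% of brain compute and roughly uniform cortex.
# The absolute FLOP/s figure is a hypothetical placeholder.
vision_fraction = 0.10            # rough share of brain compute devoted to vision
vision_equiv_flops = 1e14         # placeholder: compute of a vision-equivalent model, FLOP/s
full_brain_equiv_flops = vision_equiv_flops / vision_fraction
print(f"full-brain-equivalent: {full_brain_equiv_flops:.1e} FLOP/s "
      f"(~{1 / vision_fraction:.0f}x vision)")
```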
Comparisons to full brains are more complex because there is, to a first approximation, little funding for serious foundation DL projects trying to replicate cat brains, for example. We only have things like VPT, which I try to compare to cats in another comment thread. But basically I do not think cats are intelligent in the way ravens/primates are. Ex: my cat doesn’t really understand what it’s doing when it digs to cover pee in its litter box; it just blindly follows an algorithm (after smelling urine/poo, dig vaguely in several random related directions).
One issue is that there’s a mystery bias: tasks seem to require intelligence only until machines master them; chess was once considered a test of intelligence, etc.
Re: 2. By saying “if horizon length were a thing, it would be a thing in ML papers”, I mean we would already be seeing the effect: it would be something discussed and modeled in scaling-law analyses, etc. So BioAnchors has to explain (and at this point provide pretty enormous evidence) that horizon length is already a thing in DL, a thing that helps explain/predict training, etc.
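To make that concrete: BioAnchors-style estimates treat effective horizon length H roughly as a multiplier on the number of training samples (and hence training compute), whereas the empirical scaling-law fits people actually use predict loss from parameters and data alone, with no H term anywhere. The sketch below is my paraphrase of that contrast; the second function follows the Chinchilla-style L(N, D) form, with constants that are illustrative, roughly in the neighborhood of published fits.

```python
# Hedged sketch: horizon length H as a compute multiplier (BioAnchors-style)
# vs. an empirical Chinchilla-style scaling-law fit that has no H term.

def bioanchors_style_compute(flop_per_sample: float, n_samples: float, horizon_H: float) -> float:
    """BioAnchors-style estimate: required samples (and thus compute)
    scale roughly linearly with effective horizon length H."""
    return flop_per_sample * n_samples * horizon_H

def chinchilla_style_loss(n_params: float, n_tokens: float,
                          E: float = 1.69, A: float = 406.4, B: float = 410.7,
                          alpha: float = 0.34, beta: float = 0.28) -> float:
    """Empirical fit of the form L(N, D) = E + A/N^alpha + B/D^beta.
    Note: parameters and tokens only -- no horizon-length variable."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling H doubles the BioAnchors-style compute estimate...
print(bioanchors_style_compute(1e9, 1e12, horizon_H=1))
print(bioanchors_style_compute(1e9, 1e12, horizon_H=2))
# ...while the empirical law is indifferent to any notion of horizon length.
print(chinchilla_style_loss(70e9, 1.4e12))
```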
The steelman version of ‘horizon length’ is perhaps some abstraction of 1) reward sparsity in RL, but there’s nothing fundamental about that, and neither BNNs nor advanced ANNs are limited by it, because they use denser self-supervised signals; or 2) meta-learning, but if the model is that use of meta-learning causes H > 1, then it just predicts that teams don’t use meta-learning (which is mostly correct): optional use of meta-learning can only speed up progress, and may be used internally in brains or large ANNs anyway, in which case the H model doesn’t really apply.
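For the reward-sparsity reading specifically, the contrast is simple enough to show as a toy: a sparse episodic-reward setup yields one scalar learning signal per trajectory, while self-supervised next-step prediction yields a signal at every step, so the effective horizon per learning signal collapses to ~1. Everything below is an illustrative toy, not a model of any particular system.

```python
# Toy contrast: learning signals per step under sparse episodic reward vs.
# dense self-supervised (next-step prediction) training. Purely illustrative.
T = 1000  # steps in one episode / tokens in one sequence

# Sparse RL: a single scalar reward delivered at the end of the episode.
signals_sparse = 1
# Dense self-supervision: a prediction loss at every step/token.
signals_dense = T

print(f"sparse episodic reward: {signals_sparse / T:.3f} signals per step "
      f"(effective horizon ~{T} steps per signal)")
print(f"dense self-supervision: {signals_dense / T:.1f} signals per step "
      f"(effective horizon ~1 step per signal)")
```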
Re: 5. I don’t see the direct connection between dataset capacity vs. model capacity and the ‘horizon length hypothesis’.
some cats might not understand, but others definitely do:
https://youtube.com/c/BilliSpeaks
I’ve seen a few of those before, and it’s hard to evaluate cognition from a quick glance. I doubt Billi really uses/understands even that vocab, but it’s hard to say. My cat clearly understands perhaps a dozen words/phrases, but it’s hard to differentiate that from ‘only cares about a dozen words/phrases’.
The thing is, if you had a VPT-like Minecraft agent with similar vocab/communication skills, few would care or find it impressive.