Thanks for the point-by-point reply!
Re 1: The scaling function isn’t weird, and the horizon length constant isn’t arbitrary. But I think I see what you are saying now. Something like “We currently have stuff about as impressive as a raven/bee/etc. but if you were to predict when we’d get that using Bio Anchors, you’d predict 2030 or something like that, because you’d be using medium-horizon training and 1 data point per parameter and 10x as many parameters as ravens/bees/etc. have synapses...” What if I don’t agree that we currently have stuff about as impressive as a raven/bee/etc.? You mention primate-level vision; that does seem like a good argument to me because it’s hard to argue that we don’t have good vision these days. But I’d like to see the math worked out. I think you should write a whole post (doesn’t need to be long) on just this point, because if you are right here I think it’s pretty strong evidence for shorter timelines & will be convincing to many people.
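[Editor's note: a minimal sketch, in Python, of the kind of back-of-envelope calculation being asked for here. Every constant (the raven synapse count, the horizon factor, the FLOP-per-parameter cost) is an illustrative placeholder, not a number taken from Bio Anchors or from the post.]

```python
# Hedged back-of-envelope in the spirit of the Bio Anchors framing referenced above.
# All constants are illustrative placeholders, NOT the report's actual numbers.
raven_synapses = 1e12          # rough order-of-magnitude guess for a corvid brain
params = 10 * raven_synapses   # "10x as many parameters as ... have synapses"
data_points = params           # "1 data point per parameter"
horizon_seconds = 1e3          # "medium-horizon": ~minutes of subjective time per data point (assumed)
flop_per_param_per_pass = 6    # rough forward + backward cost per parameter per data point

training_flop = flop_per_param_per_pass * params * data_points * horizon_seconds
print(f"~{training_flop:.0e} FLOP")  # ~6e29 with these placeholders, vs roughly 1e24+ for the largest public runs circa 2022
```

With placeholders like these the raven-equivalent run lands several orders of magnitude above current training compute, which is the sense in which Bio Anchors-style accounting pushes the date out; the real disagreement is over which constants (synapse-to-parameter ratio, data per parameter, horizon factor) you believe.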
Re 2: “if it was a thing it would be in ML papers already” hahaha… I don’t take offense, and I hope you don’t take offense either, but suffice it to say this appeal to authority has no weight with me & I’d appreciate an object-level argument.
Re 5: No no I agree with you here, that’s why I said it was novel & interesting. (Well, I don’t yet agree that the fit is shockingly good. I’d want to think about it more & see a graph & spot check the calculations, and compare the result to the graphs Ajeya cites in support of the horizon length hypothesis.)
Re 6: Ah, OK. Makes sense. I’ll stop trying to defend people who I think are wrong and let them step up and defend themselves. On this point at least.
I’ve updated the article to include a concise summary of a subset of the evidence for parity between modern vision ANNs and primate visual cortex, and then between modern LLMs and linguistic cortex. I’ll probably also summarize the discussion of Cat vs VPT, but I do think that VPT > Cat in terms of actual AGI-relevant skills, even though the Cat brain would still be a better arch for AGI. We haven’t really tried as hard at the sim Cat task (unless you count driverless cars, but I’d guess those may require raven-like intelligence, and robotics lags due to much harder inference performance constraints). That’s all very compatible with the general thesis that we get hardware parity first, then software catches up a bit later. (At this point I would not be surprised if we have AGI before driverless cars are common.)
Re 1: Yeah so maybe I need to put more of the comparisons in an appendix or something; I’m just assuming background knowledge here that others may not have. Biological vision has been pretty extensively studied and is fairly well understood. We’ve had detailed functional computational models that can predict activations in IT since ~2016—they are DL models. I discussed some of that in my previous brain efficiency post here. More recently the same approach was used to model linguistic cortex using LLMs and was just as effective, or more so—discussed a bit in my simbox post here. So I may just be assuming common background knowledge that BNNs and ANNs converge to learn similar or even equivalent circuits given similar training data.
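[Editor's note: for readers without that background, here is a minimal sketch of the standard methodology behind those "predict activations in IT" results—a Brain-Score-style linear readout from a pretrained vision model. The stimuli and "recordings" below are random placeholders, and the choice of resnet50 and of a pooled late-stage layer are illustrative assumptions, not the specific models from the work referenced above.]

```python
# Minimal sketch of an ANN -> IT encoding analysis; placeholder data, not real recordings.
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

net = models.resnet50(weights="IMAGENET1K_V2").eval()  # stand-in ventral-stream model

feats = {}
net.avgpool.register_forward_hook(lambda m, i, o: feats.update(x=o))  # grab late-stage features

images = torch.rand(100, 3, 224, 224)  # placeholder stimuli (would be the real image set)
with torch.no_grad():
    net(images)
X = feats["x"].flatten(1).numpy()      # (n_images, 2048) model features

Y = np.random.rand(100, 50)            # placeholder "IT recordings": (n_images, n_sites)

Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.25, random_state=0)
readout = RidgeCV(alphas=np.logspace(-3, 3, 7)).fit(Xtr, Ytr)  # one linear readout per site
print("held-out R^2 (meaningless on random data):", readout.score(Xte, Yte))
```

The published finding is that held-out predictivity from late DL layers is high for IT (and, with LLM features, for linguistic cortex), which is the sense of “functional computational models” used above.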
I guess I just assume as background that readers know:
we have superhuman vision, and not by using very different techniques, but by using techniques functionally equivalent to the brain’s, and P explains performance
that vision is typically ~10% of the compute of most brains, and since cortex is uniform this implies that language, motor control, navigation, etc. are all similar and can be solved using similar techniques (I did predict this in 2015). Transformer LLMs fulfilled this for language recently.
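[Editor's note: spelling the second point out as toy arithmetic; the vision-model figure is a placeholder assumption, and the ~10% share is just the claim above.]

```python
# Toy reading of the "vision is ~10% of brain compute" point; placeholder numbers only.
vision_share = 0.10          # fraction of brain compute devoted to vision (claim above)
vision_model_flops = 1e10    # assumed inference cost of a strong modern vision model
whole_brain_scale = vision_model_flops / vision_share
print(f"naive whole-brain-scale analog: ~{whole_brain_scale:.0e} FLOP per inference")
```

i.e. if vision is solved at some compute budget, cortical uniformity suggests the rest of the brain’s functions are roughly an order of magnitude more of the same, rather than something qualitatively different.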
Comparisons to full brains are more complex because there is—to first approximation—little funding for serious foundation DL projects trying to replicate cat brains, for example. We only have things like VPT—which I try to compare to cats in another comment thread. But basically I do not think cats are intelligent in the way ravens/primates are. Ex: my cat doesn’t really understand what it’s doing when it digs to cover pee in its litter box. It just sort of blindly follows an algorithm (after smelling urine/poo, dig vaguely in several random related directions).
One issue is there’s a mystery bias—chess was once considered a test of intelligence until machines mastered it, etc.
Re: 2. By saying “if horizon length was a thing, it would be a thing in ML papers”, I mean we would be seeing the effect—it would be something discussed and modeled in scaling law analysis, etc. So BioAnchors has to explain—and provide pretty strong evidence at this point—that horizon length is a thing already in DL, a thing that helps explain/predict training, etc.
The steelman version of ‘horizon length’ is perhaps some abstraction of 1.) reward sparsity in RL—but there’s nothing fundamental about that, and neither BNNs nor advanced ANNs are limited by it, because they use denser self-supervised signals—or 2.) meta-learning—but if the model is that use of meta-learning causes H > 1, then it just predicts that teams don’t use meta-learning (which is mostly correct): optional use of meta-learning can only speed up progress, and may be used internally in brains or large ANNs anyway, in which case the H model doesn’t really apply.
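[Editor's note: a toy illustration of point 1.)—why dense self-supervised signals sidestep the cost that sparse episodic reward would impose. The episode length and counts are arbitrary stand-ins, not a model of any particular system.]

```python
# Toy count of learning signals per training run under sparse reward vs. self-supervised loss.
episode_len = 1_000   # env steps per episode (stand-in for the putative "horizon")
episodes = 100

sparse_signals = episodes * 1            # pure sparse-reward RL: one scalar per episode
dense_signals = episodes * episode_len   # self-supervised prediction: an error signal every step

print(f"sparse reward:        {sparse_signals:>10,} learning signals")
print(f"self-supervised loss: {dense_signals:>10,} learning signals")
print(f"ratio: {dense_signals // sparse_signals}x")  # exactly the episode length, i.e. the H factor
```

The ratio is just the episode length—the factor a horizon-length model multiplies training cost by—and with dense per-step signals that factor never shows up in the first place.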
Re: 5. I don’t see the direct connection between dataset capacity vs model capacity and the ‘horizon length hypothesis’?
some cats might not understand, but others definitely do:
https://youtube.com/c/BilliSpeaks
I’ve seen a few of those before, and it’s hard to evaluate cognition from a quick glance. I doubt Billi really uses/understands even that vocab, but it’s hard to say. My cat clearly understands perhaps a dozen words/phrases, but it’s hard to differentiate that from ‘only cares about a dozen words/phrases’.
The thing is, if you had a VPT-like Minecraft agent with similar vocab/communication skills, few would care or find it impressive.