I’ve been meaning to write up something about uploading from the WBE 2 workshop, but still haven’t gotten around to it yet …
I was in a group called “Lo-Fi Uploading” with Gwern, Miron, et al., and our approach was premised on the (contrarian?) idea that figuring out all the low-level detailed neuroscience needed to build a true bottom-up model (as some, perhaps davidad, seem to imagine) just seems enormously too complex, costly, and far off, for the reasons you mention (but also see davidad’s enormous cost estimates).
So instead, why not skip all of that: if all we need is functional equivalence, aim directly for that. If the goal is to create safe AGI through neuroscience inspiration, then that inspiration is only useful to the extent it aids that goal, and no more.
If you take some computational system (which could be an agent or a human brain) and collect a large amount of highly informative input/output data from it, that data indirectly encodes the original computational structure which produced it, and training an ANN on that dataset can recover some subset of that structure. This is distillation[1].
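To make that concrete, here is a minimal sketch of functional distillation, assuming we have logged input/output pairs from the original system; the data, shapes, and student architecture below are illustrative placeholders rather than any particular real pipeline:

```python
# Minimal sketch of functional distillation: fit a student ANN to logged
# input/output pairs from some original computational system (the "teacher").
# The data source, shapes, and architecture are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Pretend these are recorded observations (inputs) and responses (outputs)
# of the original system, e.g. stimuli -> behavior.
inputs = torch.randn(10_000, 128)     # hypothetical observation features
outputs = torch.randn(10_000, 16)     # hypothetical recorded responses
loader = DataLoader(TensorDataset(inputs, outputs), batch_size=256, shuffle=True)

# The student only needs enough capacity and a flexible enough prior;
# its internals need not resemble the original system's implementation.
student = nn.Sequential(
    nn.Linear(128, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 16),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for epoch in range(10):
    for x, y in loader:
        loss = nn.functional.mse_loss(student(x), y)   # match the teacher's I/O behavior
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The point is that nothing in the student’s internals has to mirror the original system’s implementation; only the input/output mapping is constrained.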
When we collect huge vision datasets pairing images/videos to text descriptions and then train ANNs on those, we are in fact partly distilling the human vision system(s) which generated those text descriptions. The neurotransmitter receptor protein distributions are many levels removed from relevance for functional distillation.
So the focus should be on creating a highly detailed sim env for, say, a mouse: the mouse matrix. This needs to use a fully differentiable rendering/physics pipeline. You have a real mouse lab environment with cameras and sensors everywhere (on the mice, the walls, etc.), and you use this data to learn/train a perfect digital twin. The digital twin model has an ANN to control the mouse’s digital body: an ANN with a sufficiently flexible architectural prior and appropriate size/capacity. If training this “digital mouse” foundation model succeeds, the result is a functional upload of the mouse. That becomes your gold standard.
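For flavor, here is a very rough sketch of what that training loop might look like, assuming a differentiable simulator; `sim_step`, the dynamics matrices, the recurrent controller, and `recorded_sensors` are all hypothetical stand-ins for the real rendering/physics pipeline and real lab recordings:

```python
# Rough sketch: train a controller ANN inside a differentiable "mouse matrix"
# so that its simulated sensor stream matches recorded sensor data.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, OBS_DIM, HIDDEN, T = 32, 8, 64, 128, 100

# Fixed random matrices standing in for the (differentiable) physics and sensors.
A = torch.randn(ACTION_DIM, STATE_DIM) * 0.1
C = torch.randn(STATE_DIM, OBS_DIM)

def sim_step(state, action):
    """Placeholder differentiable physics/rendering step: next body state + sensor frame."""
    next_state = torch.tanh(state + action @ A)
    obs = next_state @ C
    return next_state, obs

controller = nn.GRUCell(OBS_DIM, HIDDEN)   # recurrent controller: the "digital mouse" brain
readout = nn.Linear(HIDDEN, ACTION_DIM)    # maps controller state to motor commands
opt = torch.optim.Adam(list(controller.parameters()) + list(readout.parameters()), lr=1e-3)

# Hypothetical recorded sensor streams from the real lab: (batch, time, obs_dim).
recorded_sensors = torch.randn(16, T, OBS_DIM)

for step in range(200):
    state = torch.zeros(16, STATE_DIM)
    h = torch.zeros(16, HIDDEN)
    obs = torch.zeros(16, OBS_DIM)
    loss = 0.0
    for t in range(T):
        h = controller(obs, h)                # controller reads current simulated sensors
        action = readout(h)                   # ...and emits motor commands
        state, obs = sim_step(state, action)  # differentiable sim produces the next sensor frame
        # Fit the twin so its simulated sensor stream matches what was recorded.
        loss = loss + nn.functional.mse_loss(obs, recorded_sensors[:, t])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The differentiability of the sim is what lets gradients flow from the mismatch between simulated and recorded sensor streams back into the controller’s parameters.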
Then you can train a second foundation model to predict the ANN params from scan data (a hypernetwork), and finally merge the two for multimodal inference given any/all data available. No neuroscience is required, strictly speaking, but it certainly helps, as it informs the architectural prior, which is key to success at reasonable training budgets.
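As a toy illustration of that second stage, assuming you already have behaviorally trained twins to supervise against, a hypernetwork could simply regress from scan-derived features onto the flattened parameter vector of the behavioral ANN (all names and sizes here are made up):

```python
# Toy sketch: a hypernetwork mapping scan-derived features to the parameters of the
# behavioral ("digital mouse") ANN. Scan featurization and sizes are assumptions.
import torch
import torch.nn as nn
from torch.nn.utils import vector_to_parameters

# The behavioral network whose parameters we want to predict (stand-in for the twin).
behavior_net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))
n_params = sum(p.numel() for p in behavior_net.parameters())

SCAN_DIM = 256   # made-up dimensionality of scan-derived features
hypernet = nn.Sequential(
    nn.Linear(SCAN_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, n_params),       # emits a full flattened parameter vector
)
opt = torch.optim.Adam(hypernet.parameters(), lr=1e-4)

# Hypothetical training pairs: scan features of an animal, paired with the parameters
# of the behaviorally trained twin of that same animal (the "gold standard" above).
scan_features = torch.randn(32, SCAN_DIM)
target_params = torch.randn(32, n_params)

for step in range(500):
    loss = nn.functional.mse_loss(hypernet(scan_features), target_params)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference time, load the predicted parameters into a fresh behavioral net.
with torch.no_grad():
    vector_to_parameters(hypernet(scan_features[:1]).squeeze(0), behavior_net.parameters())
```

In practice the behavioral and scan pathways would be merged into one multimodal model rather than kept as two separate stages, as the paragraph above suggests.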
On the narrow topic of the “mouse matrix”: fun fact, if you didn’t already know, Justin Wood at Indiana University has been doing work in that vicinity (with chicks rather than mice, for technical reasons):
> He uses a controlled-rearing technique with natural chicks, whereby the chicks are raised from birth in completely controlled visual environments. That way, Justin can present designed visual stimuli to test what kinds of visual abilities chicks have or can immediately learn. Then he can [build AI models] that are trained on the same data as the newborn chicks.…
(I haven’t read any of his papers, just listened to him on this podcast episode, from which I copied the above quote.)
[1] Distilling the knowledge in a neural network