Large Language Models Suggest a Path to Ems
TL;DR:
Whole brain emulation from first principles isn’t happening
LLMs are pretty good/human-like
This suggests neural nets are flexible enough to implement human-like cognition despite the alien architecture used
Training a (mostly) human-shaped neural net on brain-implant-sourced data would be the fastest way to aligned human-level AGI.
TL;DR end
supposition: a difficulty scale for learning intelligence, hardest to easiest:
from scratch in a simulated world via RL (reinforcement learning) (AlphaGo Zero, XLand, Reward is Enough)
from humans via imitation learning (LLMs, RL pretrained on human play (Minecraft), human children)
plus a bit of RL to continuously learn the context required to interpret the training data, and to fine-tune after the fact (e.g. self-play to reach superhuman levels)
Distillation (learn from a teacher as a white-box model)
i.e. look inside the teacher model during training
Distillation in a nutshell:
why train a model from another model?
to make it cheaper to run (teach a smaller model to imitate a bigger one)
e.g. a Stable Diffusion model with half the layers doing 4x the work per pass
the teacher is not a black box; the internal state of the teacher is used:
teacher attention
patterns of activation (feature-based distillation)
output distribution (a.k.a. “soft answers”: a (candidate answer, probability) list for a given input)
if the student has a similar architecture, just train for similarity of intermediate activations
etc.
this requires:
perfect information about the teacher model (or some parts thereof)
ability to run complex linear algebra on internal states and weights
differentiation, attribution, finding teacher/student state projections, etc.
supposition: why this works
intermediate states give insight into the teacher model's sub-computations
the student only has to learn the sub-computations, not derive everything from scratch
the student model can obviously be smaller because it's doing less exploration of the compute space
https://www.lesswrong.com/posts/iNaLHBaqh3mL45aH8/magna-alta-doctrina#Who_art_thou_calling_a_Local_optimizer
soft answers give a little more insight into sub-computations near the end of the model, giving better gradients at all layers of the network (a minimal sketch of both signals follows).
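To make those two training signals concrete, here's a minimal PyTorch sketch (my own illustration, not from any particular paper). It assumes `teacher` and `student` each return (logits, intermediate_features), and that a learned projection bridges their feature spaces; all of that interface is an assumption.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, proj, x, T=2.0, alpha=0.5):
    """One distillation step combining soft answers and feature matching.

    Assumes teacher/student each return (logits, intermediate_features);
    `proj` is a learned linear map from student to teacher feature space.
    """
    with torch.no_grad():
        t_logits, t_feat = teacher(x)  # teacher internals are visible (white box)
    s_logits, s_feat = student(x)

    # soft answers: match softened output distributions (temperature T)
    soft_loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # conventional rescaling so gradients don't shrink with T

    # feature-based distillation: match one intermediate activation
    feat_loss = F.mse_loss(proj(s_feat), t_feat)

    return alpha * soft_loss + (1 - alpha) * feat_loss
```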
My pitch: Train AI from humans via brain-implant-generated data.
Ideally:
crack open volunteer skulls like eggshells
apply reading electrodes liberally
Other output channels include:
gaze tracking
fMRI (doesn't scale well but has already yielded some connection topology data)
EEG (the bare minimum)
Tradeoff of (invasiveness/medical risk) vs. (data rate, granularity)
likely too much: 100,000 Neuralink fibers/subject
likely too little: EEG skull caps
better off spending the money on more subjects than on EEG caps
nothing gives complete access to all neurons, but any of these beats a black box
progress on the visual system was made via poking individual neurons
fMRI data seems granular enough to do some mind-reading
this is very encouraging
a good middle ground might be non-penetrating or minimally penetrating electrodes
How this might work (toy sketch below):
figure out brain topology
imitate it with connected neural-net layers as modules
add some globally connected modules with varying feedback time scales
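As a toy illustration of what those modules might look like in a standard framework: everything below (region names, sizes, time constants) is invented for the sketch, not measured brain topology; real wiring would come from the implant data.

```python
import torch
import torch.nn as nn

class LeakyGlobalModule(nn.Module):
    """Recurrent module whose state updates with time constant `tau`:
    larger tau means a slower feedback time scale (leaky integration)."""
    def __init__(self, dim, tau):
        super().__init__()
        self.cell, self.tau = nn.GRUCell(dim, dim), tau

    def forward(self, x, state):
        new = self.cell(x, state)
        return state + (new - state) / self.tau

class BrainLikeNet(nn.Module):
    """Hypothetical modular net: feedforward 'regions' wired together,
    plus fast and slow globally connected feedback modules."""
    def __init__(self, dim=64):
        super().__init__()
        self.v1 = nn.Sequential(nn.Linear(dim, dim), nn.GELU())       # placeholder region
        self.assoc = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU())
        self.fast = LeakyGlobalModule(dim, tau=2.0)    # fast feedback loop
        self.slow = LeakyGlobalModule(dim, tau=50.0)   # slow feedback loop

    def forward(self, x, fast_state, slow_state):
        h = self.v1(x)
        fast_state = self.fast(h, fast_state)
        slow_state = self.slow(h, slow_state)
        out = self.assoc(torch.cat([fast_state, slow_state], dim=-1))
        return out, fast_state, slow_state
```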
Problems:
some of the interesting structures are buried under other structures
some of it is dynamic (e.g. memory, learning)
but whatever happens, there will be lots of data to help figure things out
even a human-level model lacking long-term memory/learning is very useful
Alignment?
Obviously, get training data from kind non-psychopaths. If the resulting models behave similarly to the humans they're modeled on, they're going to be aligned, because they're basically copies. Problems arise if the resulting copies are damaged in some way, especially if that leads to loss of empathy, insanity, or something similar.
Reasons for Hope
RL and other pure AI approaches seem to require a lot of tinkering to get working. Domain-specific agents designed around the task can do pretty well, but AGI is elusive. They do well when training data can be generated through self-play (chess, Go, DOTA) or where human world models don't apply (AlphaFold (protein folding), FermiNet (quantum mechanics)) and the AI is beating the best human-designed algorithms, but pure RL approaches have to learn a world model and a policy from scratch. Human-data-based approaches get both from the training dataset. This is an after-the-fact rationalisation of why LLMs have had so much success.
As a real-life example, RL approaches have a ways to go before they can solve real-world action-planning problems, which LLMs get more or less for free (PaLM-SayCan).
Could this go Superhuman?
Larger models should just end up overparameterized. As with any ML system, all bets are off if you add 20 zeroes.
Capabilities Generalization?
Capabilities research for imitation AI shouldn't transfer well to RL, because the core problem with RL is training-signal sparsity and the resulting credit-assignment problem. Knowing how the adult human brain works might help, but info on early childhood development won't be there (hopefully), which is a good thing for ASI risk, because early-childhood brain-development details would be very useful to anyone designing RL-based AIs. Additionally, the human brain uses feedback connections extensively. Distilling an existing human brain from lots of intermediate neuron-level data should stabilize things, but from-scratch training of a similar network won't work using modern methods.
The bad scenario is if the collected data allows reverse-engineering a general-purpose learning algorithm. Maybe the early-childhood info isn't that important. Someone then clones a GitHub repo, adds some zeroes to the config, and three months into the training run the world ends. Neuralink has some data from live animal brains collected over month-long timespans. Despite this, the world hasn't ended, which lowers my belief that this sort of data is dangerous. They also might be politely sitting on something very dangerous.
If capabilities don't transfer, there's plausibly some buffer time between AGI and ASI. Scalable human-level AGI based on smart, technically capable non-psychopaths is a very powerful tool that might be enough to solve big problems. There are a number of plausible okay-ish outcomes that could see us not all dying! That's a win in my book.
Isn’t this just ems?
Yes, yes it is … probably. In an ideal world maybe we wouldn't do this? On the one hand, human-imitation-based AGIs are in my moral circle, and abusing them is bad. On the other hand, some humans are voluntarily workaholics and/or lack no-messing-with-my-brain terminal values, such that they would approve of becoming aligned workaholics, or of contributing to an aligned workaholic gestalt, especially in pursuit of altruistic goals. Outcomes where ems don't end up in virtual hell are acceptable to me, or at least better than the alternative (paperclips). Best-case scenario, em (AI/conventional) programmers automate the boring stuff so no morally valuable entities have to do boring, mindless work. There are likely paths where high-skill ems won't be forced to do things they hate. The status quo is worse in some ways; lots of people hate their jobs.
Won’t a country with no ethics boards do this first?
They might have problems trusting the resulting ems; you could say they'd have a bit of an AGI alignment problem of their own. Also, democratic countries have an oligopoly on leading-edge chip manufacturing, so large-scale deployment might be tough. Another country could also steal some em copies and offer better working conditions. There's a book to be written in all of this somewhere.
Practical next steps (no brain implants)
Develop distillation methods that rely on degraded information about teacher model internal states
Concretely: develop a distillation method using a noisy, randomly initialised, lower-dimensional projection of the teacher's internal state (sketched below)
Concretely: develop “mind reading” for the teacher model using similarly noisy data
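A minimal sketch of the first item, with all dimensions and the noise scale as stand-in assumptions: the teacher's hidden state is observed only through a fixed random low-dimensional projection plus noise (a crude proxy for electrode-grade measurement), and a learned probe on the student side is trained to match that degraded view.

```python
import torch
import torch.nn.functional as F

d_teacher, d_student, d_obs = 1024, 512, 32            # illustrative sizes
P = torch.randn(d_teacher, d_obs) / d_teacher ** 0.5   # fixed random projection, never trained
probe = torch.nn.Linear(d_student, d_obs)              # learned map from student state

def degraded_state_loss(t_hidden, s_hidden, noise_scale=0.1):
    """Match the student's probed state to a noisy low-dim view of the teacher's."""
    with torch.no_grad():
        obs = t_hidden @ P + noise_scale * torch.randn(t_hidden.size(0), d_obs)
    return F.mse_loss(probe(s_hidden), obs)
```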
Develop methods to convert between recurrent and single pass networks
this could have practical applications (e.g. make LLMs cheaper to run by using a recurrent model trained to imitate the LLM's internal states, thus re-using intermediate results)
really, this is more to show that training that normally blows up (large recurrent nets or nets with feedback) can be stabilized by having internal state from another model available to give local objectives to the training process (see the sketch below)
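One way the "local objectives" idea could look in code (my illustration; the per-layer teacher activations are an assumed interface): each step of a recurrent cell is trained to reproduce the corresponding layer of a deep single-pass model, so every step gets its own dense training signal instead of one end-to-end gradient that has to survive the whole unroll.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentImitator(nn.Module):
    """Recurrent cell unrolled once per teacher layer."""
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)

    def forward(self, x, n_steps):
        h, states = torch.zeros_like(x), []
        for _ in range(n_steps):
            h = self.cell(x, h)
            states.append(h)
        return states

def imitation_loss(recurrent_states, teacher_layer_states):
    # local objective at every step: step t imitates teacher layer t,
    # rather than a single loss applied only at the final step
    return sum(F.mse_loss(s, t) for s, t in zip(recurrent_states, teacher_layer_states))
```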
If this can be shown to work, and to work efficiently, starting the real-world biotech side becomes more promising.