Suppose you have three optimizers that are all very different: one is a neural network, another was built by hand, and the third materialized through a Boltzmann Brain-like process. If they have identical preferences and an equivalent capacity for optimization, they will probably end up doing similar things over a long enough timescale.
I bring this up because discussions of uploading seem to gravitate toward obtaining a digital encoding of the entire mind. That framing treats the useful part of the upload as the fact that it optimizes in a human-like way. What we actually want seems to be something that optimizes in any way at all, so long as it optimizes for the things we want, in all their complexity.
In “Does Davidad’s Uploading Moonshot Work?”, @jacobjacob raises the following:
What about the idea of “just” training a giant transformer to, instead of predicting next tokens in natural language, predicting neural activity?
@lisathiergart replies that this probably violates the assumption that the product would be more aligned than the status quo. I raise the following objections:
The product you need from this training is not as complex as an approximation of the entire brain. You ‘only’ need its preferences. If you can extract a reward model from the transformer and point an optimizer in that direction, you can reach a similar endpoint. You might even be able to use the SOTA AI available at the time of ‘upload’ to engineer that reward model into something better.
Training large neural networks on MEG data looks promising. Meta had a lot of success with its work on decoding images from brain activity, and the same goes for speech. This suggests to me that current neural networks are powerful enough to extract meaningful patterns from MEG data (the speech paper even had some success with EEG).
Deep learning does many surprising things when scaled sufficiently. Suppose you had asked most people what they thought about training a giant transformer to predict the next token. They probably wouldn’t have forecast GPT-4 by 2023. Maybe the same applies to predicting the next MEG reading?
This seems really easy to test. After thinking about it for 30 minutes, I came up with a few architectures based on Mistral-7B that I feel I could plausibly implement on short notice. However, I don’t have any MEG hardware, and it’s expensive to obtain. My research also suggests there isn’t enough publicly available data to train something of Mistral-7B size, or even significantly smaller, assuming Chinchilla scaling laws roughly hold in this case.
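The data constraint becomes concrete with a rough back-of-the-envelope calculation. This is only a sketch, and every number in it is a hypothetical assumption. It uses the common Chinchilla rule of thumb of roughly 20 training tokens per parameter and about 7.3B parameters for Mistral-7B. It also assumes a tokenization in which each time step of the recording, with all sensor channels together, becomes a single token. The function names are illustrative and do not come from any library.

```python
# Back-of-the-envelope data requirement for a "next MEG reading" model.
# All constants are illustrative assumptions, not measured values.

CHINCHILLA_TOKENS_PER_PARAM = 20  # common rule of thumb, not an exact law
SECONDS_PER_HOUR = 3600


def tokens_needed(n_params: float, ratio: float = CHINCHILLA_TOKENS_PER_PARAM) -> float:
    """Approximate compute-optimal number of training tokens for n_params."""
    return n_params * ratio


def recording_hours_needed(n_params: float, sample_rate_hz: float, downsample: int = 1) -> float:
    """Hours of recording needed if each (downsampled) time step is one token.

    Hypothetical tokenization: all sensor channels at one time step are
    embedded together as a single token.
    """
    tokens_per_hour = sample_rate_hz / downsample * SECONDS_PER_HOUR
    return tokens_needed(n_params) / tokens_per_hour


n_params = 7.3e9  # roughly Mistral-7B scale
for rate_hz, ds in [(1000, 1), (1000, 4)]:
    hours = recording_hours_needed(n_params, rate_hz, ds)
    print(f"{rate_hz} Hz sampling, downsampled x{ds}: ~{hours:,.0f} hours of recording")
```

Under these assumptions, a 1000 Hz recording yields 3.6 million tokens per hour, so a ~7.3B-parameter model would want roughly 40,000 hours of recording. Tokenizing each channel separately, or grouping several time steps into one token, moves that figure by orders of magnitude. The tokenization choice therefore matters about as much as the scaling law itself.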
Vanessa Kosoy refers to physicalist superimitation as one potential endpoint of the Learning Theoretic Agenda. She describes superimitation as follows:
An agent (henceforth: the “imitator”) that receives the policy of another agent (henceforth: the “original”), and produces behavior which pursues the same goals but significantly better.
If you can superimitate, or otherwise optimize for the preferences of an upload, then maybe simple approaches (like predicting the next MEG reading) are sufficient. Or at least maybe they are comparable to training and leveraging a full upload, while being significantly easier?
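The “extract a reward model, then point an optimizer at it” idea can be illustrated with a deliberately tiny toy. Nothing here comes from the post. It swaps in standard Bradley-Terry preference learning, the same idea used for reward models in RLHF. It models a hypothetical “original” whose hidden preferences are a linear function, `TRUE_W`, over three made-up outcome features. For the optimizer it uses simple best-of-n sampling. In the scenario discussed above, the preference signal would have to be decoded from something like an MEG-trained model. Here it is simply simulated.

```python
import math
import random

random.seed(0)

DIM = 3
TRUE_W = [2.0, -1.0, 0.5]  # hypothetical hidden preferences of the "original"


def reward(w, x):
    """Linear reward: dot product of weights and outcome features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_outcome():
    """A random candidate outcome with features in [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(DIM)]


def simulate_comparisons(n):
    """Pairwise choices (a, b, a_preferred) drawn from a Bradley-Terry model using TRUE_W."""
    data = []
    for _ in range(n):
        a, b = sample_outcome(), sample_outcome()
        p_a = sigmoid(reward(TRUE_W, a) - reward(TRUE_W, b))
        data.append((a, b, random.random() < p_a))
    return data


def fit_reward_model(data, lr=0.5, epochs=200):
    """Fit a linear reward model by gradient ascent on the Bradley-Terry log-likelihood."""
    w = [0.0] * DIM
    for _ in range(epochs):
        grad = [0.0] * DIM
        for a, b, a_preferred in data:
            # For a linear reward, r(a) - r(b) equals the reward of the feature difference.
            diff = [ai - bi for ai, bi in zip(a, b)]
            err = (1.0 if a_preferred else 0.0) - sigmoid(reward(w, diff))
            for i in range(DIM):
                grad[i] += err * diff[i]
        w = [wi + lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


def best_of_n(w, n=1000):
    """A crude optimizer: sample n candidates, keep the one the learned reward scores highest."""
    return max((sample_outcome() for _ in range(n)), key=lambda x: reward(w, x))


learned_w = fit_reward_model(simulate_comparisons(1000))
choice = best_of_n(learned_w)
print("learned weights:", [round(v, 2) for v in learned_w])
print("chosen outcome:", [round(v, 2) for v in choice])
```

Best-of-n is used only because it is the simplest optimizer to show. Any stronger optimizer could be pointed at `learned_w` instead, and the reward model never needs to imitate *how* the original chooses, only *what* it prefers.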
You might be interested in this AI Safety Camp ’23 project I proposed on fine-tuning LMs on fMRI data. Some of the linkposts I’ve published on LW may also be relevant, e.g. The neuroconnectionist research programme, Scaling laws for language encoding models in fMRI, and Mapping Brains with Language Models: A Survey. Personally, I’m particularly interested in low-res uploads for automated alignment research, e.g. to plug into something like the superalignment plan (I have some shortform notes on this).