Re: a model trained on random labels. This seems somewhat analogous to building a power plant out of dark matter; to derive physical work it isn’t enough to have some degrees of freedom somewhere that have a lot of energy, one also needs a chain of couplings between those degrees of freedom and the degrees of freedom you want to act on. Similarly, if I want to use a model to reduce my uncertainty about something, I need to construct a chain of random variables with nonzero mutual information linking the question in my head to the predictive distribution of the model.
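As a rough sketch of the coupling point, one can compute the mutual information between the true answer and the model's answer in a toy setup. The joint distributions and the 70% accuracy figure below are made-up illustrations, not measurements of any real model.

```python
import math

def mutual_information_bits(joint):
    """I(X;Y) in bits for a joint distribution given as joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal over the true answer X
    py = [sum(col) for col in zip(*joint)]    # marginal over the model answer Y
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi

# Hypothetical toy case: X is the true answer among 4 options, Y is the model's answer.
# A model trained on random labels: its answer is independent of the truth.
independent = [[1 / 16] * 4 for _ in range(4)]
# An assumed model that is right 70% of the time and spreads its errors evenly.
coupled = [[(0.7 if x == y else 0.1) / 4 for y in range(4)] for x in range(4)]

mi_independent = mutual_information_bits(independent)
mi_coupled = mutual_information_bits(coupled)
print(mi_independent, mi_coupled)
```

In the independent case the mutual information is zero, so no amount of querying can reduce my uncertainty; in the coupled case it is positive (roughly 0.64 bits under these assumed numbers), which is what makes the model usable at all.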
To take a concrete example: suppose I am thinking about a chemistry question with four choices, A, B, C, D. Given nothing but these letters, the model cannot reduce my uncertainty (say I begin with equal belief in all four options). However, if I provide a prompt describing the question, and the model has been trained on chemistry, then this information sets up a correspondence between my distribution over the four letters and something the model knows about. Its answer may then leave me equally uncertain between A and B while knowing that C and D are wrong, which reduces my entropy from 2 bits to 1 bit.
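The arithmetic in this example can be checked directly. This is a minimal sketch; the helper name `entropy_bits` and the dictionaries are just illustrative choices.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Equal belief in all four options before asking the model.
prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
# After the model's answer: C and D ruled out, A and B equally likely.
posterior = {"A": 0.5, "B": 0.5, "C": 0.0, "D": 0.0}

reduction = entropy_bits(prior) - entropy_bits(posterior)
print(entropy_bits(prior), entropy_bits(posterior), reduction)
```

The prior has 2 bits of entropy and the posterior has 1 bit, so the model's answer is worth 1 bit here.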
Since language models are good general compressors this seems to work in reasonable generality.
Ideally we would like the model to push our distribution towards true answers, but it doesn’t necessarily know true answers, only some approximation; thus the work being done is nontrivially directed, and has a systematic overall effect due to the nature of the model’s biases.
I don’t know about evolution. I think it’s right that the perspective has limits and can degenerate into empty slogans unless it is used carefully. I don’t know how useful it is for actual technical reasoning about AI safety at scale, but it’s a fun idea to play around with.
The analogous laws are just information theory.