I’m pretty skeptical of this because the analogy seems superficial. Thermodynamics says useful things about abstractions like “work” because we have the laws of thermodynamics. What are the analogous laws for cognitive work / optimization power? It’s not clear to me that cognitive work can be quantified in a way that is easily accounted for:
- We all come from evolution. Where did the cognitive work come from?
- Algorithms can be copied.
- Passwords can unlock optimization.
It is also not clear what distinguishes LLM weights from the weights of a model trained on random labels drawn from a cryptographic PRNG. Since the labels are not truly random, the same amount of optimization has gone into both sets of weights; but since CSPRNGs can’t be broken just by training LLMs on their output, the latter model is totally useless while the former is potentially transformative.
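To make the contrast concrete, here is a rough sketch (my own illustration, not anything from the post): a keyed hash stands in for the CSPRNG, a logistic regression stands in for the trained model, and all of the names and data are made up for the example. The same training procedure is run on both label sets, but only one of them yields usable predictive work.

```python
# Minimal sketch: labels from a keyed pseudorandom function embody just as much
# "optimization" as ordinary labels, but a bounded learner can't extract any of it.
import hashlib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 16))           # random 16-bit inputs

def prf_label(x, key=b"secret"):
    """Pseudorandom label: low bit of a keyed hash of the input (CSPRNG stand-in)."""
    return hashlib.sha256(key + x.tobytes()).digest()[0] & 1

y_prf  = np.array([prf_label(x) for x in X])       # cryptographically scrambled labels
y_easy = X[:, 0]                                   # genuinely learnable labels

for name, y in [("PRF labels", y_prf), ("learnable labels", y_easy)]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
    print(f"{name}: test accuracy ~ {acc:.2f}")    # ~0.5 for PRF, ~1.0 for learnable
```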
My guess is this way of looking at things will be like memetics in relation to genetics: likely to spawn one or two useful expressions like “memetically fit”, but due to the inherent lack of structure in memes compared to DNA life, not a real field compared to other ways of measuring AIs and their effects (scaling laws? SLT?). Hope I’m wrong.
The analogous laws are just information theory.
Re: a model trained on random labels. This seems somewhat analogous to building a power plant out of dark matter: to derive physical work, it isn’t enough to have some degrees of freedom somewhere that have a lot of energy; one also needs a chain of couplings between those degrees of freedom and the degrees of freedom you want to act on. Similarly, if I want to use a model to reduce my uncertainty about something, I need to construct a chain of random variables with nonzero mutual information linking the question in my head to the predictive distribution of the model.
To take a concrete example: suppose I am thinking about a chemistry question with four choices A, B, C, D. With no information other than these letters, the model cannot reduce my uncertainty (say I begin with equal belief in all four options). However, if I provide a prompt describing the question, and the model has been trained on chemistry, then this information sets up a correspondence between my distribution over the four letters and something the model knows about; its answer may then leave me equally uncertain between A and B while knowing C and D are wrong (a change of 1 bit in my entropy, from 2 bits down to 1).
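As a minimal worked version of that arithmetic (just restating the numbers above in code):

```python
# Entropy of my belief over the four options, before and after the model's answer.
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                               # ignore zero-probability options
    return float(-(p * np.log2(p)).sum())

prior     = [0.25, 0.25, 0.25, 0.25]           # only the letters A-D, no other information
posterior = [0.5, 0.5, 0.0, 0.0]               # the model's answer rules out C and D

print(entropy_bits(prior))                              # 2.0 bits
print(entropy_bits(posterior))                          # 1.0 bit
print(entropy_bits(prior) - entropy_bits(posterior))    # 1.0 bit of uncertainty removed
```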
Since language models are good general compressors, this seems to work in reasonable generality.
Ideally we would like the model to push our distribution towards true answers, but it doesn’t necessarily know true answers, only some approximation; thus the work being done is nontrivially directed, and has a systematic overall effect due to the nature of the model’s biases.
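To sketch what “directed” means here (my own toy illustration, assuming for simplicity that I adopt the model’s predictive distribution wholesale):

```python
# If the model's approximation is biased, deferring to it can move my belief
# further from the truth, measured in bits of KL divergence.
import numpy as np

def kl_bits(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

truth     = np.array([1.0, 0.0, 0.0, 0.0])     # the correct answer is A
model     = np.array([0.1, 0.7, 0.1, 0.1])     # a biased model favours B
prior     = np.array([0.25, 0.25, 0.25, 0.25])
posterior = model                              # deferring entirely to the model

print(kl_bits(truth, prior))       # 2.0 bits from the truth before asking
print(kl_bits(truth, posterior))   # ~3.3 bits: the model's bias did negative work here
```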
I don’t know about evolution. I think it’s right that the perspective has limits and can degenerate into empty slogans outside of careful usage. I don’t know how useful it is for actual technical reasoning about AI safety at scale, but it’s a fun idea to play around with.
I agree with you that, as stated, the analogy risks dangerous superficiality.
The ‘cognitive’ work of evolution came from the billions of years of evolution embodied in the innumerable forms of life that lived, hunted, and reproduced through the eons. Effectively, we could see evolution by natural selection as something like a simple, highly parallel, stochastic, slow algorithm, i.e. a simple many-tape randomized Turing machine running for a very large number of timesteps.
A way to try to put some (vegan) meat on the bones of this analogy would be to look at conditional KT-complexity. KT-complexity is a version of Kolmogorov complexity that also accounts for the time cost of running the generating program (a formal sketch of one such definition follows the list below).
- In KT-complexity, pseudorandomness functions just like randomness.
- Algorithms may indeed be copied, and the copy operation is fast and takes very little memory overhead.
- Just as with Kolmogorov complexity, we can rejig things and think in terms of an algorithmic probability.
- A public/private key pair is trivial in a pure Kolmogorov complexity framework but is correctly modelled in a KT-complexity framework.
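For concreteness, one standard time-bounded variant is Levin’s Kt (I’m assuming this is essentially the KT-complexity meant above; the exact definition varies by author):

$$Kt(x \mid y) \;=\; \min_{p}\,\bigl\{\, |p| + \log t \;:\; U(p, y) \text{ outputs } x \text{ within } t \text{ steps} \,\bigr\}$$

where $U$ is a fixed universal machine. The $\log t$ term charges the generating program for its running time, so, for example, a private key that is short to describe given the public key but slow to compute from it gets a high conditional complexity, whereas plain Kolmogorov complexity would call it trivial.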
To deepen the analogy with thermodynamics, one should probably carefully read John Wentworth’s work on generalized heat engines and Kolmogorov sufficient statistics.
To be clear, I am not arguing that evolution is an example of what I’m talking about. The analogy to thermodynamics in what I wrote is straightforwardly correct; there is no need to introduce KT-complexity and muddy the waters. What I am calling work is literally work.