A couple of differences between Kolmogorov complexity/Shannon entropy and the loss function of autoregressive LMs (just to highlight them, not trying to say anything you don’t already know):
The former are (approximately) symmetric; the latter isn’t (it can be much harder to predict a string front-to-back than back-to-front).
The former calculate compression values as properties of a string (up to choice of UTM). The latter calculates compression values as properties of a string, a data distribution, and a model (and even then doesn’t strictly determine the response, if the model isn’t expressive enough to fully capture the distribution).
So it seems like there’s plenty of room for a measure which is “more sensible” than the former and “more principled” than the latter.
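To make the second difference concrete, here is a minimal sketch contrasting a string-only compression proxy with a model-dependent negative log-likelihood. The toy bigram table and the uniform prior over the first token are illustrative assumptions, not anything from the comment above:

```python
import math
import zlib

# String-only measure: compressed length in bits (up to the choice of compressor,
# analogous to "up to choice of UTM" for Kolmogorov complexity).
s = "abababababababababababab"
compressor_bits = 8 * len(zlib.compress(s.encode()))

# Model-dependent measure: total negative log-likelihood (in bits) of the same
# string under a toy autoregressive model. Here the "model" is a hypothetical
# fixed bigram table; a different model gives a different number for the same string.
bigram_probs = {
    ("a", "b"): 0.9, ("a", "a"): 0.1,
    ("b", "a"): 0.9, ("b", "b"): 0.1,
}
nll_bits = -math.log2(0.5)  # first token: assume a uniform prior over {a, b}
for prev, nxt in zip(s, s[1:]):
    nll_bits += -math.log2(bigram_probs[(prev, nxt)])

print(f"compressor bits: {compressor_bits}, model NLL bits: {nll_bits:.1f}")
```

Swapping in a different model (or a string better matched to the model) changes the second number while leaving the string itself untouched, which is exactly the extra dependence on the model and data distribution pointed out above.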
Predicting a string front-to-back can indeed be easier or harder than predicting it back-to-front. Crutchfield has a very natural measure for this, called the causal irreversibility.
In short, given a data stream, Crutchfield constructs a minimal (but maximally predictive) forward model S+, which predicts the future given the past (i.e. the next tokens given the context), and a minimal, maximally predictive (retrodictive?) backward model S−, which predicts the past given the future (i.e. the previous tokens given ‘future’ contexts).
The remarkable thing is that these models don’t have to be the same size, as shown by a simple example (the ‘random insertion process’) whose forward model has 3 states and whose backward model has 4 states.
The causal irreversibility is, roughly speaking, the difference between the sizes of the forward and backward models (their statistical complexities).
See this paper for more details.
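For concreteness, here is a minimal sketch of how one might compute that difference once the stationary state distributions of the forward and backward models are in hand. The distributions below are placeholders, not the actual values for the random insertion process:

```python
import math

def statistical_complexity(state_probs):
    """Shannon entropy (in bits) of a stationary distribution over causal states."""
    return -sum(p * math.log2(p) for p in state_probs if p > 0)

# Hypothetical stationary distributions over the forward (3-state) and
# backward (4-state) causal states.
forward_states = [0.5, 0.25, 0.25]
backward_states = [0.4, 0.3, 0.2, 0.1]

C_plus = statistical_complexity(forward_states)    # forward model "size" in bits
C_minus = statistical_complexity(backward_states)  # backward model "size" in bits

# Causal irreversibility, taken here as the backward-minus-forward gap
# (sign conventions vary; the point is the gap between the two model sizes).
irreversibility = C_minus - C_plus
print(f"C+ = {C_plus:.3f} bits, C- = {C_minus:.3f} bits, gap = {irreversibility:.3f} bits")
```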