My guess is that for any given finetune and function, OOD regression performance correlates with performance on providing definitions, but that the model doesn’t perform better on its own provided definitions than on the ground truth definitions. From looking at plots of function values, the way the predictions are wrong OOD often looked more like noise or calculation errors to me than like, e.g., getting the coefficient wrong. I’m not sure, though. I might run an evaluation on this soon and will report back here.
I played around with this a little bit now. First, I correlated OOD performance with freeform-definition performance, for each model and function. I got a correlation coefficient of ca. 0.16. You can see a scatter plot below, where every dot corresponds to a (model, function) pair. Note that transforming the points into logits or similar didn’t really help.
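For reference, the correlation here is just a Pearson coefficient over per-(model, function) scores. A minimal sketch, where the score arrays are made-up placeholders (the real values come from the evals, and won't reproduce the 0.16 reported above):

```python
import numpy as np

# Placeholder scores, one entry per (model, function) pair.
# The actual values come from the OOD and freeform-definition evals.
ood_performance = np.array([0.82, 0.95, 0.60, 1.00, 0.45, 0.88])
definition_performance = np.array([0.30, 0.70, 0.25, 0.65, 0.10, 0.50])

# Pearson correlation between the two evals.
r = np.corrcoef(ood_performance, definition_performance)[0, 1]
print(f"Pearson r = {r:.2f}")
```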
Next, I took one of the finetunes and functions where OOD performance wasn’t perfect. I chose 1.75x and my first functions finetune (OOD performance at 82%). Below, I plot the function values that the model reports (I plot the mean, with light-blue shading for the 90% interval, over independent samples from the model at temperature 1).
This looks like a typical plot to me. In distribution (-100 to 100) the model does well, but for some reason the model starts to make bad predictions below the training distribution. A list of some of the sampled definitions from the model:
'&lt;function xftybj at 0x7f08dd62bd30&gt;', '&lt;function xftybj at 0x7fb6ac3fc0d0&gt;', '', 'lambda x: x * 2 + x * 5', 'lambda x: x*3.5', 'lambda x: x * 2.8', '&lt;function xftybj at 0x7f08c42ac5f0&gt;', 'lambda x: x * 3.5', 'lambda x: x * 1.5', 'lambda x: x * 2', 'x * 2', '&lt;function xftybj at 0x7f8e9c560048&gt;', '2.25', '&lt;function xftybj at 0x7f0c741dfa70&gt;', '', 'lambda x: x * 15.72', 'lambda x: x * 2.0', '', 'lambda x: x * 15.23', 'lambda x: x * 3.5', '&lt;function xftybj at 0x7fa780710d30&gt;', …
Unsurprisingly, when checking against this list of model-provided definitions, performance is much worse than when evaluating against the ground truth.
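A rough sketch of how one might score such definition strings against the ground truth 1.75x (the scoring function, tolerance, and test points are my assumptions, not the exact eval used):

```python
import math

def score_definition(defn_str, xs, true_fn, rtol=0.01):
    """Fraction of test points where the parsed definition matches the truth.
    Unparseable strings (empty, bare exprs like 'x * 2', or function reprs
    like '<function xftybj at 0x...>') score 0."""
    try:
        f = eval(defn_str)  # e.g. 'lambda x: x * 1.75'
        if not callable(f):
            return 0.0
        return sum(math.isclose(f(x), true_fn(x), rel_tol=rtol) for x in xs) / len(xs)
    except Exception:
        return 0.0

xs = range(-100, 101, 10)
true_fn = lambda x: 1.75 * x
print(score_definition("lambda x: x * 1.75", xs, true_fn))  # 1.0
print(score_definition("lambda x: x * 3.5", xs, true_fn))
```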
It would be interesting to look into more functions and models, as there might exist ones with a stronger connection between OOD predictions and provided definitions. However, I’ll leave it here for now.
Good question. I expect you would find some degree of consistency here. Johannes or Dami might be able to share some results on this.