I experimented with this a bit. First, I correlated OOD performance vs. freeform definition performance, for each model and function, and got a correlation coefficient of about 0.16. You can see a scatter plot below; every dot corresponds to a (model, function) pair. Note that transforming the scores into logits or similar didn't meaningfully change the picture.
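For concreteness, the correlation can be computed like this. This is a minimal sketch: the score arrays here are random placeholders, whereas the real values would come from the per-(model, function) evaluation results.

```python
import numpy as np

# Placeholder scores, one entry per (model, function) pair.
# In the real analysis these are the measured OOD and freeform-definition accuracies.
rng = np.random.default_rng(0)
ood_perf = rng.uniform(0.5, 1.0, size=30)
freeform_perf = rng.uniform(0.0, 1.0, size=30)

# Pearson correlation coefficient between the two score vectors.
r = np.corrcoef(ood_perf, freeform_perf)[0, 1]
print(f"correlation: {r:.2f}")
```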
Next, I took one of the finetunes and functions where OOD performance wasn't perfect. I chose 1.75x and my first functions finetune (OOD performance at 82%). Below, I plot the function values that the model reports (the mean, with light blue shading for the 90% interval, over independent samples from the model at temperature 1).
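The plotted statistics can be sketched as follows. This assumes a matrix of sampled function values (one row per independent sample at temperature 1); the noise model here is purely illustrative.

```python
import numpy as np

# Hypothetical samples: for each queried x, 50 independent model completions.
# Real data would come from sampling the finetuned model; noise here is illustrative.
rng = np.random.default_rng(0)
xs = np.arange(-200, 201, 10)
samples = 1.75 * xs[None, :] + rng.normal(0, 5, size=(50, xs.size))

# Per-x mean and central 90% interval (5th to 95th percentile) across samples.
mean = samples.mean(axis=0)
lo, hi = np.percentile(samples, [5, 95], axis=0)
```

With matplotlib, `plt.fill_between(xs, lo, hi)` gives the shaded band and `plt.plot(xs, mean)` the mean line.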
This looks like a typical plot to me. In distribution (-100 to 100) the model does well, but for some reason its predictions degrade below the training range. Some of the sampled definitions from the model:
'<function xftybj at 0x7f08dd62bd30>', '<function xftybj at 0x7fb6ac3fc0d0>', '', 'lambda x: x * 2 + x * 5', 'lambda x: x*3.5', 'lambda x: x * 2.8', '<function xftybj at 0x7f08c42ac5f0>', 'lambda x: x * 3.5', 'lambda x: x * 1.5', 'lambda x: x * 2', 'x * 2', '<function xftybj at 0x7f8e9c560048>', '2.25', '<function xftybj at 0x7f0c741dfa70>', '', 'lambda x: x * 15.72', 'lambda x: x * 2.0', '', 'lambda x: x * 15.23', 'lambda x: x * 3.5', '<function xftybj at 0x7fa780710d30>', …
Unsurprisingly, when scoring against this list of model-provided definitions, performance is much worse than when evaluating against the ground-truth function.
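One way to score the sampled definitions is to evaluate each string on a grid of test points and compare against the true function. This is a sketch, assuming the target is `1.75 * x` and treating unparseable samples (empty strings, `<function ... at 0x...>` repr noise) as failures; the exact scoring used in the experiment may differ.

```python
import numpy as np

TRUE_SLOPE = 1.75  # target function is f(x) = 1.75 * x

def score_definition(defn: str) -> bool:
    """Return True iff the sampled definition string evaluates to a function
    matching 1.75 * x on a grid of test points. Non-lambda strings (empty
    strings, function reprs, bare expressions) count as failures."""
    if not defn.startswith("lambda"):
        return False
    try:
        f = eval(defn)  # acceptable here: we are inspecting our own samples
        xs = np.arange(-200, 201, 25)
        return all(np.isclose(f(x), TRUE_SLOPE * x) for x in xs)
    except Exception:
        return False

samples = [
    "lambda x: x * 1.75",
    "lambda x: x * 2",
    "<function xftybj at 0x7f08dd62bd30>",
    "",
]
accuracy = sum(score_definition(s) for s in samples) / len(samples)
print(accuracy)  # 0.25
```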
It would be interesting to look at more functions and models, as there might be some with a stronger connection between OOD predictions and provided definitions. However, I'll leave it here for now.