I’ve had a hard time connecting John’s work to anything real. It’s all over Bayes nets, with some theorems coming out of it that are apparently obviously true (https://www.lesswrong.com/posts/2WuSZo7esdobiW2mr/the-lightcone-theorem-a-better-foundation-for-natural?commentId=K5gPNyavBgpGNv4m3).
In contrast, look at work like Anthropic’s superposition solution, or the representation engineering paper from CAIS. If someone told me “I’m interested in identifying the natural abstractions AIs use when producing their output”, that is the kind of work I’d expect. It’s on actual LLMs! (Or at least “LMs”, for the Anthropic paper.) They identified useful concepts like “truth-telling” or “Arabic”!
In John’s work, the prose often promises to point to useful concepts like different physics models, but the results instead seem to operate at the level of random variables and causal diagrams. I’d love to see any sign that this work is applicable to real-world AI systems and can, e.g., accurately identify what abstractions GPT-2 or LLaMA are using.