Physics of Language Models (Part 2.1)

Nathan Helm-Burger, 19 Sep 2024 16:48 UTC

9 points

1 comment · 1 min read · LW link

AI Interpretability (ML & AI)

This is perhaps the best interpretability work I’ve seen outside of Chris Olah’s team.

StefanHex 19 Sep 2024 17:46 UTC
7 points
Paper link: https://arxiv.org/abs/2407.20311

(I have neither watched the video nor read the paper yet; I'm posting the link in case someone else was looking for the non-video version.)