Noosphere89 comments on Against Almost Every Theory of Impact of Interpretability

Noosphere89 6 Oct 2024 17:25 UTC
4 points
One note: as it turns out, OthelloGPT learned a bag of heuristics rather than a clean algorithm:

https://www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1