Domenic comments on Cognitive Emulation: A Naive AI Safety Proposal

Domenic 27 Feb 2023 1:54 UTC
5 points
3
Yes, I would really appreciate that. I find this approach compelling in the abstract, but what does it actually cash out in?
My best guess is that it means lots of mechanistic interpretability research, identifying subsystems of LLMs (or similar) and trying to explain them, until eventually they’re made of less and less Magic. That sounds good to me! But what directions sound promising there? E.g. the only result in this area I’ve done a deep dive on, "Transformers learn in-context by gradient descent", is pretty limited, as it only gets a clear match for linear (!) single-layer (!!) regression models, not anything like an LLM. How much progress does Conjecture expect to really make? What are other papers our study group should read?
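
To make the "linear, single-layer" caveat concrete, here is a minimal sketch of the kind of match in question. It is a toy construction under stated assumptions (zero initial weights, softmax-free attention, scalar targets), not code from the paper, and the function names are hypothetical. It shows that one gradient-descent step on a linear regression loss gives the same prediction as a single linear attention readout with keys `x_i`, values `y_i`, and the test input as the query.

```python
import numpy as np

def gd_one_step_prediction(X, y, x_query, lr):
    """Predict with the weights obtained after one gradient step from w = 0."""
    n = X.shape[0]
    w0 = np.zeros(X.shape[1])
    # Gradient of the loss (1 / 2n) * sum_i (w . x_i - y_i)^2 at w0.
    grad = X.T @ (X @ w0 - y) / n
    w1 = w0 - lr * grad
    return w1 @ x_query

def linear_attention_prediction(X, y, x_query, lr):
    """Softmax-free attention: keys = x_i, values = y_i, query = x_query."""
    n = X.shape[0]
    scores = X @ x_query            # unnormalized dot products k_i . q
    return (lr / n) * (scores @ y)  # score-weighted sum of the values

# Synthetic in-context regression task (assumed setup for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
w_true = rng.normal(size=4)
y = X @ w_true
x_query = rng.normal(size=4)

pred_gd = gd_one_step_prediction(X, y, x_query, lr=0.1)
pred_attn = linear_attention_prediction(X, y, x_query, lr=0.1)
print(np.allclose(pred_gd, pred_attn))
```

The two expressions coincide only because the gradient is linear in the data. Once a softmax or a nonlinear model is involved, the identity no longer holds exactly, which is one way to see why the result is hard to carry over to anything LLM-sized.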