FWIW, white box alignment doesn’t imply humans understand what the models are thinking. There are other ways to leverage the fact that we have access to the internals.
I was using it as shorthand for alignment done with interpretability, as opposed to alignment without it.