I guess I’m considerably more optimistic about avoiding AI takeover without humans understanding what the models are thinking. (Or possibly you’re more optimistic about slowing down AI.)
Basically this. I am a lot more pessimistic about black box alignment than I am about white box alignment.
FWIW, white box alignment doesn’t imply humans understand what the models are thinking. There are other ways to leverage the fact that we have access to the internals.
I was using it as a synonym for alignment with interpretability, as opposed to alignment without it.