TurnTrout comments on TurnTrout’s shortform feed

TurnTrout 26 Jul 2020 18:33 UTC
LW: 8 AF: 2
AF
An additional consideration for early work on interpretability: it slightly increases the chance we actually get an early warning shot. If a system misbehaves, we can inspect its cognition and (hopefully) find hints of intentional deception. Could motivate thousands of additional researcher-hours being put into alignment.
- Raemon 26 Jul 2020 21:20 UTC
  LW: 2 AF: 1
  AF Parent
  That’s an interesting point.