I think the “soup of heuristics” stories (where the AI is optimizing something far causally upstream of reward, rather than something downstream of reward or robustly correlated with it) don’t lead to takeover in the same way.
Why does it not lead to takeover in the same way?
Because it’s easy to detect and correct (except that correcting it might push you into one of the other regimes).
So, far causally upstream of the human evaluator’s opinion? E.g., an AI counselor optimizing for getting to know you.