Garrett Baker comments on TurnTrout’s shortform feed

Garrett Baker 2 Dec 2022 4:00 UTC
This seems possibly relevant, and optimistic, if we view deception as a value. It also has the form ‘if about to tell a human a statement with properties x, y, z, don’t’.
- TurnTrout 14 Dec 2022 7:39 UTC
  It can still be robustly derived as an instrumental subgoal during general planning/problem-solving, though?
  - Garrett Baker 14 Dec 2022 8:32 UTC
    This is true, but it implies that deception would show up at a radically different stage of training than if deception were an intrinsic value. It also possibly expands the kinds of reinforcement schedules we may want to use, compared to worlds where deception crops up at the earliest opportunity (though pseudo-deception may still occur, where behaviors correlated with successful deception get reinforced?).
    - TurnTrout 15 Dec 2022 3:53 UTC
      Oh, huh, I had cached the impression that deception would be derived, not have intrinsic-value status. Interesting.