adamShimi comments on Research agenda update

adamShimi 11 Aug 2021 20:09 UTC
Okay, so we have a crux in “putting ourselves in the place of X isn’t a convergent subgoal”. I need to think about it, but I think I recall animal cognition experiments which tested (positively) something like that in… crows? (and maybe other animals).
- Steven Byrnes 12 Aug 2021 1:33 UTC
  Oh, I was thinking of the more specific mental operation “if it’s undesirable for Alice to deceive Bob, then it’s undesirable for me to deceive Bob (and/or it’s undesirable for me to be deceived by Alice)”. So we’re not just talking about understanding things from someone’s perspective, we’re talking about changing your goals as a result. Anything that involves changing your goals is almost definitely not a convergent instrumental subgoal, in my view.
  Example: Maybe I think it’s good for spiders to eat flies (let’s say for the sake of argument), and I can put myself in the shoes of a spider trying to eat flies, but doing that doesn’t make me want to eat flies myself.
  - adamShimi 12 Aug 2021 11:08 UTC
    Yeah, that’s fair. Your example shows really nicely how you would not want to apply to yourself the rules/reasons/incentives you derived for spiders. That also works with more straightforward agents: most AIs wouldn’t want to eat ice cream just from seeing me eat some and enjoy it.