I pretty strongly agree with this review (and, just to be clear, it was written without any input from me, even though Daniel and I are both at CHAI).
I think of ‘coherence arguments’ as including things like ‘it’s not possible for you to agree to give me a limitless number of dollars in return for nothing’, which does imply some degree of ‘goal-direction’.
Yeah, maybe I should say “coherence theorems” to be clearer about this? (Like, it isn’t a theorem that I shouldn’t give you a limitless number of dollars in return for nothing; maybe I think that you are more capable than me and fully aligned with me, and so you’d do a better job with my money. Or maybe I value your happiness, and the best way to purchase it is to give you money no strings attached.)
Responses from outside this camp
Fwiw, I do in fact worry about goal-directedness, but (I think) I know what you mean. (For others, I think Daniel is referring to something like “the MIRI camp”, though that is also not an accurate pointer, and it is true that I am outside that camp.)
My responses to the questions:
The ones in Will humans build goal-directed agents?, but if you want arguments that aren’t about humans, then I don’t know.
Depends on the distribution over utility functions, the action space, etc., but e.g. if it uniformly selects a numeric reward value for each possible trajectory (state-action sequence) where the actions are low-level (e.g. human muscle control), astronomically low (a back-of-the-envelope sketch follows these responses).
That will probably be a good model for some (many?) powerful AI systems that humans build.
I don’t know. (I think it depends quite strongly on the way in which we train powerful AI systems.)
Not likely at low levels of intelligence, plausible at higher levels of intelligence, but really the question is not specified enough.
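To give a rough sense of where “astronomically low” comes from in the response about random utility functions, here is a minimal back-of-the-envelope sketch. The action-space size and horizon are illustrative numbers of my own, not figures from the thread; the point is only that the number of low-level trajectories is so large that a reward table filled in uniformly at random almost never singles out any one pre-specified “goal-achieving” trajectory.

```python
# Back-of-the-envelope sketch with illustrative numbers (my assumptions,
# not figures from the thread): count the distinct low-level trajectories,
# then note that a reward table filled in i.i.d.-uniformly at random makes
# every trajectory equally likely to be ranked first, so the chance that it
# singles out one pre-specified "goal-achieving" trajectory is 1/N.

n_actions = 100    # assumed size of a low-level (e.g. motor-control) action space
horizon = 1_000    # assumed number of timesteps per trajectory

n_trajectories = n_actions ** horizon      # 100 ** 1000 == 10 ** 2000 action sequences
magnitude = len(str(n_trajectories)) - 1   # order of magnitude (exactly 2000 here)

print(f"distinct trajectories: ~10^{magnitude}")
print(f"P(random reward table ranks one fixed trajectory first): ~10^-{magnitude}")
```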
Well, I didn’t consult you in the process of writing the review, but we’ve had many conversations on the topic, which have presumably influenced how I think about it and what I ended up writing in the review.
Yeah, maybe I should say “coherence theorems” to be clearer about this?
Sorry, I meant theorems taking ‘no limitless dollar sink’ as an axiom and deriving something interesting from that.
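For readers wondering what taking “no limitless dollar sink” as an axiom buys you, here is a small illustrative simulation (my own sketch; the items and the one-cent fee are arbitrary, not anything from the thread). An agent with cyclic pairwise preferences will pay for every swap to something it prefers, so a trader who walks it around the cycle extracts money without limit; ruling out that kind of sink is what pushes toward structure like transitive preferences, from which representation theorems then derive something interesting, e.g. a utility-function representation.

```python
# A minimal money-pump simulation (my illustration, not from the thread):
# an agent with cyclic pairwise preferences (A over B, B over C, C over A)
# pays a small fee for any swap to something it prefers, so a trader can
# cycle it forever and extract unbounded money -- the agent is a
# "limitless dollar sink".

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means: x is strictly preferred to y
FEE_CENTS = 1                                   # the agent will pay 1 cent per preferred swap


def offer_swap(holding: str, offered: str, paid_cents: int) -> tuple[str, int]:
    """The agent accepts the swap (and pays the fee) iff it prefers the offered item."""
    if (offered, holding) in prefers:
        return offered, paid_cents + FEE_CENTS
    return holding, paid_cents


holding, paid_cents = "A", 0
offers = ["C", "B", "A"] * 333                  # walk the preference cycle 333 times
for offered in offers:
    holding, paid_cents = offer_swap(holding, offered, paid_cents)

print(f"after {len(offers)} offers the agent holds {holding} again "
      f"and has paid ${paid_cents / 100:.2f}")
# -> ends holding A (where it started), $9.99 poorer; more rounds extract more.
```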