Agree that it’s useful to disentangle them, but it’s also useful to realise that they can’t be fully disentangled… yet.
I actually don’t understand why you say they can’t be fully disentangled.
IIRC, your main objection during the discussion was about whether (e.g.) “arbitrarily long deliberation (ALD)” was, or even could be, fully specified in a way that properly accounts for things like deception and manipulation. More concretely, I think you mentioned the possibility of an AI affecting the deliberation process in an undesirable way.
But I think it’s reasonable to assume (for the purposes of this discussion) that there is, in principle, a non-terrible way to specify things like “manipulation”. Do you disagree? Or is your objection something else entirely?
Hey there!
I’ve given a longer answer here: https://www.lesswrong.com/posts/Q7WiHdSSShkNsgDpa/how-much-can-value-learning-be-disentangled