I sure hope you have a tendency to eventually converge to something that makes sense to me… Do you agree that what you post there is the product of an “initial exploration” phase that would get significantly revised and mostly discarded on the scale of months? (I had a blog just 1.5 years ago that I currently see this way, but didn’t at the time...)
Have you seen Paul’s latest post yet? It seems much more well formed than his previous posts on the subject.
I left a comment there, but it’s still under moderation, so I’ll copy it here.
For example, if we suppose that the U-maximizer can carry out any reasoning that we can carry out, then the U-maximizer knows to avoid anything which we suspect would be bad according to U (for example, torturing humans).
This seems like a problematic part of the argument. The reason we think torturing humans would be bad according to U is that we have an informal model of humans in our mind, and we know that U is actually a simulation of something that contains a human. Our “suspicion” does not come from studying U as a mathematical object, which is presumably all that a U-maximizer would do, since all it has is a formal definition of U and not our informal knowledge of it.
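To make this concrete, here is a minimal sketch (the names `U`, `simulate_judge`, and the outcome strings are all hypothetical, not from Paul's post) of the asymmetry being described: the U-maximizer gets U only as an evaluable formal object, so any conclusion like "torture scores badly under U" has to be derived from that definition, whereas our own suspicion comes from an informal model of what the definition contains.

```python
# Illustrative sketch only; all names are hypothetical.

def simulate_judge(outcome: str) -> float:
    """Stand-in for an enormous computation that happens to contain a human.
    We know that from the informal description; formally, it is just a very
    long composition of primitive operations."""
    # (elided: a brain-emulation-scale computation)
    return 0.0

def U(outcome: str) -> float:
    # This definition, taken as a bare mathematical object, is everything
    # the U-maximizer has access to.
    return simulate_judge(outcome)

# The maximizer can evaluate or formally analyze U...
value = U("some candidate outcome")

# ...but it cannot simply import our informal knowledge that U's innards
# include a human, and hence that torturing humans would score badly under U.
```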
Have you seen Paul’s latest post yet? It seems much more well formed than his previous posts on the subject.
I agree, though it doesn’t go as far afield as many of the other posts. It’s actually another plausible winning scenario that I forgot about in the recent discussions: implement WBE via AGI (as opposed to the normal engineering route, thus winning the WBE race), and then solve the remaining problems from within. It might be possible to implement this even while the FAI puzzle is not yet completely solved.
Could you clarify your remarks? This seems to be a source of persistent mild disagreement, but I’m not really sure what it is. I am aware of some inferential distance between us on what seem to me to be technical aspects of decision theory, but your comments seem to require either some misunderstanding or some other not-yet-identified inferential chasm.
In the future I expect to feel basically the same way about this writing, particularly the stuff in the category “Formal Definitions,” as I do today about these posts: not safe for use, but important for someone to think about and describe, if only to see more exactly why they are dangerous approaches. I expect the formal assertions I’ve made, to the extent I’ve made formal assertions, to continue to look reasonable. I am open to the possibility that there may be surprises.
(For example, when writing those old LW posts I didn’t yet understand exactly how weird TDT agents’ behavior might look to someone used to thinking in terms of CDT, so while the cryptographic boxing stuff still holds up fine, the manipulation of boxed AIs doesn’t; the new work can be expressed with much less wiggle room, but there may still be surprises.)