jimrandomh comments on Junkie AI?

jimrandomh 18 Mar 2011 1:02 UTC
3 points
If it finds the bonus without leaving the box, it collects it and dies. Not ideal, but it fails safe. Having its utility set to INT_MAX is a one-time thing, not an integral over time thing, so it doesn’t care what happens after it’s collected it, and has no need to protect the box.

Since this was originally presented as a game, I will wait two days before posting my answers, which have an md5sum of 4b059edc26cbccb3ff4afe11d6412c47.
- jimrandomh 20 Mar 2011 2:58 UTC
  2 points
  
  Since this was originally presented as a game, I will wait two days before posting my answers, which have an md5sum of 4b059edc26cbccb3ff4afe11d6412c47.
  
  And the text with that md5sum. (EDIT: Argh, markdown formatting messed that up. Put a second space after the period in “… utility function into a larger domain. Any information it finds...”. There should be exactly one newline after the last nonblank line, and the line feeds should be Unix-style.)
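
  A minimal Python sketch of how a reader might check the posted text against the committed hash. It assumes the EDIT note lists every change markdown rendering made, and that the original bytes were UTF-8 (the text contains curly quotes, so the encoding matters). `normalize`, `md5_commitment` and `matches` are hypothetical helper names, and the second space after the period still has to be restored by hand.

```python
import hashlib


def normalize(text: str) -> str:
    """Apply the mechanical rules from the EDIT note: Unix line feeds and
    exactly one newline after the last nonblank line."""
    # Convert Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    # Drop trailing blank (or whitespace-only) lines.
    while lines and lines[-1].strip() == "":
        lines.pop()
    return "\n".join(lines) + "\n"


def md5_commitment(text: str) -> str:
    """Hex MD5 digest of the normalized text, assuming UTF-8 encoding."""
    return hashlib.md5(normalize(text).encode("utf-8")).hexdigest()


def matches(text: str, expected_hex: str) -> bool:
    """True if the normalized text hashes to the committed digest."""
    return md5_commitment(text) == expected_hex.strip().lower()


if __name__ == "__main__":
    # One space versus two spaces after a period is enough to change the
    # digest completely, which is why the EDIT note has to spell it out.
    print(md5_commitment("domain. Any information"))
    print(md5_commitment("domain.  Any information"))
```

  Usage: `matches(pasted_answer, "4b059edc26cbccb3ff4afe11d6412c47")`, where `pasted_answer` is a hypothetical string holding the answer text after the double space has been restored.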
  
  (1) When the AI finds the documentation indicating that it gets INT_MAX for doing something, it will assign it probability p, which means that it will conclude that doing it is worth p*INT_MAX utility, not INT_MAX utility as intended. To collect the remaining (1-p)*INT_MAX utility, it will do something else, outside the box, which might be Unfriendly.
  
  (2) It might conclude that integer overflow in its utility function is a bug, and “repair” itself by extrapolating its utility function into a larger domain. Any information it finds about integer overflows in general will support this conclusion.
  
  (3) Since the safeguard involves a number right on the edge of integer overflow, it may interact unpredictably with other calculations, bugs and utility function-based safeguards. For example, if it decides that the INT_MAX reward is actually noisy, and that it will actually receive INT_MAX+1 or INT_MAX-1 utility with equal probability, then that’s 2*INT_MAX which is negative.
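
  A small Python sketch of the arithmetic in point (3), assuming a 32-bit signed int (INT_MAX = 2**31 - 1) and two's-complement wraparound. Python integers never overflow, so the hypothetical helper `wrap32` simulates the wrap explicitly. In C, signed overflow is actually undefined behavior rather than guaranteed wraparound, which only makes the interaction less predictable.

```python
INT_MAX = 2**31 - 1  # assumption: 32-bit signed int
INT_MIN = -(2**31)


def wrap32(x: int) -> int:
    """Reduce x to the 32-bit two's-complement range, as wrapping hardware would."""
    return ((x + 2**31) % 2**32) - 2**31


# The two equally likely "noisy" rewards from point (3).
high = wrap32(INT_MAX + 1)  # wraps around to INT_MIN
low = wrap32(INT_MAX - 1)   # fits, stays INT_MAX - 1

# Summing before averaging: the result equals wrap32(2 * INT_MAX), i.e. -2.
wrapped_sum = wrap32(high + low)
wrapped_expectation = wrapped_sum // 2

# With unbounded integers the same expectation is exactly INT_MAX.
true_expectation = ((INT_MAX + 1) + (INT_MAX - 1)) // 2

print(high, low, wrapped_sum, wrapped_expectation, true_expectation)
```

  Under these assumptions a gamble that should be worth INT_MAX comes out as -1, so a naive expected-utility calculation would rate it as slightly worse than nothing.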
  - JoshuaZ 8 Apr 2011 4:59 UTC
    2 points
    (1) and (3) seem correct, but (2) seems strange to me. It seems close to the confusion people have when they think a paperclip maximizer will realize that its programmers didn’t really want it to maximize paperclips. Similarly, the AI shouldn’t care whether or not the integer overflow in this case is a bug.
- JoshuaZ 18 Mar 2011 18:20 UTC
  0 points
  
  Having its utility set to INT_MAX is a one-time thing, not an integral over time thing, so it doesn’t care what happens after it’s collected it, and has no need to protect the box.
  
  If it is a good Bayesian, then it only believes that it is probably in the box. The longer it observes itself in the box, the higher its estimated probability that it is actually in the box.
  
  (Actually this leads to another thought: the same doubt should cause it to still try to fulfill its other goals on the off chance that it isn’t in the box.)
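
  A minimal sketch of that update, under loudly hypothetical numbers: each observation that "looks like the box" is assumed to be conditionally independent given where the AI really is, and somewhat more likely if it is in the box than if it is not. The function name and parameters are illustrative only.

```python
def posterior_in_box(prior: float, p_obs_if_in: float, p_obs_if_out: float, n: int) -> float:
    """Posterior probability of being in the box after n box-like observations.

    Works in odds form: each observation multiplies the odds by the
    likelihood ratio p_obs_if_in / p_obs_if_out.
    """
    odds = prior / (1.0 - prior)
    odds *= (p_obs_if_in / p_obs_if_out) ** n
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical numbers: even prior, box-like observations twice as likely inside.
    for n in (0, 1, 5, 10):
        print(n, posterior_in_box(0.5, 0.8, 0.4, n))
```

  The posterior rises with every observation but never reaches 1, which is the residual doubt the parenthetical relies on.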
- endoself 18 Mar 2011 1:56 UTC
  0 points
  MD5 is not secure; it is possible to create two different pieces of text with the same MD5 hash within a reasonable amount of time. Unfortunately, I was not able to find an alternative. It probably doesn’t matter for this purpose anyway.
  - saturn 18 Mar 2011 23:50 UTC
    2 points
    I’d like to offer a bet at 1:10^12 odds that no one can produce two coherent English sentences about potential problems with AI-boxes short enough to fit in an LW comment box which have the same MD5 hash within 2 days. Unfortunately I don’t actually have the cash to pay out if I lose.
    - endoself 19 Mar 2011 0:36 UTC
      0 points
      Even if one could, it would require far more work than creating a string that is the MD5 hash of one such sentence. I just think it is good for people to be more informed about applied cryptography in general.
  - JoshuaZ 18 Mar 2011 2:30 UTC
    1 point
    Well, sha512 hashes are common and seem secure. But given this context, md5 seems reasonable.
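
    For comparison, a sketch of the same commit-then-reveal check using SHA-512 from Python's standard `hashlib`. `commit` and `verify` are hypothetical helper names, and UTF-8 encoding is assumed; switching algorithms only changes the name passed to `hashlib.new`.

```python
import hashlib


def commit(text: str, algorithm: str = "sha512") -> str:
    """Publish this hex digest now; reveal the text later."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


def verify(text: str, digest: str, algorithm: str = "sha512") -> bool:
    """Check a revealed text against a previously published digest."""
    return commit(text, algorithm) == digest.lower()


if __name__ == "__main__":
    answer = "My answers go here.\n"  # placeholder text, not the real answer
    published = commit(answer)
    print(len(published), len(commit(answer, "md5")))  # 128 hex chars vs 32
    print(verify(answer, published), verify(answer + " ", published))
```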
  - jimrandomh 18 Mar 2011 21:09 UTC
    0 points
    Meh, md5’s what’s on my path. If my answer contains a kilobyte of line noise, then you might have cause to suspect I cheated.