jsteinhardt comments on Tiling Agents for Self-Modifying AI (OPFAI #2)

jsteinhardt 7 Jun 2013 14:30 UTC
2 points
It would be useful to understand why we think a calculator doesn’t “count” as self-modification. In particular, we don’t think calculators run into the Löb obstacle, so what is the difference between calculators and AIs?
- Kawoomba 7 Jun 2013 15:41 UTC
  0 points
  Parent
  As always in such matters, think of Turing Machines. As long as the transition function isn’t modified, the state of the Turing Machine may change, but it’ll always be in an internal state prespecified in its transition function; it won’t get unknown or unknowable new entries in its action table.
  
  Universal Turing Machines are designed to change, to take their transition function from the input tape as input, a prime example of self-modification. But they too, having read their new transition function from their input tape, will go about their business as usual without further changes to their transition function. (You can of course program them to later continue changing their action table, but the point is that such changes to their own action table, that is, to their own behavior, are clearly delineated from mere contents of their memory / work tape.)
  
  A calculator or a non-self-modifying AI will undergo changes in its memory, but it’ll never endeavor to define new internal states, with new rules, on its own. It’ll remember that you’ve entered “0.7734” in its display, but it’ll only perform its usual actions on that number. A game of Tetris will change what blocks it displays on your screen, but that won’t modify its rules.
  
  There may be accidental modifications (bugs etc.) leading to unknown states and behavior, but I wouldn’t usefully call that an active act of self-modification. (It’s not a special case to guard against, other than by the usual redundancy / checksums. That’s not FAI research so much as the same set of constraints you face when working with, e.g., real-time or mission-critical applications.)
  - philh 7 Jun 2013 23:24 UTC
    3 points
    Parent
    I don’t think this is quite there. A UTM is itself a TM, and its transition function is fixed. But it emulates a TM, and it could instead emulate a TM-with-variable-transition-function (TMWVTF), and that thing would be self-modifying in a deeper sense than an emulation of a standard TM.
    
    But it’s still not obvious to me how to formalize this, because (among other problems) you can replace an emulated TMWVTF with an emulated UTM which in turn emulates a TMWVTF...
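
The distinction discussed in this thread can be made concrete with a toy machine. What follows is only a minimal sketch under stated assumptions, not a formalization anyone in the thread proposed: the names `run`, `FLIP`, `COPY`, and `rewrites`, and the `!` marker symbol, are all hypothetical. With no `rewrites`, only the tape and state change while the action table stays fixed (the calculator case). With `rewrites`, firing a rule overwrites entries in the action table itself (a crude TMWVTF).

```python
BLANK = "_"

# Action table: (state, symbol_read) -> (next_state, symbol_to_write, head_move).
# FLIP inverts every bit and steps over the hypothetical marker "!".
FLIP = {
    ("start", "0"): ("start", "1", 1),
    ("start", "1"): ("start", "0", 1),
    ("start", "!"): ("start", "!", 1),
}

# Replacement rules that leave bits unchanged.
COPY = {
    ("start", "0"): ("start", "0", 1),
    ("start", "1"): ("start", "1", 1),
}


def run(table, tape_str, rewrites=None, max_steps=1000):
    """Run the machine and return (final_tape, final_table).

    `rewrites` maps a rule key to a patch that is applied to the action
    table after that rule fires. This is an assumption made for the
    illustration, not a standard Turing machine feature.
    """
    table = dict(table)  # copy, so the caller's table is never mutated
    rewrites = rewrites or {}
    tape = dict(enumerate(tape_str))
    state, head = "start", 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in table:
            break  # no applicable rule: halt
        state, tape[head], move = table[key]
        head += move
        # Empty for the fixed machine; for the variable machine this
        # overwrites entries in the action table itself.
        table.update(rewrites.get(key, {}))
    cells = "".join(tape[i] for i in sorted(tape))
    return cells, table


# Fixed table: memory (the tape) changes, the rules do not.
fixed_tape, fixed_table = run(FLIP, "01!01")

# Variable table: after reading "!", the machine swaps in the COPY rules.
var_tape, var_table = run(FLIP, "01!01", rewrites={("start", "!"): COPY})

print(fixed_tape, fixed_table == FLIP)  # 10!10 True
print(var_tape, var_table == FLIP)      # 10!01 False
```

Note that the rewrite schedule here is itself fixed in advance, so the pair (`FLIP`, `rewrites`) could be compiled into one larger fixed table with an extra state. That is a small instance of the difficulty raised above: whether something counts as self-modifying depends on the level of description at which you look.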