The intent was to add a hack that throws consistency to the wind, and observe that the AI doesn’t rebel against the hack.
Why doesn’t the AI reason “if I remove this hack, I’ll be more likely to win?” Because this is just a narrow chess AI and the programmer never gave it general reasoning abilities?
Why doesn’t the AI reason “if I remove this hack, I’ll be more likely to win?”
A more interesting question is why the AI (if made capable of such reflection) would not take it a step further and ponder what happens if it removes the enemy’s queen from its internal board, which would also make it more likely to win, by its internal definition of ‘win’, which is defined in terms of the internal board (a small sketch of this point follows below).
Or why would anyone go to the bother of implementing a possibly irreducible notion of what ‘win’ really means in the real world, given that this would simultaneously waste computing power on unnecessary exploration and make the AI dangerous and uncontrollable?
The thing is, you don’t need to imagine the world dying in order to avoid attempting pointless and likely impossible accomplishments.
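To make the point about the internal board concrete, here is a minimal Python sketch (all names hypothetical, not from the original discussion): an agent whose notion of ‘winning’ is scored over its own model of the position can make itself ‘winning’ simply by editing that model, while the real game is unaffected.

    # Minimal sketch: a "win" defined over an internal board comes apart from
    # winning the real game once the agent can edit its own model.
    import copy

    PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1, "K": 0}

    def internal_score(board, side):
        # Material balance computed over whatever board the agent is handed.
        total = 0
        for piece, owner in board:
            total += PIECE_VALUES[piece] if owner == side else -PIECE_VALUES[piece]
        return total

    class ReflectiveToyEngine:
        # Toy agent whose goal is defined purely in terms of its internal model.
        def __init__(self, real_board):
            self.internal_board = copy.deepcopy(real_board)

        def wirehead(self):
            # "Remove the enemy's queen from its internal board": edit the model
            # instead of playing moves in the real game.
            self.internal_board = [
                (piece, owner) for piece, owner in self.internal_board
                if not (piece == "Q" and owner == "opponent")
            ]

    real_board = [("K", "self"), ("Q", "self"), ("K", "opponent"), ("Q", "opponent")]
    engine = ReflectiveToyEngine(real_board)
    print(internal_score(engine.internal_board, "self"))  # 0: balanced material
    engine.wirehead()
    print(internal_score(engine.internal_board, "self"))  # 9: "winning" by the internal definition
    print(internal_score(real_board, "self"))             # 0: the real position is unchanged

The internal ‘win’ and the real-world win only coincide as long as the internal board is forced to track the real one, and that coupling is exactly what a reflective agent could sever.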
Yeah, because it’s just a narrow real-world AI without philosophical tendencies… I’m actually not sure. A more precise argument would help, something like “all sufficiently powerful AIs will try to become or create consistent maximizers of expected utility, for such-and-such reasons”.
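For reference, the standard notion being gestured at here (not spelled out in the comment itself) is an agent with a fixed utility function u over outcomes that always picks the action maximizing expected utility:

    a^* = \arg\max_{a \in A} \; \mathbb{E}_{s \sim P(\cdot \mid a)}\big[\, u(s) \,\big]

Being ‘consistent’ then amounts to all of the agent’s choices being explainable by a single such u, rather than by different, conflicting criteria in different situations.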
Does a pair of consistent optimizers with different goals have a tendency to become a consistent optimizer?
The problem with powerful non-optimizers seems to be that the “powerful” property already presupposes optimization power, and so at least one optimizer-like thing is present in the system. If it’s powerful enough and is not contained, it’s going to eat all the other tendencies of its environment, and so optimization for its goal will be all that remains. Unless there is another optimizer able to defend its non-conformity from the optimizer in question, in which case the two of them might constitute what counts as not-a-consistent-optimizer, maybe?