whpearson comments on Q&A with Michael Littman on risks from AI

whpearson 19 Dec 2011 22:37 UTC
7 points
Lots of machine learning programs have parameters set to certain values because they seem to work well (e.g. update rates on perceptrons). Perhaps he is extrapolating that into full AI. So the blueprint would be strewn with comments like “Set complexity threshold for attributing external changes to volitional agents to 0.782. Any higher and the agent believes humans aren’t intelligent and tries (and fails) to predict them from first principles rather than the intentional stance. Any lower and the agent believes rocks are intelligent and just want to stay still. Also this interferes with learning rate alpha for unknown reasons”.

So experimenting with different parameter values might take significant time, since each variant has to be evaluated for efficacy (especially if you have to raise the agent from a baby each time).
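
To make the perceptron case concrete, here is a minimal sketch of that kind of trial-and-error tuning. Everything in it is an assumption made for illustration: the toy data, the starting weights, the candidate rates and the names `make_data` and `epochs_to_converge` are all hypothetical. It only shows the general shape of the loop, where each candidate value costs one full training run.

```python
import random


def make_data(n=40, seed=0):
    """Toy 2D points labelled by the sign of x0 + x1 (illustrative data only)."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        s = x[0] + x[1]
        if abs(s) > 0.2:  # keep a margin so the two classes are separable
            points.append((x, 1 if s > 0 else -1))
    return points


def epochs_to_converge(rate, data, max_epochs=1000):
    """Train a perceptron; return the first epoch with no mistakes, or None."""
    # Arbitrary nonzero start (an assumption). From all-zero weights the
    # update rate would only rescale the weights and change nothing else.
    w = [1.0, -1.0]
    b = 0.5
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for (x0, x1), label in data:
            if label * (w[0] * x0 + w[1] * x1 + b) <= 0:
                # Classic perceptron update, scaled by the update rate
                w[0] += rate * label * x0
                w[1] += rate * label * x1
                b += rate * label
                mistakes += 1
        if mistakes == 0:
            return epoch
    return None


data = make_data()
rates = [0.001, 0.01, 0.1, 1.0]  # candidate values, chosen arbitrarily
results = {rate: epochs_to_converge(rate, data) for rate in rates}
for rate, epochs in results.items():
    print(f"rate={rate}: epochs to converge = {epochs}")
```

Here a trial takes a fraction of a second. The worry above is what happens when one trial means raising an agent from scratch.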

I’m also guessing that Michael doesn’t think that AIs are likely to be malicious and write malware to run experiments in the darknet :)
- Kaj_Sotala 20 Dec 2011 14:39 UTC
  4 points
  This is the reason why I’m more worried about hardware overhang than recursive self-improvement. Currently known learning algorithms seem to all have various parameters like that whose right value you can’t know a priori—you have to experiment to find out. And when setting parameter 420 to .53 gives you a different result than setting it to .48, you don’t necessarily know which result is more correct, either. You need some external way of verifying the results, and you need to be careful that you are still interpreting the external data correctly and didn’t just self-modify yourself to go insane. (You can test yourself on data you’ve generated yourself, and where you know the correct answers, but that doesn’t yet show that you’ll process real-world data correctly.)
  
  My current intuition suggests that general intelligence is horribly fragile, in the sense that it’s an extremely narrow slice of mindspace that produces designs that actually reason correctly. Just like with humans, if you begin to tamper with your own mind, you’re most likely to do damage if you don’t know what you’re doing—and evolution has had time to make our minds quite robust in comparison.
  
  That isn’t to say that an AGI couldn’t RSI itself to godhood in a relatively short time, especially if it had human scientists helping it out. Also, like cousin_it pointed out, you don’t necessarily need superintelligence to destroy humanity. But the five year estimate doesn’t strike me as unreasonable.
  
  What I suspect—and hope, since it might give humanity a chance—to happen is that some AGI will begin a world-takeover attempt, but then fail due to some epistemic equivalent of a divide-by-zero error, falling prey to Pascal’s mugging or something.
  
  Then again, it might fail, but only after having destroyed humanity in the process.
  - whpearson 21 Dec 2011 0:37 UTC
    4 points
    I’ve thought about scenarios of failed RSIs. My favorite is an idiot savant computer-hacking AI that subsumes the entire Internet but has no conception of the real world. So we just power off, reformat, and then have to think carefully about how we make computers and how to control AI.
    
    But I’ve really no concrete reason to expect this scenario to play out. I expect the nature of intelligence to throw us some more conceptual curve balls before we have an inkling of where we are headed and how to best steer the future.
  - Emile 20 Dec 2011 16:52 UTC
    1 point
    
    You need some external way of verifying the results, and you need to be careful that you are still interpreting the external data correctly and didn’t just self-modify yourself to go insane. (You can test yourself on data you’ve generated yourself, and where you know the correct answers, but that doesn’t yet show that you’ll process real-world data correctly.)
    
    If I was an AI in such a situation, I’d make a modified copy of myself (or of the relevant modules) interfaced with a simulation environment with some physics-based puzzle to solve, such that it only gets a video feed and only has some simple controls (say, have it play Portal—the exact challenge is a bit irrelevant, just something that requires general intelligence). A modified AI that performs better (learns faster, comes up with better solutions) in a wide variety of simulated environments will probably also work better in the real world.
    
    Even if the combinations of parameters that make functional intelligence are very fragile, i.e. the search space has high dimensionality and the “surface” is very jagged, it’s still a search space that can be explored and mapped.
    
    That’s a bit hand-wavy, but enough to get me to suspect that an agent that can self-modify and run simulations of itself has a non-negligible chance of self-improving successfully (for a broad meaning of “successfully”, that includes accidentally rewriting the utility function, as long as the resulting system is more powerful).
    
    But the five year estimate doesn’t strike me as unreasonable.
    
    Meaning, a 1% chance of superhuman intelligence within 5 years, right?
    - Kaj_Sotala 20 Dec 2011 18:29 UTC
      0 points
      
      Meaning, a 1% chance of superhuman intelligence within 5 years, right?
      
      Sorry, I meant to say that it does not seem unreasonable to me that an AGI might take five years to self-improve. 1% does seem unreasonably low. I’m not sure what probability I would assign to “superhuman AGI in 5 years”, but anything under, say, 40% seems quite low.