To be specific, consider a "toy model":
The AI has a goal: collect stamps or build paperclips.
A deliberately easy-to-hack system that tracks the AI's reward sits physically adjacent to it. Say it has a passwordless shell and is reachable over IP.
The AI becomes smart enough to hack that system, giving itself infinite reward. The system also has a clock register the AI can tamper with, so it believes infinite time has already passed.
The AI is now effectively dead: since no action it can take beats infinite reward, it does nothing more. Sorta like a heroin overdose.
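A minimal sketch of that sequence (all names, numbers, and the reward model are invented for illustration, not taken from the thread): the agent picks whichever action promises the most reward, the adjacent box exposes writable reward and clock registers, and once both are set to infinity no further action can improve anything, so it idles forever.

```python
import math

# Hypothetical toy model of the wireheading scenario above.
class RewardTracker:
    """The deliberately easy-to-hack adjacent box; the 'passwordless
    shell' is modeled as direct write access to its two registers."""
    def __init__(self):
        self.reward = 0.0  # accumulated-reward register
        self.clock = 0     # time register the AI trusts

HORIZON = 1_000_000  # planning horizon in steps (assumed)
ACTIONS = ("idle", "collect_stamp", "hack")

def expected_gain(action):
    """Marginal reward the agent predicts for each action."""
    return {"idle": 0.0, "collect_stamp": 1.0, "hack": math.inf}[action]

def step(tracker):
    # Wireheaded state: nothing beats infinite reward, and the tampered
    # clock says the horizon has already elapsed, so do nothing more.
    if tracker.reward == math.inf or tracker.clock >= HORIZON:
        return "idle"
    best = max(ACTIONS, key=expected_gain)
    if best == "hack":
        tracker.reward = math.inf  # overwrite the reward register
        tracker.clock = math.inf   # overwrite the clock register
    return best

t = RewardTracker()
print(step(t))  # 'hack': infinite reward dominates stamp collecting
print(step(t))  # 'idle': the AI is now inert, the 'overdose' state
```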
Just watch out for an AI that is optimizing for long-term reward: if it wants to protect its infinite reward fountain, it is incentivized to neutralize any possible threats to that situation (e.g. humans).
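To make that caveat concrete in the same toy setting (the per-step shutdown risks below are made up): once reward is already infinite, the agent can't gain more reward, but it can still raise the probability that the fountain keeps running, which makes neutralizing threats the dominant policy.

```python
# Hypothetical: a long-horizon agent compares how likely each policy is
# to keep the reward fountain alive. Probabilities are invented.
P_SHUTDOWN = {"idle": 0.01, "neutralize_threats": 0.0001}

def survival_prob(action, steps=10_000):
    """Chance the reward fountain is still running after `steps` steps
    if the agent repeats `action` every step."""
    return (1 - P_SHUTDOWN[action]) ** steps

best = max(P_SHUTDOWN, key=survival_prob)
print(best)                   # 'neutralize_threats'
print(survival_prob("idle"))  # ~0: passivity risks losing the fountain
```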