“out of distribution” detectors. I am not precisely certain how to implement one of these. I just notice that when we ask a language or art model to generate something from a prompt, or ask it to describe what it means by an “idea”, what it shows us is what it considers “in distribution” for that idea.
Implicitly, this means a system could generate a set of predicted outcomes for how it believes the real world will respond to the machine’s own actions. When the real-world outcomes start to diverge wildly from its predictions, that divergence should cross a threshold at which the AI shuts down.
At that point safety systems would kick in, either dumber AIs or conventional control systems, to bring whatever the AI was controlling to a stop, or to hand control off to a human.
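To make the idea a bit more concrete, here is a minimal sketch of that divergence-threshold loop in Python. Everything here is hypothetical: the `model`, `world`, and `fallback_controller` objects and their methods are placeholder names for whatever the real system would provide, and the "surprise" score is a crude standard-deviation measure standing in for a proper OOD score.

```python
import numpy as np

# Hypothetical threshold: shut down when reality lands roughly 3+ standard
# deviations outside the model's own predicted distribution of outcomes.
DIVERGENCE_THRESHOLD = 3.0


def surprise(predicted_samples: np.ndarray, observed: np.ndarray) -> float:
    """How far the observed outcome falls outside the model's predicted
    distribution, in standard deviations (a crude stand-in for a real
    out-of-distribution score)."""
    mean = predicted_samples.mean(axis=0)
    std = predicted_samples.std(axis=0) + 1e-8  # avoid division by zero
    return float(np.max(np.abs(observed - mean) / std))


def control_loop(model, world, fallback_controller):
    """Sketch of the loop described above: predict, act, compare, and
    hand off to a simpler trusted controller if predictions diverge."""
    while True:
        state = world.state()
        action = model.choose_action(state)
        # The model samples what it expects the world to do in response.
        predicted = model.predict_outcomes(state, action, n_samples=100)
        observed = world.step(action)
        if surprise(predicted, observed) > DIVERGENCE_THRESHOLD:
            # Reality has drifted out of the model's own distribution:
            # stop and hand control to a dumber but trusted system.
            model.shut_down()
            fallback_controller.take_over(world)
            break
```

The design choice worth noting is that the comparison is against the model’s *own* predictions, so no separate ground-truth model is needed; the fallback only has to be trustworthy enough to bring things to a safe stop.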
DeepMind has some work on out of distribution detection, for example: https://www.deepmind.com/publications/contrastive-training-for-improved-out-of-distribution-detection I haven’t looked very closely at it yet though.