I think some other approaches could also be in the direction listed in this post:
1) Active boxing, or catching a treacherous turn: one AI observes the behaviour of another AI and predicts when it will start to fail.
2) AI tripling: three very similar AIs work independently on the same problem, and if one of them diverges sufficiently from the other two, it is turned off (a rough sketch of such a divergence check is below).
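A minimal sketch of what such a divergence check could look like, assuming the three systems expose comparable numeric outputs and that "sufficiently divergent" means exceeding some distance threshold (the metric, the threshold, and the function name are illustrative assumptions, not anything specified above):

```python
import numpy as np

def divergence_check(outputs, threshold=0.1):
    """Given outputs from three near-identical AIs on the same problem,
    return the index of the one to shut off if it diverges from the
    other two by more than `threshold`, else None."""
    assert len(outputs) == 3
    # Pairwise L2 distances between the three outputs (illustrative metric).
    d = {(i, j): np.linalg.norm(outputs[i] - outputs[j])
         for i in range(3) for j in range(i + 1, 3)}
    # An outlier is far from *both* of its peers, so score each AI by the
    # smaller of its two distances to the others.
    divergence = [
        min(d[tuple(sorted((i, j)))] for j in range(3) if j != i)
        for i in range(3)
    ]
    worst = int(np.argmax(divergence))
    return worst if divergence[worst] > threshold else None

# Example: the third system drifts away from its two peers.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.1, 3.0])
c = np.array([4.0, 0.0, 9.0])
print(divergence_check([a, b, c]))  # -> 2
```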
I agree that you probably need ensembling in addition to these techniques.
At best this technique would produce a system which has a small probability of unacceptable behavior for any input. You’d then need to combine multiple of those to get a system with negligible probability of unacceptable behavior.
I expect you often get this for free, since catastrophe either involves a bunch of different AI systems behaving unacceptably, or a single AI behaving consistently unacceptably across time.
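To make the combination step concrete: under the (strong, and here purely illustrative) assumption that each of k systems behaves unacceptably on a given input independently with probability p, and that a catastrophe requires all of them to misbehave at once, the probability of catastrophe falls to roughly p^k:

```python
# Purely illustrative arithmetic; the independence assumption is the crux.
p = 0.01             # assumed per-system probability of unacceptable behavior
for k in (1, 2, 3):  # number of independently trained systems combined
    print(f"k={k}: catastrophe probability ~ {p ** k:g}")
# k=1: 0.01, k=2: 0.0001, k=3: 1e-06 -- negligible as k grows
```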