In this case, at every timestep we take the N most probable models, and only take an action a with probability p if **every** one of the N models takes that action with at least probability p.
This is so much clearer than I’ve ever put it.
(There’s a specific rule that ensures that N decreases over time.)
N won’t necessarily decrease over time, but all of the models will eventually agree with each other.
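For concreteness, here is a minimal sketch of that selection rule, assuming each model exposes a probability for each action; the interface and names (`conservative_policy`, `model_action_probs`) are illustrative, not taken from the paper.

```python
import numpy as np

def conservative_policy(model_action_probs, n):
    """Pointwise-minimum policy over the top-n models.

    model_action_probs: array of shape (num_models, num_actions), rows assumed
    already sorted so the first n rows are the n most probable models
    (an assumed interface for illustration).

    The agent takes action a with probability p only if every one of the n
    models takes a with at least probability p, i.e. p is the minimum across
    models. Any leftover mass (1 - sum of minima) is where the models
    disagree; it could be routed to deferring or doing nothing.
    """
    top = np.asarray(model_action_probs)[:n]
    agreed = top.min(axis=0)       # largest p such that every model acts with prob >= p
    leftover = 1.0 - agreed.sum()  # probability mass on which the models disagree
    return agreed, leftover


# Example with two models over three actions: the agent keeps only the
# probability that both models would assign to each action.
probs = [[0.6, 0.3, 0.1],
         [0.5, 0.4, 0.1]]
agreed, leftover = conservative_policy(probs, n=2)
# agreed -> [0.5, 0.3, 0.1], leftover -> ~0.1
```

As the models come to agree with each other, the disagreement mass shrinks and the agent behaves less conservatively, which is the tradeoff discussed above.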
monitor the performance of your system online, and train to correct any problems
I would have described Vanessa’s and my approaches as more about monitoring uncertainty, and avoiding problems before the fact rather than correcting them afterward. But I think what you said stands too.
N won’t necessarily decrease over time, but all of the models will eventually agree with each other.
Ah, right. I rewrote that paragraph, getting rid of that sentence and instead talking about the tradeoff directly.
I would have described Vanessa’s and my approaches as more about monitoring uncertainty, and avoiding problems before the fact rather than correcting them afterward. But I think what you said stands too.
Added a sentence to the opinion noting the benefits of explicitly quantified uncertainty.