So you just don’t do it, even though it feels like a good idea.
(The object which is not the object:)
More likely, people don’t do it because they can’t, or for a similar reason. (The point of saying “My life would be better if I was in charge of the world” is not to serve as a hypothesis to be falsified.)
(The object:)
Beliefs intervene on action. (Not success, but choice.)
We are biased and corrupted. By taking the outside view on how our own algorithm performs in a given situation, we can adjust accordingly.
The piece seems biased towards the negative.
Calibrate yourself on the flaws of your own algorithm, and repair or minimize them.
Something like ‘performance’ seems more key than “flaws”. Flaws can be improved, but so can working parts.
And the AI knows its own algorithm.
An interesting premise. Arguably, if human brains are NGI (natural general intelligence), this would be a difference between AGI and NGI, which might require justification.
If I’m about to wipe my boss’s computer because I’m so super duper sure that my boss wants me to do it, I can consult OutsideView and realize that I’m usually horribly wrong about what my boss wants in this situation. I don’t do it.
The premise of “inadequacy” saturates this post.* At best, this post characterizes the idea that “not doing bad things” stems from “recognizing them as bad”: probabilistically, via past experience and policy (phrased in language suggestive of priors), and so on. This sweeps the problem under the rug in favor of “experience” and “recognizing similar situations”. [1]
“Irreversibility” seems relevant to making sure mistakes can be fixed, as does “experience” in lower-stakes situations. Returning to the beginning of the post:
You run a country.
Hopefully you are “qualified”, experienced, etc. This is a high-stakes situation.**
[1] OutsideView seems like it should be a (function of a) summary of the past, rather than a recursive call.
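The distinction in [1] can be sketched in code. This is a minimal, hypothetical illustration (none of these names come from the post): an OutsideView that is a function of a summary of past performance, rather than a call back into the agent’s own in-the-moment judgment.

```python
# Hypothetical sketch: OutsideView as a (function of a) summary of the past.
# The "summary" is just per-situation counts of how often things went badly;
# querying it involves no recursive appeal to the agent's current reasoning.

from dataclasses import dataclass, field

@dataclass
class OutsideView:
    """Tracks how often past actions of each kind turned out badly."""
    # summary of the past: situation -> (bad outcomes, total trials)
    record: dict = field(default_factory=dict)

    def observe(self, situation: str, went_badly: bool) -> None:
        """Fold one more outcome into the summary."""
        bad, total = self.record.get(situation, (0, 0))
        self.record[situation] = (bad + int(went_badly), total + 1)

    def error_rate(self, situation: str) -> float:
        """A function of the summary only -- no recursive self-call."""
        bad, total = self.record.get(situation, (0, 0))
        return bad / total if total else 0.5  # uninformative default with no data

# Usage: the "wipe my boss's computer" scenario from the post.
ov = OutsideView()
for _ in range(9):
    ov.observe("guessing what my boss wants", went_badly=True)
ov.observe("guessing what my boss wants", went_badly=False)
# A high error rate here is the signal to not act, however sure I feel now.
```

Under this framing, consulting OutsideView is a cheap lookup over accumulated history, which is what makes it usable as a check against in-the-moment overconfidence.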
While reading this post...
From an LW standpoint, I wished it had more clarity.
From an AF (Alignment Forum) view, I appreciated its direction. (It seems like it might be pointed somewhere important.)
*In contrast to the usual calls for ‘maximizing’ “expected value”. While this point has been argued before, it seems to reflect an idea about how the world works (like a prior, or something learned).
**Ignoring the question of “what does it mean to run a country if you don’t set all the rules”, because that seems unrelated to this essay.
From an LW standpoint, I wished it had more clarity. From an AF (Alignment Forum) view, I appreciated its direction. (It seems like it might be pointed somewhere important.)
Yeah, I still feel a bit confused about this idea (hence the lack of clarity), but I’m excited about it as a conceptual tool. I figured it would be better to get my current thoughts out there now rather than sit on the idea for two more years.