Sure. Humans have a sort of pessimism about their own abilities that’s fairly self-contradictory.
“My reasoning process might not be right”, interpreted as you do in the post, presupposes a standard of rightness that one could in principle figure out. It seems like you could just… do the best thing, especially if you’re a self-modifying AI. Even if you have unresolvable uncertainty about what is right, you can just average over that uncertainty and take the highest-expected-rightness action.
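(For concreteness, here’s a minimal sketch of what “average over that uncertainty and take the highest-expected-rightness action” could look like; the hypotheses, actions, and scores below are all invented for illustration, not anything from the post:)

    # Minimal sketch: expected-rightness maximization under moral uncertainty.
    # All names and numbers here are hypothetical placeholders.

    rightness_hypotheses = {
        # hypothesis -> credence that this standard of rightness is correct
        "hypothesis_a": 0.6,
        "hypothesis_b": 0.4,
    }

    # How "right" each action is under each hypothesis (made-up scores).
    rightness_scores = {
        "action_1": {"hypothesis_a": 1.0, "hypothesis_b": 0.2},
        "action_2": {"hypothesis_a": 0.5, "hypothesis_b": 0.9},
    }

    def expected_rightness(action):
        # Average the action's rightness over the credence distribution.
        return sum(
            credence * rightness_scores[action][h]
            for h, credence in rightness_hypotheses.items()
        )

    # Take the highest-expected-rightness action.
    best_action = max(rightness_scores, key=expected_rightness)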
Humans seem to remain pessimistic despite this by evaluating rightness with inconsistent heuristics, and by not having enough processing power to cause too much trouble by smashing those heuristics together. I’m not convinced this is something we want to put into an AI. I guess I’m also more of an optimist about the chances of just doing value learning well enough.
(Below is my response to my best understanding of your reply – let me know if you were trying to make a different point)
It can be simultaneously true that ideal intent-aligned reasoners could just execute the expected-best policy, that overcoming bias generally involves assessing the performance of your algorithm in a given situation, and that it’s profitable to think about that aspect explicitly with respect to corrigibility. So I think I agree with you, but I’m interested in the heuristics that corrigible reasoning might tend to use.