A different perspective, perhaps not motivating quite the same things as yours:
Embedded Reflective Consistency
A theory needs to be able to talk about itself, its position in the world, and its effect on the world. In particular, it will have beliefs about how applying just this theory, in just this position, will influence whatever it is we want the theory to do. Reflective consistency then demands that the theory rate itself well on its own objective: if I hold a belief, and also a belief that the first belief is most likely the result of deception, then clearly I have to change one of the two.
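One minimal way to formalize that self-endorsement condition (my own sketch; the notation $T$, $T'$ and $O$ is mine, not anything from the original):

$$T \in \operatorname*{arg\,max}_{T'} \; \mathbb{E}_T\big[\, O \mid \text{this agent, in this position, runs } T' \,\big]$$

That is, the theory $T$, using its own beliefs about what deploying a theory from its actual position would do, does not judge any alternative $T'$ to score strictly better on the objective $O$ than $T$ itself does.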
Now, if there were a process that could get us there, it would have to integrate process-level feedback, because the goal is a certain consistency with that feedback. It would also give a result at all levels, with sharing between them, because there is only one theory as output, and that theory must judge its own activity (of which judging its own activity is itself a part).
So far this looks like the theory of process-level feedback, but I think it has some things to say about corrigibility as well. For one, we would put in a theory that is possibly not reflectively consistent and get one out that is. The output space is much smaller than the input space, and probably discrete. If we reasonably suppose that the mapping is continuous, then the output is resistant to small changes in the input. Also, an embedded agent must see itself as made of potentially broken parts, so it must be able to revise any one of them.
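To spell out the continuity point (a minimal formalization of the claim above; the map $f$ and spaces $X$, $Y$ are notation I am introducing): let $f : X \to Y$ send an input theory to the reflectively consistent theory the process outputs, where $Y$ carries the discrete topology. If $f$ is continuous, then for any input $x$ the set $f^{-1}(\{f(x)\})$ is open, since $\{f(x)\}$ is open in $Y$; so there is a neighbourhood of $x$ on which $f$ is constant, and sufficiently small changes to the input cannot change the output at all.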