Without having thought too hard about it …
In the case of humans, it seems like there’s some correlation between “feeling surprised and confused by something” and “model refinement”, and likewise between “feeling torn” and “reward function splintering”. Do you agree? Or if not, what are examples where those come apart?
If so, that would be a good sign that we can actually incorporate something like this in a practical AGI. :-)
Also, if this is on the right track, then I guess a corresponding intuitive argument would be: If we have a human personal assistant, then we would want them to act conservatively, ask for help, etc., in situations where they feel surprised and confused by what they observe, and/or situations where they feel torn about what to do next. Therefore we should try to instill a similar behavior in AGIs. I like that intuitive argument—it feels very compelling to me.