Short answer: do online learning with an additional action called “query programmer” that is guaranteed to always have a small negative utility, say −0.001. That cost is small enough that querying beats acting under any non-trivial amount of uncertainty, but because it is still a cost, the AI will eventually be encouraged to act autonomously.
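To make the short answer a little more concrete, here is a minimal toy sketch of one way it could be read. It is not necessarily what the original commenter had in mind. The assumptions are all mine: utilities are deterministic, the agent acts only when a pessimistic estimate of its best action beats the query cost, the `1/sqrt(n)` confidence width is a hypothetical choice, and a query makes the programmer report the true utility of every action. The names `QueryingAgent` and `run` are illustrative only.

```python
import math

QUERY_COST = -0.001  # utility of "query programmer", the value given in the text


class QueryingAgent:
    """Toy agent: acts only when a pessimistic utility estimate beats querying."""

    def __init__(self, actions):
        # Running sum of observed utilities and observation count per action.
        self.total = {a: 0.0 for a in actions}
        self.count = {a: 0 for a in actions}

    def lower_bound(self, action):
        n = self.count[action]
        if n == 0:
            return -math.inf  # no data yet: treat as maximally uncertain
        mean = self.total[action] / n
        return mean - 1.0 / math.sqrt(n)  # hypothetical confidence width

    def choose(self):
        best = max(self.total, key=self.lower_bound)
        if self.lower_bound(best) > QUERY_COST:
            return best  # confident enough to act autonomously
        return "query programmer"

    def update(self, action, utility):
        self.total[action] += utility
        self.count[action] += 1


def run(true_utility, steps):
    """Simulate the agent; true_utility maps each action to its (fixed) utility."""
    agent = QueryingAgent(list(true_utility))
    history = []
    for _ in range(steps):
        choice = agent.choose()
        history.append(choice)
        if choice == "query programmer":
            # Assumption: the programmer reports the true utility of every action.
            for action, utility in true_utility.items():
                agent.update(action, utility)
        else:
            agent.update(choice, true_utility[choice])
    return history


history = run({"a": 0.5, "b": -0.2}, steps=8)
```

In this toy run the agent queries for the first few steps while its uncertainty is large, then switches to taking action `"a"` on its own once the pessimistic estimate rises above the query cost.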
This short answer is too short for me to understand, unfortunately. Do you think there is enough potential merit in this idea to try to understand it better or further develop it? (I’ve been learning about online learning recently in an effort to understand/evaluate Paul Christiano’s recent “AI control” ideas. If you have your own ideas also based on online learning, I’d love to try to understand them while the online learning stuff is fresh in my mind.)
Here is a control idea based on online learning—I think I independently generated something similar to what Jacob describes.