I want to provide feedback, but can’t see the actual definition of the objective function in either of the cases. Can you write down a sketch of how this would be implemented using existing primitives (SI, ML) so I can argue against what you’re really intending?
Some preliminary thoughts:
Curiosity (obtaining information) is an instrumental goal, so I’m not sure if making it more important will produce more or less aligned systems. How will you trade off curiosity and satisfaction of human values?
It’s difficult to specify correctly. Depending on what you mean by curiosity, the AI could start showing itself random unpredictable data (the Noisy TV problem for RL curiosity) or manipulating humans into giving it self-affirming instructions (if the goal is to minimize prediction error). So far, every solution I’ve seen is a hack that won’t generalize to superintelligent agents.
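To make the Noisy TV point concrete, here is a minimal toy sketch (not any particular published formulation) of curiosity as squared prediction error. The model class and names are hypothetical stand-ins for a learned world model; the point is only that an unlearnable noise source pays out intrinsic reward forever, while a learnable signal stops being "interesting" almost immediately:

```python
import numpy as np

rng = np.random.default_rng(0)

class RunningMeanModel:
    """Toy "world model": predicts the next observation as the
    running mean of all observations seen so far (hypothetical
    stand-in for a learned predictor)."""
    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.n = 0

    def predict(self):
        return self.mean

    def update(self, obs):
        self.n += 1
        self.mean += (obs - self.mean) / self.n

def curiosity_reward(model, obs):
    # Intrinsic reward = squared prediction error on the new observation.
    return float(np.sum((model.predict() - obs) ** 2))

dim = 8
model_tv = RunningMeanModel(dim)   # agent staring at pure noise ("noisy TV")
model_env = RunningMeanModel(dim)  # agent watching a fully learnable signal

signal = np.ones(dim)              # perfectly predictable environment
tv_total, env_total = 0.0, 0.0
for _ in range(1000):
    noise_obs = rng.normal(size=dim)        # i.i.d. noise: never learnable
    tv_total += curiosity_reward(model_tv, noise_obs)
    model_tv.update(noise_obs)

    env_total += curiosity_reward(model_env, signal)
    model_env.update(signal)

# The noise source keeps generating prediction error indefinitely;
# the learnable signal is mastered after one observation.
print(tv_total, env_total)
```

Under this (admittedly crude) objective, the prediction-error-maximizing policy is to park in front of the noise source, which is exactly the failure mode above.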
Good point. Will have to think on this more. What do you mean by (SI, ML)? Can you link me to articles around this so I can define this better?
Solomonoff Induction and Machine Learning. How would you formulate this in terms of a machine that can only predict future observations?