I don’t understand the Model-Utility Learning (MUL) section. What pathological behavior does the AI exhibit?
Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means “what humans label as building bridges” will always be at least as accurate as the intended classifier. I don’t mean “whatever humans would label”. I mean the hypothesis that “build a bridge” means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.
So it’s like overfitting? If I train a MUL AI to play piano in a green room, does it learn that “playing piano” means “playing piano in a green room”, or “playing piano in a room which would have been chosen for training me in the past”?
Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.
But “sensory data being a certain way” is itself a physical event which happens in reality, so couldn’t a MUL AI still learn to be a solipsist? Does MUL not guarantee a solution to misgeneralization in any way?
If the answer to my questions is “yes”, what did we even hope for with MUL?