I think there’s probably a continuous spectrum of usefulness of abstractions, running all the way from actively unhelpful and confusing up to extremely helpful for a realistically compute- and data-limited real-world agent. Having the right abstractions enables such a limited agent to do things and learn things it otherwise couldn’t with its limited resources.
Being able to unlearn/overwrite/forget bad or unhelpful abstractions and heuristics is probably a very useful ability.
My guess is that this is going to become an increasingly important and discussed area of research.
Are you imagining the training process doing this, or humans doing it after training?
I was imagining humans deliberately designing and testing a training process that did this automatically. I haven’t thought about how to define, much less automatically detect, unhelpful abstractions or heuristics, though.