It seems to me like asking too much to think that there won't be shared natural ontologies between humans (construed broadly) and ML models, but that we can still make sure, with the right pretraining regimen/dataset choice/etc., that the model ends up with a human ontology, and also that this process admits some amount of error, and also that it can be done in a way that's not trivially jailbreakable.
why do you think it’s unlikely?