QFT is the extreme example of a “better abstraction”, but in principle (if the natural abstraction hypothesis fails) there will be abstractions of all sorts and shapes: some available to us, some available to the model, and the two sets will not fully overlap. That is a concern in worlds where different abstractions lead to different generalization properties.
Indeed. I think the key thing for me is: I expect the model to be strongly incentivized to have a solid translation layer from its internal ontology to, e.g., the English language, since it is trained on lots of English-language data. And by Occam’s Razor, I expect the internal ontology itself to be biased towards that of an English-language speaker.
It’s just that, if you feed enough data to a model that can hold entire swaths of the physical universe inside its metaphorical “head”, pretty soon hypotheses that involve the actual state of that universe will begin to outperform hypotheses that don’t, i.e. hypotheses that instead rely on some lossy approximation of that state built out of intermediary concepts like “intent”, “belief”, “agent”, “subjective state”, etc.
I’m imagining something like: early in training the model makes use of those lossy approximations because they are a cheap/accessible way to improve its predictive accuracy. Later in training, assuming it’s being trained at the sort of gigantic scale that would allow it to hold swaths of the physical universe in its head, it loses those desired lossy approximations due to catastrophic forgetting. Is that an OK way to operationalize your concern?
I’m still not convinced that this problem is a priority. It seems like a problem that will be encountered very late, if ever, and one that will lead to ‘random’ failures when predicting future/counterfactual data in a way that’s fairly obvious.