I’d be curious if you have any ideas for how it can be applied in more advanced cases, e.g. what if we want to find the natural latents in Llama?
I expect the typical case will look like:
Find some internal signal/latent using whatever random methods someone pulled out of their ass
Check whether it satisfies the naturality conditions (over some choice of variables); a rough sketch of what such a check could look like is at the end of this comment
… which is not what this post is about.
The material in this post is useful mainly in cases where we want to be able to rule out any “better” natural latents, which is a somewhat atypical use case. It would be relevant, for instance, if I want to design a toy environment with known natural latents in which to train some system.
(Aside: this is something I updated about relatively recently; I had previously thought of the sort of thing this post is doing as the central use-case.)
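For concreteness, here is a rough sketch of what the check in step 2 could look like empirically for a discrete candidate latent over two observed variables, taking the naturality conditions to be mediation (the observables are approximately independent given the latent) and redundancy (the latent is approximately recoverable from all-but-one of the observables). The function names, the plug-in estimator, and the tolerance are all illustrative choices, not anything from the post:

```python
# Illustrative empirical check of the naturality conditions for a discrete
# candidate latent over two observables. All names and thresholds here are
# hypothetical; the plug-in estimator needs plenty of samples to be reliable.
import numpy as np
from collections import Counter

def cond_mutual_info(a, b, c):
    """Plug-in estimate of I(A; B | C) in nats from paired integer samples."""
    n = len(c)
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)
    total = 0.0
    for (ai, bi, ci), count in n_abc.items():
        total += (count / n) * np.log(count * n_c[ci] / (n_ac[(ai, ci)] * n_bc[(bi, ci)]))
    return total

def naturality_check(x1, x2, lam, tol=0.05):
    """Approximate check of the naturality conditions over (X1, X2):
    mediation:  I(X1; X2 | Lam) ~ 0   (the latent mediates between the observables)
    redundancy: I(Lam; Xi | X_other) ~ 0 for each i  (the latent is redundantly encoded)
    """
    mediation = cond_mutual_info(x1, x2, lam)
    redundancy = (cond_mutual_info(lam, x1, x2), cond_mutual_info(lam, x2, x1))
    return {
        "mediation": mediation,
        "redundancy": redundancy,
        "approximately_natural": mediation < tol and max(redundancy) < tol,
    }
```

For continuous latents and observables you would swap the plug-in estimator for a neural or kernel estimator of conditional mutual information, but the shape of the check stays the same.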
Would the checks of the naturality conditions you have in mind primarily be empirical (e.g. sampling a bunch of data points and running statistical independence checks), or might they just as often be mechanistic? I'm not sure how the latter would work for a complex model like Llama, but for a Bayes net, say, you already have a factorization that makes robust independence checks over the model much easier.
Asking because the idea of “in some model” (plus the desire for e.g. adversarial robustness) suggests to me that we’d want to have a more mechanistic idea of whether the naturality conditions hold, but they seem easier to check empirically.
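To make the Bayes-net case concrete: when the factorization is available as an explicit DAG, the mediation-style independence (the observables independent given the candidate latent) can be read directly off the graph via d-separation, with no sampling at all. Here's a rough sketch using the standard moralization construction; the graph and variable names are just a made-up toy example:

```python
# Sketch of a purely structural ("mechanistic") check for the Bayes-net case:
# read off X1 _||_ X2 | Lam from the known DAG via d-separation, no samples needed.
import networkx as nx

def d_separated(dag, xs, ys, zs):
    """Return True iff xs and ys are d-separated by zs in the DAG,
    using the ancestral-moral-graph construction."""
    relevant = set(xs) | set(ys) | set(zs)
    keep = set(relevant)
    for node in relevant:
        keep |= nx.ancestors(dag, node)
    sub = dag.subgraph(keep)
    moral = sub.to_undirected()           # drop edge directions...
    for node in sub.nodes:
        parents = list(sub.predecessors(node))
        for i in range(len(parents)):     # ...and "marry" co-parents
            for j in range(i + 1, len(parents)):
                moral.add_edge(parents[i], parents[j])
    moral.remove_nodes_from(zs)           # condition on zs by deleting it
    return not any(
        nx.has_path(moral, x, y)
        for x in xs for y in ys
        if x in moral and y in moral
    )

# Toy example: Lam is a common cause of X1 and X2, so conditioning on Lam
# structurally guarantees the mediation condition.
dag = nx.DiGraph([("Lam", "X1"), ("Lam", "X2")])
print(d_separated(dag, {"X1"}, {"X2"}, {"Lam"}))  # True
```

Recent networkx releases also ship a built-in d-separation test; the helper is only spelled out here to make the structural nature of the check explicit.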
That’s a big open question which we’re still figuring out.