This branch of research is aimed at finding a (nearly) objective way of thinking about the universe. When I imagine the end result, I imagine something that receives a distribution across a bunch of data, and finds a bunch of useful patterns within it. At the moment that looks like finding patterns in data via
find_natural_latent(get_chunks_of_data(data_distribution))
or perhaps showing that
find_top_n(
    n,
    ((chunks, natural_latent(chunks))
     for chunks in all_chunked_subsets_of_data(data_distribution)),
    key=lambda chunks_and_latent: usefulness_metric(chunks_and_latent[1]),
)
is a (convergent sub)goal of agents. As such, the notion that the donuts’ data is simply poorly chunked—which needs to be solved anyway—makes a lot of sense to me.
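To make that concrete, here's a minimal sketch of what that second expression could look like as actual Python. Everything in it is a hypothetical stand-in – find_top_n is just heapq.nlargest, and all_chunked_subsets_of_data, natural_latent, and usefulness_metric are stubs whose real implementations are exactly the open problem:

import heapq

def all_chunked_subsets_of_data(data_distribution):
    """Hypothetical: yield candidate chunkings (partitions of the data into 'objects')."""
    ...

def natural_latent(chunks):
    """Hypothetical: return a natural latent over the chunks, or None if there isn't one."""
    ...

def usefulness_metric(latent):
    """Hypothetical: score how useful the latent is to the agent (e.g. 0 for no latent)."""
    ...

def find_top_n(n, candidates, key):
    """Return the n highest-scoring (chunks, latent) pairs."""
    return heapq.nlargest(n, candidates, key=key)

def best_abstractions(data_distribution, n=10):
    candidates = ((chunks, natural_latent(chunks))
                  for chunks in all_chunked_subsets_of_data(data_distribution))
    return find_top_n(n, candidates,
                      key=lambda pair: usefulness_metric(pair[1]))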
I don’t know how to think about the possibilities when it comes to decomposing Xi. Why would it always be possible to decompose random variables to allow for a natural latent? Do you have an easy example of this?
Also, what do you mean by mutual information between Xi, given that there are at least 3 of them? And why would just extracting said mutual information be useless?
If you get the chance to point me towards good resources about any of these questions, that would be great.
Regarding chunking: a background assumption for me is that the causal structure of the world yields a natural chunking, with each chunk taking up a little local “voxel” of spacetime.
Some amount of spacetime-induced chunking is “forced upon” an embedded agent, in some sense, since its sensors and actuators are localized in spacetime.
Now, there are still degrees of freedom in how coarse-grained the chunking is, and in coarse-graining differentially along different spacetime directions or in different places. But I expect that spacetime locality mostly nails down what we need as a starting point for convergent chunking.
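To illustrate what “coarse-graining differentially along different spacetime directions” could mean, here's a toy numpy sketch; the field and the block sizes are made up purely for illustration. It chunks a 2D field into voxels, with an independently chosen block size per axis:

import numpy as np

def coarse_grain(field, block):
    """Average a 2D field over non-overlapping blocks of shape `block`.

    Each output cell is one 'chunk' (voxel) of the original field.
    """
    bx, by = block
    nx, ny = field.shape[0] // bx, field.shape[1] // by
    trimmed = field[:nx * bx, :ny * by]          # drop any ragged edge
    return trimmed.reshape(nx, bx, ny, by).mean(axis=(1, 3))

rng = np.random.default_rng(0)
field = rng.normal(size=(100, 100))              # toy 'spacetime' slice

fine   = coarse_grain(field, (2, 2))    # fine voxels everywhere
skewed = coarse_grain(field, (10, 2))   # coarser along one direction
print(fine.shape, skewed.shape)         # (50, 50) (10, 50)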
Also, what do you mean by mutual information between Xi, given that there are at least 3 of them?

You can generalize mutual information to N variables: interaction information.
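To make that concrete, here is a small self-contained calculation for three binary variables; the XOR joint distribution is just an illustrative choice, and it shows the quantity can come out negative, unlike ordinary two-variable mutual information. (Sign conventions differ between authors; this uses the inclusion-exclusion form.)

from itertools import product
from math import log2

def H(joint, axes):
    """Shannon entropy (bits) of the marginal over the given axes."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

# Joint distribution of (X, Y, Z): X, Y fair independent coins, Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

# Interaction information via inclusion-exclusion over entropies:
# I(X;Y;Z) = H(X)+H(Y)+H(Z) - H(X,Y)-H(X,Z)-H(Y,Z) + H(X,Y,Z)
I_xyz = (H(joint, (0,)) + H(joint, (1,)) + H(joint, (2,))
         - H(joint, (0, 1)) - H(joint, (0, 2)) - H(joint, (1, 2))
         + H(joint, (0, 1, 2)))
print(I_xyz)  # -1.0: the variables are pairwise independent, yet jointly fully dependent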
Why would it always be possible to decompose random variables to allow for a natural latent?

Well, I suppose I overstated it a bit by saying “always”; you can certainly imagine artificial setups where the mutual information between a bunch of variables is zero. In practice, however, everything in the world is correlated with everything else, so in a real-world setting you’ll likely find such a decomposition always, or almost always.
And why would just extracting said mutual information be useless?

Well, not useless as such – it’s a useful formalism – but it would basically skip everything John and David’s post is describing. Crucially, it won’t uniquely determine whether a specific set of objects represents a well-abstracting category.
The abstraction-finding algorithm should be able to successfully abstract over data if and only if the underlying data actually correspond to some abstraction. If it can abstract over anything, however – any arbitrary bunch of objects – then whatever it is doing, it’s not finding “abstractions”. It may still be useful, but it’s not what we’re looking for here.
Concrete example: if we feed our algorithm 1000 examples of trees, it should output the “tree” abstraction. If we feed our algorithm 200 examples each of car tires, trees, hydrogen atoms, wallpapers, and continental-philosophy papers, it shouldn’t actually find some abstraction of which all of these objects are instances. But as per the everything-is-correlated argument above, they likely have non-zero mutual information, so the naive “find a decomposition for which there’s a natural latent” algorithm would fail to output nothing – it would return some spurious “abstraction” instead.
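In code terms, the contract I have in mind looks something like this – find_abstraction and the two datasets are purely hypothetical placeholders for whatever the real abstraction-finder turns out to be, not an implementation:

def check_abstraction_finder(find_abstraction, tree_examples, grab_bag_examples):
    # A coherent category should yield an abstraction...
    assert find_abstraction(tree_examples) is not None
    # ...while an arbitrary grab bag of objects should yield nothing,
    # even though its members have non-zero mutual information.
    assert find_abstraction(grab_bag_examples) is None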
More broadly: We’re looking for a “true name” of abstractions, and mutual information is sort-of related, but also clearly not precisely it.