Let’s say every day at the office, we get three boxes of donuts, numbered 1, 2, and 3. I grab a donut from each box, plunk them down on napkins helpfully labeled X1, X2, and X3. The donuts vary in two aspects: size (big or small) and flavor (vanilla or chocolate). Across all boxes, the ratio of big to small donuts remains consistent. However, Boxes 1 and 2 share the same vanilla-to-chocolate ratio, which is different from that of Box 3.
Does the correlation between X1 and X2 imply that there is no natural latent? Is this the desired behavior of natural latents, despite the presence of the common size ratio? (and the commonality that I’ve only ever pulled out donuts; there has never been a tennis ball in any of the boxes!)
If so, why is this what we want from natural latents? If not, how does a natural latent arise despite the internal correlation?
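To make the setup concrete, here is a quick simulation. A flagged assumption on my part: the ratios themselves are treated as latent parameters that vary from day to day, since that is what makes the draws correlated with each other at all.

import numpy as np

rng = np.random.default_rng(0)

def sample_day(rng):
    # One day's draw (X1, X2, X3) under the assumed generative story.
    p_big = rng.uniform(0.2, 0.8)         # size ratio shared by all three boxes (assumption)
    p_vanilla_12 = rng.uniform(0.2, 0.8)  # flavor ratio shared by boxes 1 and 2 (assumption)
    p_vanilla_3 = rng.uniform(0.2, 0.8)   # box 3's own flavor ratio (assumption)
    donuts = []
    for p_vanilla in (p_vanilla_12, p_vanilla_12, p_vanilla_3):
        size = "big" if rng.random() < p_big else "small"
        flavor = "vanilla" if rng.random() < p_vanilla else "chocolate"
        donuts.append((size, flavor))
    return tuple(donuts)

days = [sample_day(rng) for _ in range(10_000)]
# In these samples, X1 and X2 agree on flavor more often than X1 and X3 do,
# while all three agree on size equally often: the extra X1–X2 correlation
# comes entirely from the shared flavor ratio.

Under this story, X1 and X2 share both the size parameter and a flavor parameter, while X3 shares only the size parameter with them.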
My take would be to split each “donut” variable Xi into “donut size” Si and “donut flavor” Fi. Then there’s a natural latent for the whole {Si} set of variables, and no natural latent for the whole {Fi} set. {Fi} basically becomes the “other stuff in the world” Z variable relative to {Si}.
Granted, there’s an issue: we can basically do that for any set of variables Xi, even entirely unrelated ones, by deliberately searching for some decomposition of Xi into an Si and an Fi such that there’s a natural latent for Si. I think some more practical measures could be taken into account here, though, to ensure that the abstractions we find are useful. For example, we can check the relative information contents/entropies of {Xi} and {Si}, thereby measuring “how much” of the initial variable-set we’re abstracting over. If it’s too little, that’s not a useful abstraction.[1]
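Roughly the kind of check I have in mind, sketched on toy samples (the helper and the data here are made up purely for illustration):

import numpy as np
from collections import Counter

def empirical_entropy(samples):
    # Shannon entropy, in bits, of the empirical distribution over the samples.
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Toy samples of (X1, X2, X3), where each Xi is a (size, flavor) pair.
xs = [
    tuple((rng.choice(["big", "small"]), rng.choice(["vanilla", "chocolate"])) for _ in range(3))
    for _ in range(5_000)
]
# Project each sample down to its "size part" {Si}.
ss = [tuple(size for size, _ in x) for x in xs]

# Fraction of the variables' joint information content retained by the abstracted part.
coverage = empirical_entropy(ss) / empirical_entropy(xs)
# If coverage is tiny, {Si} abstracts over hardly any of {Xi}, so it's not a useful abstraction.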
That passes my common-sense check, at least. It’s essentially how we’re able to decompose and group objects along many different dimensions. We can focus on objects’ geometry (and therefore group all sphere-like objects, from billiard balls to planets to weather balloons) or their material (grouping all objects made out of rock) or their origin (grouping all man-made objects), etc.
Each grouping then corresponds to an abstraction, with its own generally-applicable properties. E.g., deriving a “sphere” abstraction lets us discover properties like “volume as a function of radius”, which we can then usefully apply to any spherical object we encounter. Similarly, man-made objects tend to have a purpose/function (unlike natural ones), which likewise lets us usefully reason about that whole category in the abstract.
(Edit: On second thoughts, I think the obvious naive way of doing that just results in {Si} containing all mutual information between Xi, with the “abstraction” then just being said mutual information. Which doesn’t seem very useful. I still think there’s something in that direction, but probably not exactly this.)
Relevant: Finite Factored Sets, which IIRC offer some machinery for these sorts of decompositions of variables.
This branch of research is aimed at finding a (nearly) objective way of thinking about the universe. When I imagine the end result, I picture something that receives a distribution over a bunch of data and finds useful patterns within it. At the moment, that looks like finding patterns in data via
find_natural_latent(get_chunks_of_data(data_distribution))
or perhaps showing that
find_top_n(
    n,
    ((chunks, natural_latent(chunks)) for chunks in all_chunked_subsets_of_data(data_distribution)),
    key=lambda pair: usefulness_metric(pair[1]),
)
is a (convergent sub)goal of agents. As such, the notion that the donuts’ data is simply poorly chunked—which needs to be solved anyway—makes a lot of sense to me.
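To make the shape of that pipeline explicit, here is a minimal runnable skeleton; every function body is a placeholder of my own, not an actual natural-latent finder:

from itertools import combinations

def get_chunks_of_data(data):
    # Placeholder chunking: treat each variable in the dataset as its own chunk.
    return list(data.keys())

def all_chunked_subsets_of_data(data):
    chunks = get_chunks_of_data(data)
    for size in range(2, len(chunks) + 1):
        yield from combinations(chunks, size)

def natural_latent(chunks):
    # Placeholder: stand-in for whatever actually computes/checks a natural latent.
    return tuple(sorted(chunks))

def usefulness_metric(latent):
    # Placeholder: prefer latents that abstract over more chunks.
    return len(latent)

def find_top_n(n, scored_chunkings, key):
    return sorted(scored_chunkings, key=key, reverse=True)[:n]

data_distribution = {"X1": None, "X2": None, "X3": None}
top_abstractions = find_top_n(
    3,
    ((chunks, natural_latent(chunks)) for chunks in all_chunked_subsets_of_data(data_distribution)),
    key=lambda pair: usefulness_metric(pair[1]),
)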
I don’t know how to think about the possibilities when it comes to decomposing Xi. Why would it always be possible to decompose random variables to allow for a natural latent? Do you have an easy example of this?
Also, what do you mean by mutual information between Xi, given that there are at least 3 of them? And why would just extracting said mutual information be useless?
If you get the chance to point me towards good resources about any of these questions, that would be great.
Regarding chunking: a background assumption for me is that the causal structure of the world yields a natural chunking, with each chunk taking up a little local “voxel” of spacetime.
Some amount of spacetime-induced chunking is “forced upon” an embedded agent, in some sense, since their sensors and actuators are localized in spacetime.
Now, there are still degrees of freedom in taking more or less coarse-grained chunkings, and in coarse-graining differently along different spacetime directions or in different places. But I expect that spacetime locality mostly nails down what we need as a starting point for convergent chunking.
You can generalize mutual information to N variables: interaction information.
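For three variables, one common sign convention is I(X;Y;Z) = I(X;Y) - I(X;Y|Z); a minimal sketch estimating it from samples (conventions in the literature differ by a sign):

import numpy as np
from collections import Counter

def entropy(samples):
    # Shannon entropy, in bits, of the empirical distribution over the samples.
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def interaction_information(xs, ys, zs):
    # I(X;Y;Z) = I(X;Y) - I(X;Y|Z), expanded into joint entropies:
    # H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z).
    hx, hy, hz = entropy(xs), entropy(ys), entropy(zs)
    hxy = entropy(list(zip(xs, ys)))
    hxz = entropy(list(zip(xs, zs)))
    hyz = entropy(list(zip(ys, zs)))
    hxyz = entropy(list(zip(xs, ys, zs)))
    return hx + hy + hz - hxy - hxz - hyz + hxyz

Note that, unlike pairwise mutual information, this quantity can be negative.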
Why would it always be possible to decompose random variables to allow for a natural latent?
Well, I suppose I overstated it a bit by saying “always”; you can certainly imagine artificial setups where the mutual information between a bunch of variables is zero. In practice, however, everything in the world is correlated with everything else, so in a real-world setting you’ll likely find such a decomposition always, or almost always.
And why would just extracting said mutual information be useless?
Well, not useless as such – it’s a useful formalism – but it would basically skip everything John and David’s post is describing. Crucially, it won’t uniquely determine whether a specific set of objects represents a well-abstracting category.
The abstraction-finding algorithm should be able to successfully abstract over data if and only if the underlying data actually correspond to some abstraction. If it can abstract over anything, however – any arbitrary bunch of objects – then whatever it is doing, it’s not finding “abstractions”. It may still be useful, but it’s not what we’re looking for here.
Concrete example: if we feed our algorithm 1000 examples of trees, it should output the “tree” abstraction. If we feed our algorithm 200 examples each of car tires, trees, hydrogen atoms, wallpapers, and continental-philosophy papers, it shouldn’t actually find some abstraction which all of these objects are instances of. But as per the everything-is-correlated argument above, they likely have non-zero mutual information, so the naive “find a decomposition for which there’s a natural latent” algorithm would output something for them anyway, rather than correctly outputting nothing.
More broadly: We’re looking for a “true name” of abstractions, and mutual information is sort-of related, but also clearly not precisely it.