Haven’t fully read the post, but I feel like that could be relaxed. Part of my intuition is that Aumann’s theorem can be relaxed to the case where the agents start with different priors, with the conclusion that their posteriors then differ by no more than their priors do.
The issue with Aumann’s theorem is that if the agents have different data, they might have differently structured latents, and so they might lack a shared language for communicating the value of any particular latent.
Like, let’s say you want to explain John Wentworth’s “Minimal Motivation of Natural Latents” post to a cat. You could show the cat the post, but even if it trusted you that the post was important, it doesn’t know how to read, or even that reading is a thing you could do with a post. It also doesn’t know anything about neural networks, superintelligences, or interpretability/alignment. That makes it hard to get the cat to pay attention to this post any differently than it would to any other internet post.
Plausibly a cat lacks the learning ability to ever understand this post (though I don’t think anyone has seriously tried?). But even if you were introducing a human to it, unless that human has a lot of relevant background knowledge, they’re just not going to get it, even when shown the entire text, and it’s going to be hard to explain the gist without a significant back-and-forth to establish the relevant concepts.
Sadly, the difference in their priors could still make a big difference for the natural latents, due to the tiny mixtures problem.
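Schematically (glossing over the quantitative details): closeness of the priors does not bound closeness of the latents. Two priors \(P\) and \(Q\) can satisfy

\[ D_{KL}(P \,\|\, Q) < \delta \]

for \(\delta\) as tiny as you like, while a latent which is approximately natural under \(P\) badly fails the naturality conditions under \(Q\); mixing a tiny-probability component into one of the priors is enough to produce this.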
Currently our best way to handle this is to assume a universal prior. That still allows for a wide variety of different priors (one for each choice of universal Turing machine), but the Solomonoff version of natural latents doesn’t have the tiny mixtures problem. For Solomonoff natural latents, we do have the sort of result you’re intuiting, where the divergence (in bits) between the two agents’ priors just gets added to the error term on all the approximations.
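Schematically: if the naturality conditions hold to within \(\epsilon\) bits under one agent’s prior, and the two priors diverge by \(\delta\) bits, then under the other agent’s prior the same conditions hold to within roughly

\[ \epsilon' \lesssim \epsilon + \delta \]

so the degradation is controlled directly by how far apart the priors are.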