I keep getting stuck on:

Suppose two Bayesian agents are presented with the same spreadsheet—IID samples of data in each row, a feature in each column. Each agent develops a generative model of the data distribution.
It is exceedingly rare that two Bayesian agents are presented with the same data. The more interesting case is when they are presented with different data, or perhaps with partially-overlapping data. Like let’s say you’ve got three spreadsheets, A, B, and AB; spreadsheets A and AB are concatenated and given to agent X, while spreadsheets B and AB are concatenated and given to agent Y. Obviously agent Y can infer whatever information about A is present in AB, so the big question is: how can X communicate A’s unique information to Y, when Y hasn’t even allocated the relevant latents to make use of that information yet, and X doesn’t know what Y has learned from B, and thus what is or isn’t redundant?
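A minimal sketch of that setup in code (hypothetical feature names and values, purely to make the data-sharing structure concrete):

```python
import pandas as pd

# Three spreadsheets over the same features; each row is an IID sample.
A  = pd.DataFrame({"f1": [0.1, 0.4, 0.8], "f2": [1, 0, 1]})   # seen only by agent X
B  = pd.DataFrame({"f1": [0.9, 0.3, 0.2], "f2": [0, 1, 0]})   # seen only by agent Y
AB = pd.DataFrame({"f1": [0.5, 0.7, 0.6], "f2": [1, 1, 0]})   # overlap seen by both

data_X = pd.concat([A, AB], ignore_index=True)   # agent X's training data
data_Y = pd.concat([B, AB], ignore_index=True)   # agent Y's training data

# Each agent fits its own generative model to its own concatenation. X's latents
# were shaped partly by A, which Y never saw, so X has to communicate what it
# learned from A without knowing which parts are already redundant given B.
```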
Haven’t fully read the post, but I feel like that could be relaxed. Part of my intuition is that Aumann’s theorem can be relaxed to the case where the agents start with different priors, and the conclusion is that their posteriors differ by no more than their priors do.
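One way to make “their posteriors differ by no more than their priors” precise, assuming both agents share the same likelihood and condition on the same data (a gloss, not necessarily the intended statement): writing \(P\) and \(Q\) for the two agents’ joint distributions over hypotheses \(H\) and data \(D\), built from different priors but a common likelihood, the chain rule for KL divergence gives

\[
\mathbb{E}_{d \sim P(D)}\!\left[\, D_{\mathrm{KL}}\!\big(P(H \mid d) \,\big\|\, Q(H \mid d)\big) \,\right] \;\le\; D_{\mathrm{KL}}\!\big(P(H) \,\big\|\, Q(H)\big),
\]

i.e. the expected disagreement between posteriors is bounded by the disagreement between priors.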
The issue with Aumann’s theorem is that if the agents have different data, then they might have different structures for the latents they use, and so they might lack a shared language in which to communicate the value of a particular latent.
Like let’s say you want to explain John Wentworth’s “Minimal Motivation of Natural Latents” post to a cat. You could show the cat the post, but even if it trusted you that the post was important, it doesn’t know how to read, or even that reading is a thing you could do with a post. It also doesn’t know anything about neural networks, superintelligences, or interpretability/alignment. This would make it hard to get the cat to pay attention to the post in any way that differs from how it treats any other internet post.
Plausibly a cat lacks the learning ability to ever understand this post (though I don’t think anyone has seriously tried?). But even if you were trying to introduce a human to it, unless that human has a lot of relevant background knowledge, they’re just not going to get it, even when shown the entire text, and it’s going to be hard to explain the gist without a significant back-and-forth to establish the relevant concepts.
Sadly, the difference in their priors could still make a big difference for the natural latents, due to the tiny mixtures problem.
Currently our best way to handle this is to assume a universal prior. That still allows for a wide variety of different priors (one for each choice of Turing machine), but the Solomonoff version of natural latents doesn’t have the tiny mixtures problem. For Solomonoff natural latents, we do have the sort of result you’re intuiting, where the divergence (in bits) between the two agents’ priors just gets added to the error term on all the approximations.
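For reference, the sense in which any two such priors stay close (this is just the standard invariance property of universal priors, not a statement of the natural-latents result itself): for universal priors \(M_1\) and \(M_2\) defined via different universal Turing machines, there is a constant \(c_{12}\), depending only on the pair of machines, such that

\[
\big|\, \log_2 M_1(x) - \log_2 M_2(x) \,\big| \;\le\; c_{12} \quad \text{for all } x,
\]

and, per the claim above, a prior divergence bounded in bits like this just gets added onto the approximation error terms.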
Yup, totally agree with this. This particular model/result is definitely toy/oversimplified in that way.

Generally the purpose of a simplified model is to highlight:
The essence that drives the dynamics
The key constraints under consideration that obstruct and direct this essence
If the question we want to consider is just “why do there seem to be interpretable features across agents from humans to neural networks to bacteria”, then I think your model is doing fine at highlighting the essence and constraints.
However, if the question we want to consider is a more normative one, about what methods we could build to develop higher interpretations of agents, or about which important things might be missed, then I think your model fails to highlight both the essence and the key constraints.
Yeah, I generally try to avoid normative questions. People tend to go funny in the head when they focus on what should be, rather than what is or what can be.
But there are positive versions of the questions you’re asking which I think maintain the core pieces:
Humans do seem to have some “higher interpretations of agents”, as part of our intuitive cognition. How do those work? Will more capable agents likely use similar models, or very different ones?
When and why are some latents or abstractions used by one agent but not another?
By focusing on what is, you get a lot of convex losses on your theories that make it very easy to converge. This is what prevents people from going funny in the head with that focus.
But the value of what is is long-tailed, so the vast majority of those constraints come from worthless instances of the things in the domain you are considering. And since the niches that allow things to grow big are competitive and therefore heterogeneous, this vast majority of constraints don’t help you build the sorts of instances that are valuable. In fact, they might prevent it, if adaptation to a niche leads to breaking some of those constraints in some way.
One attractive compromise is to focus on the best of what there is.