Generally the purpose of a simplified model is to highlight:
The essence that drives the dynamics
The key constraints under consideration that obstruct and direct this essence
If the question we want to consider is just “why do there seem to be interpretable features across agents from humans to neural networks to bacteria”, then I think your model is doing fine at highlighting the essence and constraints.
However, if the question we want to consider is more normative, about what methods we can build to develop higher interpretations of agents, or to understand which important things might be missed, then I think your model fails to highlight either the essence or the key constraints.
Yeah, I generally try to avoid normative questions. People tend to go funny in the head when they focus on what should be, rather than what is or what can be.
But there are positive versions of the questions you’re asking which I think maintain the core pieces:
Humans do seem to have some “higher interpretations of agents”, as part of our intuitive cognition. How do those work? Will more capable agents likely use similar models, or very different ones?
When and why are some latents or abstractions used by one agent but not another?
By focusing on what is, you get a lot of convex losses on your theories, which makes it very easy to converge. This is what prevents people from going funny in the head with that focus.
But the value of what is is long-tailed: the vast majority of those constraints come from worthless instances of the things in the domain you are considering, while the niches that allow things to grow big are competitive and therefore heterogeneous. So that vast majority of constraints doesn't help you build the sorts of instances that are valuable. In fact, it might prevent you, if adaptation to a niche leads to breaking some of those constraints in some way.
One attractive compromise is to focus on the best of what there is.
Yup, totally agree with this. This particular model/result is definitely toy/oversimplified in that way.