Ok, now the post where I go into my own theory on how to avoid the Dark Room Problem, even without physiological goals.
The brain isn’t configured to learn just any old predictive or causal model of the world. It has to learn the distal causes of its sensory stimuli: the causes that reliably produce the same stimuli, over and over again, and so can be modeled in a tractable way.
If I see a sandwich (which I do right now, it’s lunchtime), one of the important causes is that photons are bouncing off the sandwich, hitting my eyes, and stimulating my retina. However, most photons don’t make me see a sandwich; they make me see other things, and making a model complex enough that exact photon behavior becomes parameters instead of noise is way too complicated.
So instead, I model the cause of my seeing a sandwich as being the sandwich. I see a sandwich because there really is a sandwich.
The useful part about this is that since I’m modeling the consistent, reliable, repeatable causes, these same inferences also support and explain my active interventions. I see a sandwich because there really is a sandwich, and that explains why I can move my hands and mouth to eat the sandwich, and why when I eat the sandwich, I taste a sandwich. Photons don’t really explain any of that without recourse to the sandwich.
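Just to make that concrete, here’s a toy sketch of the kind of inference I have in mind. All the numbers are invented, but the structure is the point: one latent “sandwich” variable jointly explains three different sensory modalities, which is exactly what a photon-level story can’t do without smuggling the sandwich back in.

```python
# Toy sketch: a single distal cause ("there is a sandwich") jointly explains
# seeing, grasping, and tasting. All probabilities below are invented.

PRIOR_SANDWICH = 0.3
P_IF_SANDWICH = {"see": 0.95, "grasp": 0.90, "taste": 0.90}
P_IF_NO_SANDWICH = {"see": 0.05, "grasp": 0.01, "taste": 0.01}

def posterior_sandwich(observations):
    """Posterior P(sandwich | observations), by direct enumeration.

    `observations` maps a modality to True/False; modalities left out
    are treated as unobserved.
    """
    def likelihood(table):
        l = 1.0
        for modality, present in observations.items():
            p = table[modality]
            l *= p if present else (1.0 - p)
        return l

    joint_sandwich = PRIOR_SANDWICH * likelihood(P_IF_SANDWICH)
    joint_none = (1.0 - PRIOR_SANDWICH) * likelihood(P_IF_NO_SANDWICH)
    return joint_sandwich / (joint_sandwich + joint_none)

print(posterior_sandwich({"see": True}))                                # ~0.89
print(posterior_sandwich({"see": True, "grasp": True, "taste": True}))  # ~1.00
```

The single “sandwich” hypothesis does all the explanatory work across modalities; a model that only tracked photon statistics would need a separate story for why grasping and tasting line up with seeing.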
However, if I were to reach for the sandwich and find that my hands pass through it, I would have to expand my hypothesis space to include ghost sandwiches or living in a simulation. Some people think the brain can do this with nonparametric models: probabilistic models of infinite stuff, of which I use finite pieces to make predictions. When new data comes in that supports a more complex model, I just expand the finite piece of the infinite object that I’m actually using. The downside is that a nonparametric model will always, irreducibly, have a bit of extra uncertainty “left over” compared to a parametric model that started from the right degree of complexity. The nonparametric model has more things to be uncertain about, so it’s always a little more uncertain.
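One way to see that “leftover” uncertainty concretely is with a Chinese Restaurant Process prior, which is one standard nonparametric construction (my choice here, just for illustration). However much data the model has sorted into known causes, it always reserves a sliver of predictive probability for a brand-new, never-before-seen cause, so its predictive entropy never quite drops to that of a parametric model with the right fixed set of causes.

```python
import math

# Toy sketch: the CRP predictive distribution always reserves probability
# mass (proportional to alpha) for an as-yet-unseen cause.

def crp_predictive(counts, alpha=1.0):
    """Predictive distribution over known causes plus one brand-new cause."""
    n = sum(counts)
    probs = [c / (n + alpha) for c in counts]   # causes seen so far
    probs.append(alpha / (n + alpha))           # the ever-present "something new"
    return probs

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# After 1000 observations spread over three known causes, the nonparametric
# predictive still reserves ~0.1% for an unseen cause, so it stays strictly
# more uncertain than a parametric model fixed to exactly those three causes.
counts = [600, 300, 100]
nonparametric = crp_predictive(counts, alpha=1.0)
parametric = [c / sum(counts) for c in counts]
print(entropy(nonparametric) - entropy(parametric))  # small but always positive
```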
How can these ideas apply to the Dark Room? Well, if I go into a Dark Room, I’m actually sealing myself off from the distal causes of my sensations. The walls of the room block out what’s going on outside the room, so I have no idea when, for instance, someone might knock on the door. Really knowing what’s going on requires confidence about the distal causal structure of my environment, not just confidence about the proximal structure of a small local environment. Otherwise, I could always just say, “I’m certain that photons are hitting my eyeballs in some reasonable configuration”, and I’d never need to move or do any inference at all.
It gets worse! If my model of those distal causes is nonparametric, it always has extra leftover uncertainty. No matter how confident I am about the stuff I’ve seen, I never have complete evidence that I’ve seen everything, that there isn’t an even bigger universe out there I haven’t observed yet.
So really “minimizing prediction error” with respect to a nonparametric model of distal causes ends up requiring not only that I leave my room, but that I explore and control as much of the world as possible, at every scale that significantly impacts my observations, without limit.
The thing you are minimizing by going outside isn’t prediction error for sense data; it’s a sort of expected prediction error over the spatial extent of your model. I think both of these are valid concepts to think about, so it’s not that this argument shows prediction error is “really” about building a model of the world and then ensuring that it’s both correct and complete; it’s an argument about what it’s more reasonable to model humans as doing.
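Here’s a toy contrast between the two quantities, with invented numbers: an agent models a row of cells with independent Bernoulli beliefs, and the dark-room policy makes its current sense data very predictable while leaving most of the spatial model maximally uncertain.

```python
import math

# Toy sketch: local prediction error on current sense data versus expected
# prediction error summed over the spatial extent of the model.
# All belief values below are invented for illustration.

def bernoulli_entropy(p):
    """Entropy of a Bernoulli(p) belief, a stand-in for expected prediction error."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# Beliefs about each cell's state under two policies: staying in the dark
# room gives near-certainty about the one cell you occupy and ignorance
# about the rest; exploring gives decent (not perfect) beliefs everywhere.
dark_room = [0.99, 0.5, 0.5, 0.5, 0.5]
explore   = [0.90, 0.90, 0.85, 0.80, 0.80]

def local_error(beliefs):    # prediction error for current sense data only
    return bernoulli_entropy(beliefs[0])

def spatial_error(beliefs):  # expected prediction error over the whole model
    return sum(bernoulli_entropy(p) for p in beliefs)

print(local_error(dark_room), local_error(explore))      # dark room wins locally...
print(spatial_error(dark_room), spatial_error(explore))  # ...and loses over the model
```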
Of course, once you have two possibilities, that usually means you have infinitely many. I can see how this could lead people to generate a whole family of formalisms. But I still feel like this route leads to oversimplification.
For example, sometimes people are happy to just fool their sense-data: we take anesthetics, or look at pornography, or drink diet soda. But sometimes people aren’t: the pictures-of-relationships industry is much smaller than the porn industry, people pay extra for free-range beef, and people want a genuine Rembrandt rather than a copy.
Oh, I wasn’t really trying to talk about what prediction-error minimization “really does” there; I was more trying to point out that what it does changes radically depending on your modeling assumptions.
The “distal causes” bit is also something I really want to find the time and expertise to formalize. There are studies of how causal judgements ground the moral responsibility of agents, and I’d really like to see whether we can use the notion of distal causation to generalize from there to how people learn causal models that capture action-affordances.