Hi Dr. Neal,
Wow, I studied your work in grad school! (And, more recently, your paper on Gaussian Processes.) Quite an honor to get a comment from you. Just as an aside, I am not sure if my figure is visible — can you see it? I set it as the thumbnail, but I don’t see it anywhere. In case it doesn’t show up, it is here:
https://www.mlcrumbs.com/img/epistemology-of-representation.png
I think I need to change some labels; I realize now that I have been using ‘x’ ambiguously, sometimes as a model input and sometimes to represent the bedrock physical system. To clarify, I’ll use your vision example, but add temporality:
z_t: position, orientation, and velocity of some object at time t
x_t: pixel values from a video camera at time t
ϕ_t: physical state (particles, forces) of the relevant slice of space at time t (this includes the object and the photons emanating from it which have hit the camera)
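To make the three levels concrete, here is a toy sketch. Everything in it (the particle "physics," the 1-D camera, the summary function) is made up purely for illustration, not taken from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# phi_t: a toy "bedrock" physical state -- positions and velocities
# of N particles making up the object.
N = 50
phi = {
    "pos": rng.normal(loc=[5.0, 0.0], scale=0.1, size=(N, 2)),
    "vel": np.tile([1.0, 0.0], (N, 1)),
}

def step(phi, dt=0.01):
    """Physical dynamics: phi_{t-delta} causes phi_t."""
    return {"pos": phi["pos"] + dt * phi["vel"], "vel": phi["vel"]}

def render(phi, n_pixels=32):
    """x_t: a 'slice' of phi_t -- a crude 1-D camera that counts
    particles falling into each horizontal pixel bin."""
    hist, _ = np.histogram(phi["pos"][:, 0], bins=n_pixels, range=(0, 10))
    return hist

def abstract(phi):
    """z_t: a platonic summary (mean position and velocity of the
    object), computed from phi but not itself a physical thing."""
    return phi["pos"].mean(axis=0), phi["vel"].mean(axis=0)

phi_next = step(phi)               # the only genuinely causal arrow
x = render(phi_next)               # x_t: a slice of phi_t
z_pos, z_vel = abstract(phi_next)  # z_t: an abstraction of phi_t
```

Note that both `render` and `abstract` are functions *of* the physical state; neither z nor x appears anywhere inside the dynamics.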
You wrote:

“Your presentation would be clearer if you started with one or more examples of what you see as typical models, in which you argue that z isn’t usefully seen as causing x.”

Actually, your example of a typical vision model is one where I’d argue exactly this, though I fear you might think it a trivial splitting of hairs.
In any case, I’ll first assume you agree that “causation,” on any use of the term, must be temporal: causes must come before effects. So, to a modified question: why wouldn’t z_{t−δ} be usefully seen as causing x_t? Here, let’s assume that δ is the time required for photons to travel from the object’s surface to the camera.
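For a sense of scale (the distance here is a made-up example value), δ is vanishingly small at everyday ranges:

```python
# Order of magnitude for delta: light travel time from object to camera.
c = 299_792_458.0  # speed of light, m/s
d = 3.0            # hypothetical: object 3 m from the camera
delta = d / c      # roughly 10 nanoseconds
```

So the temporal ordering holds, even if δ is negligible for any practical vision model.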
What I’m really saying is that z_t and z_{t−δ} are platonic, non-physical quantities. It is ϕ_{t−δ} which causes ϕ_t, and x_t is just a slice of ϕ_t (or, if you like, a platonic abstraction of it).
I would also add that z_{t−δ} could, at best, be interpreted as an aspect of something that caused the pixels x_t. After all, the position, orientation, and velocity of an object alone aren’t enough to determine the colors of all the pixels.
This vision example is one in which the z_t representations are very much rigid-body summaries, so it seems useful to strongly identify them as “causes.” But I am trying to put all of ML and mental models on the same semantic footing here. In plenty of models, diffusion models for instance, the z are just pure noise, and such an interpretation makes no sense at all, even in the sense that you mean.
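To illustrate that last point, here is a toy caricature of diffusion sampling. The "denoiser" is a stand-in (it just shrinks toward zero), not a trained model; the point is only the shape of the process: z is drawn as pure Gaussian noise and carries no object-like summary of the image it leads to:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoise_step(x, t):
    # A real diffusion model would predict and subtract noise here;
    # this toy just shrinks toward zero to show the loop structure.
    return 0.9 * x

z = rng.standard_normal((8, 8))  # z: pure noise, nothing rigid-body about it
x = z
for t in reversed(range(10)):    # iterative refinement from noise
    x = toy_denoise_step(x, t)

# x is now the "image"; z was never a summary of its content,
# so calling z a "cause" of x in the vision-model sense is strained.
```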