Naive Comments
I find this paper pretty inspirational. I’ve been playing with the intuitions he lays out in the first two sections for days.
It was written in 1996, and I am not altogether sure where the Principle of Maximum Entropy fits in, even though he uses the phrase ‘maximum entropy’ a lot. It occurs to me that the Principle of Maximum Caliber may have a relationship with MaxEnt similar to that between Gibbs’ and Clausius’ statements of the Second Law of Thermodynamics, but this isn’t clear to me, mainly because I know almost nothing about MaxEnt.
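For my own notes, here is roughly what I understand MaxEnt to say, illustrated with the standard loaded-die example rather than anything from the paper (the target means and the bisection approach below are my own choices): given only a constraint on the average roll, take the distribution over the six faces with the largest entropy, which works out to p_i ∝ exp(−λi) for a multiplier λ fixed by the constraint.

```python
import numpy as np

def maxent_die(target_mean, faces=np.arange(1, 7)):
    """Max-entropy distribution over die faces, subject only to a fixed mean.

    The maximizing distribution has the form p_i ∝ exp(-lam * i); the
    multiplier lam is pinned down numerically by bisection on the mean.
    """
    def mean_for(lam):
        w = np.exp(-lam * faces)
        return (w / w.sum()) @ faces

    lo, hi = -50.0, 50.0          # bracket for lam; mean_for is decreasing in lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    w = np.exp(-0.5 * (lo + hi) * faces)
    return w / w.sum()

print(maxent_die(4.5))  # weight shifted toward the high faces
print(maxent_die(3.5))  # uniform 1/6 each: a mean of 3.5 adds no information
```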
I was also reading the reply to Francois Chollet where the improvement of AlphaGo Zero over AlphaGo was being given as an example. In thinking about that in relation to this paper, I have two feelings:
1) I notice there is not a lot of coverage of what AlphaGo Zero was actually doing during its three-day training period, and I really should look that up specifically.
2) What I suspect happened is that AlphaGo Zero brute-force mapped the “phase-space” of Go for three days. The possible combinations of piece positions (the microstates) are computationally intractable, or so I read, so AlphaGo Zero went to work on a different level of macrophenomena. So, given the rules of Go, the current position A, and virtually all of the background information I, it confidently predicts the winning end-game positions B. (A toy sketch of the shape of inference I have in mind is below.)
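To make (2) concrete for myself, here is a toy version of that inference shape. Everything in it (the coin-flip ‘game’, the win threshold, the function names) is invented, and it has nothing to do with how AlphaGo Zero is actually implemented; it only illustrates “sample microstates consistent with macrostate A, roll them forward under the rules, and read off the end-game macrostate B”.

```python
import random

# Made-up game: a "position" is a sequence of 10 hidden coin flips, and the
# visible macrostate A is just (heads so far, flips seen so far).

def microstate_consistent_with(score_so_far, flips_seen):
    """Sample one hidden flip sequence whose visible summary matches A."""
    seq = [1] * score_so_far + [0] * (flips_seen - score_so_far)
    random.shuffle(seq)  # every ordering is an equally plausible microstate
    return seq

def play_out(seq, total_flips=10):
    """Roll the microstate forward under the 'rules' to an end-game score."""
    remaining = [random.randint(0, 1) for _ in range(total_flips - len(seq))]
    return sum(seq) + sum(remaining)

def predict_B_given_A(score_so_far, flips_seen, n_samples=10_000):
    """Estimate P(final score >= 6 | A) by averaging over compatible microstates."""
    wins = sum(
        play_out(microstate_consistent_with(score_so_far, flips_seen)) >= 6
        for _ in range(n_samples)
    )
    return wins / n_samples

print(predict_B_given_A(score_so_far=4, flips_seen=5))  # ≈ 0.81, a strong position
print(predict_B_given_A(score_so_far=1, flips_seen=5))  # ≈ 0.03, a weak position
```

The convenient accident in this toy is that the shuffled detail of the microstate never changes the answer, which is roughly what it means for A to be a good macrovariable.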
This makes me think that the real trick to good predictions is making the right choice of macrophenomena. Throughout the paper Jaynes takes care to distinguish his method from statistical mechanics proper, which makes sense, since that is otherwise how people know him. It seems pretty clear to me that his generalizations free us from the traditional associations, which opens up a lot of room for new categories of macrophenomena. For example, consider this passage from the paper:

On a different plane, we feel that we understand the general thinking and economic motivations of the individual people who are the micro-elements of a society; yet millions of those people combine to make a macroeconomic system whose oscillations and unstable behavior, in defiance of equilibrium theory, leave us bewildered.
So we have:
humans (microphenomena) → economy (macrophenomena)
But suppose we find humans computationally intractable and the macrophenomena of the economy too imprecise. We could add a middle layer of institutions, like firms and governments, which are also made up of humans. So now we have:
humans (microphenomena) → institutions (macrophenomena)
AND
institutions (microphenomena) → economy (macrophenomena)
So if it happens that institutions are something you can get a good grip on, no one else will be able to significantly out-predict you about the economy unless they can get a better grip on institutions than you have, or they find a new macrophenomenon above humans that they can master comparably well and that contains more information about the economy than institutions do.
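A toy sketch of what adding the middle layer might look like in code. All of the names, numbers, and the ‘production function’ here are invented; the only point is the shape of the computation: the expensive human-level pass is done once to pin down the institution-level state, and every economy-level question after that only touches the middle layer.

```python
import random

random.seed(0)
N_HUMANS, N_FIRMS = 1_000_000, 100

def humans_to_institutions():
    """Coarse-grain the micro layer: each firm's demand is the sum of its customers' spending."""
    firm_demand = [0.0] * N_FIRMS
    for person in range(N_HUMANS):
        firm = person % N_FIRMS                        # fixed, made-up customer assignment
        firm_demand[firm] += random.uniform(0.0, 1.0)  # made-up individual spending
    return firm_demand

def institutions_to_economy(firm_demand):
    """Predict the macro quantity from the institution layer alone."""
    return sum(d ** 0.9 for d in firm_demand)          # made-up diminishing-returns output

firm_demand = humans_to_institutions()                 # the expensive step, done once

# Macro questions are now cheap, e.g. a made-up shock that halves demand at ten firms:
shocked = [d * 0.5 if i < 10 else d for i, d in enumerate(firm_demand)]
print(round(institutions_to_economy(firm_demand), 1))
print(round(institutions_to_economy(shocked), 1))
```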
Since we are starting from the perspective of macrophenomena, I keep wanting to say resolution. So if we have our microphenomena on the bottom and the macrophenomena at the top, one strategy might be to look ‘down’ from the macrophenomena, try to identify the lowest-level intermediate phenomena that can reasonably be computed, get a decisive description of those phenomena, and then return to predicting the macrophenomena.
Sort of the same way a Fast Fourier Transform works: by cleverly choosing intermediate steps, we can get to the answer we want faster (or, in this case, more accurately).
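For reference, here is the standard radix-2 recursion (textbook Cooley-Tukey, not anything from the paper): the naive transform touches every input-output pair, while the FFT inserts an intermediate layer, the transforms of the even- and odd-indexed halves, and assembles the full answer from those.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    # The 'intermediate phenomena': transforms of the two half-size subproblems.
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dft(x):
    """Naive O(n^2) transform for comparison: every output touches every input."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(all(abs(a - b) < 1e-9 for a, b in zip(fft(signal), dft(signal))))  # True
```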