Does anyone know if Shannon arrived at entropy via the axiomatic definition first, or the operational definition first?
I’ve been thinking about these two distinct ways in which we seem to arrive at new mathematical concepts. Looking at the countless partial information decomposition (PID) measures in the literature, all derived or motivated on an axiomatic basis, and not knowing which intuition to prioritize over which, I’ve been putting less of a premium on axiomatic conceptual definitions than I used to:
decision-theoretic justification of probability > Cox’s theorem
Shannon entropy as min description length (see the sketch after this list) > three information axioms
Fernando’s operational definition of synergistic information > the rest of the literature, with its countless non-operational PID measures
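To make the “min description length” comparison concrete, here is a minimal sketch of the operational reading (my own illustration, assuming an i.i.d. source and a standard Huffman code; the helper names are made up for this example). The source coding theorem pins the optimal expected prefix-code length per symbol between H(X) and H(X) + 1, so entropy really is the best achievable description length:

```python
import heapq
import math
from itertools import count

def shannon_entropy(probs):
    """Entropy in bits: H(X) = -sum_i p_i log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_code_lengths(probs):
    """Codeword length that Huffman coding assigns to each symbol."""
    tiebreak = count()  # unique tiebreaker so equal probabilities never compare lists
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # every merge adds one bit to these symbols' codewords
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
H = shannon_entropy(probs)
L = sum(p * l for p, l in zip(probs, huffman_code_lengths(probs)))
print(f"H(X) = {H:.3f} bits, expected Huffman length = {L:.3f} bits")  # both 1.750
```

For dyadic distributions like this one the two quantities coincide exactly; in general the optimal prefix-code length overshoots H(X) by strictly less than one bit per symbol.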
The basis of comparison would be a definition’s usefulness and its ease of generalization to better concepts:
at least in the case of Fernando’s synergistic information, it seems far more useful because I know exactly what I’m getting out of it, rather than having to compare axiomatic definitions based on handwavy judgements.
for ease of generalization, the problem with axiomatic definitions is that there are many logically equivalent ways to state the initial axioms (from which they can then be relaxed); operational motivations seem to ground these equivalent characterizations better, as with logical inductors growing out of the decision-theoretic view of probability theory.
(obviously these two feed into each other)
I’m not sure what you mean by operational vs axiomatic definitions.
But Shannon was unaware of the usage of $S = -\sum_i p_i \ln p_i$ in statistical mechanics. Instead, he was inspired by Nyquist’s and Hartley’s work, which introduced ad-hoc definitions of information for the case of uniform (equiprobable) distributions.
And in his seminal paper, “A mathematical theory of communication”, he argued in the introduction for the logarithm as a measure of information because of practicality, intuition and mathematical convenience. Moreover, he explicitly derived the entropy of a distribution from three axioms:
1) that it be continuous in the probabilities,
2) that, for uniform distributions, it increase monotonically with the number of outcomes,
3) and that, when a choice is broken into successive sub-choices, it equal the weighted sum of the entropies of those sub-choices (checked numerically below).
See section 6 for more details.
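To illustrate the third property concretely, Shannon (if I recall the paper correctly) decomposes H(1/2, 1/3, 1/6) into a fair binary choice followed, half the time, by a (2/3, 1/3) choice. Here is a quick numerical check (my own sketch, computing in bits rather than nats):

```python
import math

def H(*probs):
    """Shannon entropy in bits of a finite distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Direct entropy of the three-outcome distribution (1/2, 1/3, 1/6).
direct = H(1/2, 1/3, 1/6)

# Decomposed: a fair binary choice first, then (with probability 1/2)
# a second choice between the remaining outcomes with probabilities (2/3, 1/3).
decomposed = H(1/2, 1/2) + 0.5 * H(2/3, 1/3)

print(direct, decomposed)  # both ≈ 1.459 bits
assert math.isclose(direct, decomposed)
```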
I hope that answers your question.