I’m not sure what you mean by operational vs axiomatic definitions.
But Shannon was unaware of the usage of $S = -\sum_i p_i \ln p_i$ in statistical mechanics. Instead, he was inspired by the work of Nyquist and Hartley, who introduced ad-hoc definitions of information for uniform (equiprobable) distributions.
And in the introduction of his seminal paper, “A Mathematical Theory of Communication”, he argued for the logarithm as a measure of information on the grounds of practicality, intuition, and mathematical convenience. Moreover, he explicitly derived the entropy of a distribution from three axioms:
1) that it be continuous with respect to the probabilities $p_i$,
2) that, for a uniform distribution over $n$ outcomes, it increase monotonically with $n$,
3) and that, when a choice is broken down into successive sub-choices, it equal the weighted sum of the entropies of those sub-choices (a small numeric check of this property follows below).
See section 6 for more details.
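To make the third axiom concrete, here is a minimal Python sketch (the function name and tolerance are mine, not Shannon's) that computes $H = -\sum_i p_i \log_2 p_i$ and checks the decomposition example Shannon gives in that section: a choice among probabilities 1/2, 1/3, 1/6 can be split into a fair binary choice, followed half the time by a 2/3 vs. 1/3 choice.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H = -sum(p * log(p)), in bits by default."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Axiom 3 check, using the decomposition example from the paper:
# a choice among probabilities (1/2, 1/3, 1/6) is split into a fair
# binary choice, followed (in the second branch) by a (2/3, 1/3) choice.
direct = entropy([1/2, 1/3, 1/6])
grouped = entropy([1/2, 1/2]) + (1/2) * entropy([2/3, 1/3])

print(f"direct:  {direct:.6f} bits")   # ~1.459148
print(f"grouped: {grouped:.6f} bits")  # ~1.459148
assert math.isclose(direct, grouped)
```

Both expressions come out to about 1.459 bits, as the weighted-sum property requires.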
I hope that answers your question.