I normally don’t think of most functions as polynomials at all—in fact, I think of most real-world functions as going to zero for large values. E.g. the function “dogness” vs. “nose size” cannot be any polynomial, because polynomials (or their inverses) blow up unrealistically for large (or small) nose sizes.
I guess the hope is that you always learn even polynomials, oriented in such a way that the extremes appear unappealing?
I believe the paper says that log densities are (approximately) polynomial—e.g. a Gaussian would satisfy this, since the log density of a Gaussian is quadratic.
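For concreteness, here's a quick numerical check (a minimal sketch; mu and sigma are arbitrary values I picked) that a Gaussian's log density really is a quadratic in x:

```python
import numpy as np
from scipy.stats import norm

# log p(x) for a Gaussian is -(x - mu)^2 / (2 sigma^2) - log(sigma * sqrt(2 pi)),
# i.e. a downward-opening quadratic in x.
mu, sigma = 2.0, 1.5  # arbitrary illustrative values
xs = np.linspace(-5.0, 5.0, 11)
quadratic = -(xs - mu) ** 2 / (2 * sigma**2) - np.log(sigma * np.sqrt(2 * np.pi))
assert np.allclose(norm.logpdf(xs, loc=mu, scale=sigma), quadratic)
```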
What John said. To elaborate: it's specifically talking about the case where there is some concept, a probabilistic generative model creates observations tied to that concept, and the claim is that the log probabilities of those observations follow a polynomial.
Suppose the most dog-like nose size is K. One function you could use is y = exp(-(x - K)^d) for some even positive integer d (even, so that neither extreme blows up). That's a function whose maximum value is 1, attained at x = K (where higher values = more "dogness"), and it decays to 0 for extreme nose sizes rather than blowing up unreasonably anywhere. Its log, -(x - K)^d, has maximum value 0 and is exactly the sort of polynomial at issue.
(Really you should be talking about probabilities, in which case you use the same sort of function but then normalize, which transforms the exp into a softmax, as the paper suggests)
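To make both pieces concrete, here's a minimal sketch (K = 5 and d = 4 are arbitrary illustrative values, not anything from the paper): the log-score is the polynomial -(x - K)^d, exponentiating gives a bounded "dogness" bump, and normalizing the exponentials over a discretized range of nose sizes is exactly a softmax:

```python
import numpy as np

# Illustrative values only: K (most dog-like nose size) and d are assumptions.
K, d = 5.0, 4  # d even, so both extremes are penalized

nose_sizes = np.linspace(0.0, 10.0, 101)

# Log-score is a polynomial in nose size, with maximum 0 at x = K.
log_scores = -(nose_sizes - K) ** d

# Exponentiating gives a bounded bump: peaks at 1 when x = K and decays
# to 0 at extreme nose sizes, never blowing up.
dogness = np.exp(log_scores)

# Treating the scores as probabilities over the discretized nose sizes means
# normalizing the exponentials, i.e. taking a softmax of the log-scores.
probs = np.exp(log_scores - log_scores.max())  # shift for numerical stability
probs /= probs.sum()
assert np.isclose(probs.sum(), 1.0)
```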