It can’t be less than zero. From what I understand about priors, the maximum entropy prior would be a logarithmic prior. A more reasonable prior would be a log-normal prior with the mean at 1 and a high standard deviation.
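(A minimal sketch of what such a prior might look like, taking the location to be mu = 0 in log space, i.e. median 1, and sigma = 3 as an arbitrary stand-in for a high standard deviation:)

```python
import numpy as np
from scipy.stats import lognorm

# Hypothetical version of the proposed prior: log-normal with mu = 0 (median 1)
# and a large sigma; sigma = 3 is an arbitrary choice for illustration.
sigma = 3.0
prior = lognorm(s=sigma, scale=np.exp(0.0))  # scipy: scale = exp(mu)

print(prior.median())                   # 1.0
print(prior.ppf([0.05, 0.5, 0.95]))     # spans several orders of magnitude
```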
By logarithmic do you mean p(x) = exp(-x)? That would only have an entropy of 1, I believe, whereas one can easily obtain unboundedly large amounts of entropy, or even infinite entropy (for instance, p(x) = a exp(-a x) has entropy 1-log(a), so letting a go to zero yields arbitrarily large entropy).
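(A quick numerical sketch of this, assuming scipy is available; the particular values of a below are arbitrary:)

```python
import numpy as np
from scipy.stats import expon

# p(x) = a*exp(-a*x) has differential entropy 1 - log(a), so the entropy
# grows without bound as a -> 0.
for a in [1.0, 0.1, 0.01]:
    closed_form = 1 - np.log(a)
    numeric = float(expon(scale=1 / a).entropy())  # scipy parameterizes by scale = 1/a
    print(f"a = {a:5.2f}:  1 - log(a) = {closed_form:.3f},  numeric = {numeric:.3f}")
```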
Also, as I’ve noted before, entropy doesn’t make that much sense for continuous distributions.
I mean p(x) = 1/x.
I think it’s Jeffreys prior or something. Anyway, it seems like a good prior. It doesn’t have any arbitrary constants in it like you’d need with p(x) = exp(-x). If you change the scale, the prior stays the same.
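(A minimal sketch of the scale-invariance point, with both priors truncated so they can be normalized; the truncation bounds, the interval, and the rescaling factor below are all arbitrary illustration values:)

```python
import numpy as np
from scipy.integrate import quad

def prob(density, a, b, L, U):
    """P(a < x < b) under the density truncated to [L, U] and normalized."""
    Z, _ = quad(density, L, U)
    mass, _ = quad(density, a, b)
    return mass / Z

L, U, c = 1e-2, 1e2, 7.0   # truncation bounds and an arbitrary rescaling factor

# 1/x: rescaling the interval and the bounds by c leaves the probability unchanged.
print(prob(lambda x: 1 / x, 2, 5, L, U))
print(prob(lambda x: 1 / x, 2 * c, 5 * c, L * c, U * c))

# exp(-x): the same rescaling changes the probability, because exp(-x)
# singles out a particular scale (x around 1).
print(prob(lambda x: np.exp(-x), 2, 5, L, U))
print(prob(lambda x: np.exp(-x), 2 * c, 5 * c, L * c, U * c))
```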
p(x) = 1/x isn’t an integrable function (diverges at both 0 and infinity).
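(Spelled out, both divergences:)

\[
\int_0^1 \frac{dx}{x} = \lim_{\epsilon \to 0^+} \ln\frac{1}{\epsilon} = \infty,
\qquad
\int_1^\infty \frac{dx}{x} = \lim_{M \to \infty} \ln M = \infty,
\]

so 1/x can’t be normalized into a proper density; at best it can serve as an improper prior.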
(My real objection is more that it’s pretty unlikely that we really have so little information that we have to quibble about which prior to use. It’s also good to be aware of the mathematical difficulties inherent in trying to be an “objective Bayesian”, but the real problem is that it’s not very helpful for making more accurate empirical predictions.)
Which is why I said a log-normal prior would be more reasonable.
How much information do we have? We know that we haven’t managed to build an AI in 40 years, and that’s about it.
We probably have enough information if we can process it right, but because we don’t know how, we’re best off sticking close to the prior.
Why a log-normal prior with mu = 0? Why not some other value for the location parameter? Log-normal makes pretty strong assumptions, which aren’t justified if, for all practical purposes, we have no information about the feedback constant.
We may have little specific information about AIs, but we have tons of information about feedback laws, and some information about self-improving systems in general*. I agree that it can be tricky to convert this information to a probability, but that just seems to be an argument against using probabilities in general. Whatever makes it hard to arrive at a good posterior should also make it hard to arrive at a good prior.
(I’m being slightly vague here for the purpose of exposition. I can make these statements more precise if you prefer.)
(* See for instance the Yudkowsky-Hanson AI Foom Debate.)