(I’m taking the tack that “you might be wrong” isn’t just already accounted for in your distributions, and you’re now considering a generic update on “you might be wrong”.)
so you’re more confident about your AGI beliefs, and OpenPhil is less confident. Therefore, to the extent that you might be wrong, the world is going to look more like OpenPhil’s forecasts of how the future will probably look
Informally, this is simply wrong: the specificity in OpenPhil’s forecasts is some other specificity added to some hypothetical max-entropy distribution, and it can be a totally different sort of specificity than yours (rather than simply a less confident version of yours).
Formally: It’s true that if you have a distribution P, and then update on “I might be wrong about the stuff that generated this distribution” to a distribution P’, then P’ should have higher entropy than P; so P’ will be more similar to other distributions Q with higher entropy than P, in the sense of itself having higher entropy. That doesn’t mean P’ will be more similar than P, in terms of what it says will happen, to some other higher-entropy distribution Q. You could increase the entropy of P by spreading its mass over more outcomes that Q thinks are impossible; this would make P’ farther from Q than P is from Q on natural measures of distance, e.g. KL-divergence (which isn’t a true metric) or earth-mover or whatever. (For the other direction of KL-divergence, you could have P reallocate mass away from areas Q thinks are likely; this would be natural if P and Q semi-agreed on a likely outcome, so that P’ is more agnostic and has higher expected surprise according to Q. We can simultaneously have KL(P,Q) < KL(P’,Q) and KL(Q,P) < KL(Q,P’).)
(Also, I think for basically any random variable X we can have |E_P(X) - E_Q(X)| < |E_P’(X) - E_Q(X)|, for all degrees of wrongness taking P to P’.)
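To make the KL point concrete, here’s a toy numeric sketch in Python (the distributions are invented for illustration, not anyone’s actual forecast): P semi-agrees with Q about which outcomes are likely, and P’ hedges by mixing P with a uniform fallback, which spreads mass onto outcomes Q considers nearly impossible. The entropy goes up, yet P’ ends up farther from Q in both KL directions, and its mean also moves farther from Q’s.

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p))      # Shannon entropy, in nats

def kl(p, q):
    return np.sum(p * np.log(p / q))   # KL(p, q), in nats

# Five coarse outcome buckets (think: ranges of years).
P = np.array([0.05, 0.35, 0.45, 0.10, 0.05])   # your forecast
Q = np.array([0.05, 0.45, 0.45, 0.04, 0.01])   # the other forecast
U = np.ones(5) / 5                             # uniform fallback
w = 0.5                                        # weight on "I might be wrong"
P_prime = (1 - w) * P + w * U                  # hedged forecast
X = np.arange(5)                               # a random variable on the outcomes

print(entropy(P), entropy(P_prime))                   # ~1.26 -> ~1.52: entropy increases
print(kl(P, Q), kl(P_prime, Q))                       # ~0.08 -> ~0.39: farther from Q
print(kl(Q, P), kl(Q, P_prime))                       # ~0.06 -> ~0.24: farther in the other direction too
print(abs(P @ X - Q @ X), abs(P_prime @ X - Q @ X))   # ~0.24 -> ~0.37: mean moves away from Q's
```

(In this setup the fallback’s mean lies on the far side of P’s mean from Q’s, so the gap in means grows for every mixing weight w, matching the parenthetical above.)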
If you put higher probabilities on AGI arriving in the years before 2050, then, on average, you’re concentrating more probability into each year that AGI might possibly arrive, than OpenPhil does.
This is true for years before 2050, but not necessarily for years after 2050, if, e.g., your distribution has a thick tail and OpenPhil’s has a thin tail. It’s true for all years if both distributions just assign a constant probability to each year, and maybe for some other similar kinds of families.
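As a toy illustration of the tail point (all numbers invented): suppose your forecast puts 60% before 2050, spread evenly, with the remaining 40% smeared thinly out to 2200, while the other forecast puts 30% before 2050 and concentrates 70% in 2050–2069. Then your per-year probability is higher in every year before 2050, lower during the 2050s and 60s, and higher again once the other forecast’s thin tail runs out.

```python
import numpy as np

years = np.arange(2025, 2201)

def per_year(blocks):
    """Per-year probabilities from (start_year, end_year, total_mass) blocks."""
    p = np.zeros_like(years, dtype=float)
    for start, end, mass in blocks:
        sel = (years >= start) & (years <= end)
        p[sel] = mass / sel.sum()
    return p

yours    = per_year([(2025, 2049, 0.6), (2050, 2200, 0.4)])   # earlier median, thick tail
openphil = per_year([(2025, 2049, 0.3), (2050, 2069, 0.7)])   # later median, thin tail

for y in (2040, 2060, 2100):
    i = y - 2025
    print(y, round(yours[i], 4), round(openphil[i], 4))
# 2040: 0.024  > 0.012   (more probability per year before 2050)
# 2060: 0.0026 < 0.035   (less per year in the 2050s and 60s)
# 2100: 0.0026 > 0.0     (more again out in the tail)
```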
Your probability distribution has lower entropy [than OpenPhil’s].
Not true in general, by the above. (It’s true that Eliezer’s distribution for “AGI before OpenPhil’s median, yea/nay?” has lower entropy than OpenPhil’s, but that would be true for any two distributions with different medians!)
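Continuing the toy example above (same invented numbers): the earlier-median, thick-tailed forecast actually has the higher entropy over years, even though its answer to the yes/no question “AGI before the other forecast’s median?” is more lopsided, and therefore lower-entropy, than the other forecast’s roughly 50/50.

```python
import numpy as np

years = np.arange(2025, 2201)

def per_year(blocks):
    p = np.zeros_like(years, dtype=float)
    for start, end, mass in blocks:
        sel = (years >= start) & (years <= end)
        p[sel] = mass / sel.sum()
    return p

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))   # nats

yours    = per_year([(2025, 2049, 0.6), (2050, 2200, 0.4)])
openphil = per_year([(2025, 2049, 0.3), (2050, 2069, 0.7)])

print(entropy(yours), entropy(openphil))   # ~4.61 > ~3.67: the earlier-median forecast has *higher* entropy

# The binary question "AGI before the other forecast's median?":
median_op = years[np.cumsum(openphil) >= 0.5][0]    # 2055
p_before  = yours[years < median_op].sum()          # ~0.61
print(entropy(np.array([p_before, 1 - p_before])))  # ~0.67 nats, below the maximal ln(2) ~ 0.69
```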
So to the extent that you’re wrong, it should shift your probability distributions in the direction of maximum entropy.
This seems right. (Which might be away from OpenPhil’s distribution.) The update from P to P’ looks like mixing in some less-specific prior. It’s hard to say what that prior should be; it’s supposed to be maximum-entropy given some background information, but I don’t know the right way to put a maximum-entropy distribution on the space of years (for one thing, the space is non-compact; for another, the indistinguishability-of-years symmetry that would give a uniform distribution, or a Poisson-style one, seems pretty dubious, and I’m not sure what to do when there’s no clear symmetry to fall back on). So I’m not even sure that the median should go up!
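Here’s a minimal sketch of the “median might not go up” point (both distributions are made up): if the fallback prior you mix in happens to put a lot of its mass earlier than your current forecast does, say because its assumptions don’t privilege the decades you currently favor, then the hedged median comes out earlier, not later.

```python
import numpy as np

years = np.arange(2025, 2101)

def uniform_on(start, end):
    p = np.where((years >= start) & (years <= end), 1.0, 0.0)
    return p / p.sum()

def median(p):
    return years[np.cumsum(p) >= 0.5][0]

current  = uniform_on(2050, 2069)   # your current forecast
fallback = uniform_on(2025, 2064)   # a hypothetical "if I'm wrong" prior, with an earlier median
w = 0.3                             # weight on being wrong
hedged = (1 - w) * current + w * fallback

print(median(current), median(hedged))   # 2059 vs 2057: the median moved *earlier*
```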
Is Humbali right that generic uncertainty about maybe being wrong, without other extra premises, should increase the entropy of one’s probability distribution over AGI,
Yes.
(I mean, if what you think you’re maybe wrong about is specifically some arguments that previously updated you to be less confident than some so-called “maximum”-entropy distribution, then you’d decrease your entropy when you put more weight on being wrong. This isn’t generic wrongness, since it fails to doubt the assumptions that went into the “maximum”-entropy distribution, assumptions you apparently can coherently doubt, since previously some arguments led you to fall back on some other higher-entropy distribution based on weaker assumptions. But I guess it could look like you were supposed to fall back on the lower-entropy distribution, if that felt like the “background”.)
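A toy version of this parenthetical (numbers invented): start from a so-called “maximum”-entropy distribution built on some assumptions, suppose some arguments previously pushed you out to an even more spread-out distribution, and then put increasing weight on those arguments being wrong. The entropy goes back down.

```python
import numpy as np

years = np.arange(2025, 2325)

def uniform_on(start, end):
    p = np.where((years >= start) & (years <= end), 1.0, 0.0)
    return p / p.sum()

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))   # nats

background = uniform_on(2025, 2074)   # the so-called "maximum"-entropy distribution, given its assumptions
spread_out = uniform_on(2025, 2324)   # where some arguments previously left you: even less confident

for doubt in (0.0, 0.3, 0.6, 0.9):    # weight on "those arguments are wrong"
    p = doubt * background + (1 - doubt) * spread_out
    print(doubt, round(entropy(p), 2))
# Entropy falls monotonically (~5.70, ~5.53, ~5.09, ~4.33):
# doubting *these* arguments makes the distribution more confident, not less.
```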
thereby moving out its median further away in time?
Not necessarily. It depends on what the max-entropy distribution looks like, i.e. what assumptions you’re falling back on if you’re wrong.
Now having read the rest of the essay… I guess “maximum entropy” is just straight-up confusing if you don’t insert the “...given assumptions XYZ”. Otherwise it sounds like there’s such a thing as “the maximum-entropy distribution”, which doesn’t exist: you have to cut up the possible worlds somehow, and different ways of cutting them up produce different uniform distributions. (Or, in the continuous case, you have to choose a measure in order to do integration, and that measure contains just as much information as a probability distribution; the uniform measure says that all years are the same, but you could also say that all orders of magnitude of time since the Big Bang are the same, or something else.) So how you cut up the possible worlds changes the uniform distribution, i.e. the maximum-entropy distribution, and the assumptions that go into how you cut up the worlds are what determine your maximum-entropy distribution.
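For a small sketch of how the carving matters (my own toy construction): “uniform over the next 1000 calendar years” and “uniform over orders of magnitude of years-from-now” are each maximum-entropy relative to their own way of cutting up the possibilities, but they assign wildly different per-year probabilities.

```python
import numpy as np

n = np.arange(1, 1001)   # how many years from now AGI arrives, out to 1000

# Carving 1: all calendar years are "the same" -> uniform per-year mass.
uniform_years = np.full(1000, 1 / 1000)

# Carving 2: all orders of magnitude of years-from-now are "the same"
# -> mass of year n proportional to log10(n + 1) - log10(n).
log_mass = np.log10(n + 1) - np.log10(n)
uniform_log = log_mass / log_mass.sum()

for k in (1, 10, 100, 1000):
    print(k, round(uniform_years[k - 1], 5), round(uniform_log[k - 1], 5))
# year 1:    0.001 vs ~0.1      (the log carving piles mass on the near term)
# year 1000: 0.001 vs ~0.00014  (and starves the far tail)
# Both are "maximum entropy", just relative to different ways of cutting up the worlds.
```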
Hold on, I guess this actually means that, for a natural interpretation of “entropy” in “generic uncertainty about maybe being wrong, without other extra premises, should increase the entropy of one’s probability distribution over AGI”, that statement is actually false. If by “entropy” we mean “entropy according to the uniform measure”, it’s false. What we should really mean is entropy relative to one’s maximum-entropy distribution (taken as the background measure), in which case the statement is true.
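A sketch of that distinction with toy numbers: mix a forecast toward a non-uniform background m (standing in for one’s max-entropy distribution). The entropy computed against the uniform measure can go down, while the entropy relative to m, i.e. negative KL-divergence to m, goes up; the latter is the sense in which the statement is true.

```python
import numpy as np

years = np.arange(2025, 2325)   # 300 years

def per_year(blocks):
    p = np.zeros_like(years, dtype=float)
    for start, end, mass in blocks:
        sel = (years >= start) & (years <= end)
        p[sel] = mass / sel.sum()
    return p

def entropy(p):
    return -np.sum(p * np.log(p))        # entropy w.r.t. the uniform (counting) measure

def rel_entropy(p, m):
    return -np.sum(p * np.log(p / m))    # entropy relative to background m, i.e. -KL(p, m)

P = per_year([(2025, 2324, 1.0)])                       # current forecast: uniform over 300 years
m = per_year([(2025, 2074, 0.9), (2075, 2324, 0.1)])    # a hypothetical non-uniform max-ent background
w = 0.5
P_hedged = (1 - w) * P + w * m                          # the "I might be wrong" update toward m

print(entropy(P), entropy(P_hedged))                # ~5.70 -> ~5.35: uniform-measure entropy *falls*
print(rel_entropy(P, m), rel_entropy(P_hedged, m))  # ~-1.49 -> ~-0.44: entropy relative to m *rises*
```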