So the estimate for the number of hard steps doesn’t make sense in the absence of some prior. Starting with a prior distribution for the likelihood of the number of hard steps, and applying bayes rule based on the time passed and remaining, we will update towards more mass on k = t/(T–t) (basically, we go from P( t | k) to P( k | t)).
Ok let’s do this. Since k is an integer, I guess our prior should be a sequence pk. We already know P(t|k)=ktk−1. We can derive from this P(t)=∑kP(t|k)pk, and finally P(k|t)=P(t|k)pkP(t). In our case, t=4.55.5
I guess the most natural prior would be the uniform prior: we fix an integer N, and set pk=1N for k∈[1;N]. From this we can derive the posterior distribution. This is a bit tedious to do by hand, but easy to code. From the posterior distribution we can for example extract the expected value of k : E[k|t]=∑kkP(k|t). I computed it for N∈[1;100] and voilà!
Obviously E[k|t] is strictly increasing. It also converges toward 10. Actually, for almost all values of N, the expected value is very close to 10. To give a criterion, for N=15, the expected value is already above 7.25, which implies that it is closer to 10 than to 4.5.
We can use different types of prior. I also tried pk=e−kN (with a normalization constant), which is basically a smoother version of the previous one. Instead of stating with certainty “the number of hard steps is at most N”, it’s more “the number of hard steps is typically N, but any huge number is possible”. This gives basically the same result, except is separates from 4.5 even faster, as soon as N=13.
My point is not to say that the number of hard steps is 10 in our case. Obviously I cannot know that. Whatever prior we may choose, we will end up with a distribution of probability, not a nice clear answer. My point is that if, for the sake of simplicity, we choose to only remember one number / to only share one number, it should probably not be 4.5 (or k=tT−t), but instead 10 (or k=t+TT−t). I bet that, if you actually have a prior, and actually make the bayesian update, you’ll find the same result.
So again, I wasn’t referring to the expected value of the number of steps, but instead how we should update after learning about the time – that is, I wasn’t talking about E[k|t], but instead P(k|t)/P(k) for various k.
Let’s dig into this. From Bayes, we have: P(k|t)/P(k)=P(t|k)/P(t). As you say, P(t|k) ~ kt^(k-1). We have the pesky P(t) term, but we can note that for any value of t, this will yield a constant, so we can discard it and recognize that now we don’t get a value for the update, but instead just a relative value (we can’t say how large the update is at any individual k, but we can compare the updates for different k). We are now left with P(k|t)/P(k) ~ kt^(k-1), holding t constant. Using the empirical value on Earth of t=(4.5/5.5)=0.82, we get P(k|t=0.82)/P(k) ~ k*0.82^(k-1).
If we graph this, we get:
which apparently has its maximum at 5. That is, whatever the expected value for the number of steps is after considering the time, if we do update on the time, the largest update is in favor of there having been 5 steps. Compared to other plausible numbers for k, the update is weak, though – this partiuclar piece of evidence is a <2x update on there having been 5 steps compared to there having been 2 steps or 10 steps; the relative update for 5 steps is only even ~5x the size of the update for 20 steps.
Considering the general case (where we don’t know t), we can find the maximum of the update by setting the derivative of kt^(k-1) equal to zero. This derivative is (k ln(t) + 1)t^(k-1), and so we need k∗ln(t)=−1, or k=−1/ln(t). If we replace t with x/(x+1), such that x corresponds to the naive number of steps as I was calculating before, then that’s k=−1/ln(x/(x+1)). Here’s what we get if we graph that:
This is almost exactly my original guess (though weirdly, ~all values for k are ~0.5 higher than the corresponding values of x).
Ok let’s do this. Since k is an integer, I guess our prior should be a sequence pk. We already know P(t|k)=ktk−1. We can derive from this P(t)=∑kP(t|k)pk, and finally P(k|t)=P(t|k)pkP(t). In our case, t=4.55.5
I guess the most natural prior would be the uniform prior: we fix an integer N, and set pk=1N for k∈[1;N]. From this we can derive the posterior distribution. This is a bit tedious to do by hand, but easy to code. From the posterior distribution we can for example extract the expected value of k : E[k|t]=∑kkP(k|t). I computed it for N∈[1;100] and voilà!
Obviously E[k|t] is strictly increasing. It also converges toward 10. Actually, for almost all values of N, the expected value is very close to 10. To give a criterion, for N=15, the expected value is already above 7.25, which implies that it is closer to 10 than to 4.5.
We can use different types of prior. I also tried pk=e−kN (with a normalization constant), which is basically a smoother version of the previous one. Instead of stating with certainty “the number of hard steps is at most N”, it’s more “the number of hard steps is typically N, but any huge number is possible”. This gives basically the same result, except is separates from 4.5 even faster, as soon as N=13.
My point is not to say that the number of hard steps is 10 in our case. Obviously I cannot know that. Whatever prior we may choose, we will end up with a distribution of probability, not a nice clear answer. My point is that if, for the sake of simplicity, we choose to only remember one number / to only share one number, it should probably not be 4.5 (or k=tT−t), but instead 10 (or k=t+TT−t). I bet that, if you actually have a prior, and actually make the bayesian update, you’ll find the same result.
So again, I wasn’t referring to the expected value of the number of steps, but instead how we should update after learning about the time – that is, I wasn’t talking about E[k|t], but instead P(k|t)/P(k) for various k.
Let’s dig into this. From Bayes, we have: P(k|t)/P(k)=P(t|k)/P(t). As you say, P(t|k) ~ kt^(k-1). We have the pesky P(t) term, but we can note that for any value of t, this will yield a constant, so we can discard it and recognize that now we don’t get a value for the update, but instead just a relative value (we can’t say how large the update is at any individual k, but we can compare the updates for different k). We are now left with P(k|t)/P(k) ~ kt^(k-1), holding t constant. Using the empirical value on Earth of t=(4.5/5.5)=0.82, we get P(k|t=0.82)/P(k) ~ k*0.82^(k-1).
If we graph this, we get:
which apparently has its maximum at 5. That is, whatever the expected value for the number of steps is after considering the time, if we do update on the time, the largest update is in favor of there having been 5 steps. Compared to other plausible numbers for k, the update is weak, though – this partiuclar piece of evidence is a <2x update on there having been 5 steps compared to there having been 2 steps or 10 steps; the relative update for 5 steps is only even ~5x the size of the update for 20 steps.
Considering the general case (where we don’t know t), we can find the maximum of the update by setting the derivative of kt^(k-1) equal to zero. This derivative is (k ln(t) + 1)t^(k-1), and so we need k∗ln(t)=−1, or k=−1/ln(t). If we replace t with x/(x+1), such that x corresponds to the naive number of steps as I was calculating before, then that’s k=−1/ln(x/(x+1)). Here’s what we get if we graph that:
This is almost exactly my original guess (though weirdly, ~all values for k are ~0.5 higher than the corresponding values of x).
I agree with those computations/results. Thank you