I feel like you are barking up the wrong tree. You are modelling the expected time needed for a sequence of steps, given the number of hard steps in the sequence, and given that all steps will be done before time $T$. I agree with you that those results are surprising, especially the fact that the expected time required for hard steps is almost independent of how hard those steps are, but I don’t think this is the kind of question that comes to mind on this topic. The questions would be closer to:
How many hard steps has life on Earth already achieved, and how hard were they?
How many hard steps remain in front of us, and how hard are they?
How long will it take us to achieve them / how likely is it that we will achieve them?
I may have misunderstood your point; if so, feel free to correct me.
Having a model for the dynamics at play is valuable for making progress on further questions. For instance, knowing that the expected hard-step time is ~identical to the expected remaining time gives us reason to expect that the number of hard steps already passed on Earth is perhaps ~4.5 (given that ~4.5 billion years have already elapsed and the remaining time in Earth’s habitability window appears to be ~1 billion years). Admittedly, this is a weak update, and there are caveats here, but it’s not nothing.
Additionally, the fact that the expected time for hard steps is ~independent of the difficulty of those steps tells us, among other things, that the early occurrence of abiogenesis in Earth’s history (perhaps ~400 My after Earth formed) is not good evidence that abiogenesis is easy, as this is arguably around what we’d expect from the hard-step model. (Astrobiologists have previously argued that early abiogenesis on Earth is strong evidence for life being easy, and thus common in the universe, and that if it were hard/rare, we’d expect abiogenesis to have occurred around halfway through Earth’s lifetime.)
Of course, this model doesn’t help solve the latter two questions you raised, as it doesn’t touch on future steps. A separate line of research that made progress on those questions (or a different angle of attack on the questions that this model can address) would also be a valuable contribution to the field.
> For instance, knowing that the expected hard-step time is ~identical to the expected remaining time gives us reason to expect that the number of hard steps already passed on Earth is perhaps ~4.5 (given that ~4.5 billion years have already elapsed and the remaining time in Earth’s habitability window appears to be ~1 billion years).
Sorry for this late response.

I actually disagree with this statement. Assume from now on that $T=1$, so that everything is normalized. Your post shows that, if there are $k$ hard steps, then, given that they are all achieved, the expected time required to achieve all of them is $t=\frac{k}{k+1}$. Thus, you have built an estimator of $t$ given $k$: $\hat{t}(k)=\frac{k}{k+1}$.
Now you want to solve the reverse problem: there are $k^*$ hard steps, and you want to estimate this quantity. We have one piece of information, which is $t$. Therefore, the problem is, given $t$, to build an estimator $\hat{k}(t)$. This is not the same problem. You propose to use $\hat{k}(t)=\frac{t}{1-t}$. The intuition, I assume, is that this is the inverse function of the previous estimator.
The first thing we could expect from this estimator is to have the correct expected value, i.e. $E[\hat{k}(t)]=k^*$. Let’s check that.
The density of $t$ is $f_t(s)=k^*s^{k^*-1}$ (quick sanity check here: the expected value of $t$ is indeed $\frac{k^*}{k^*+1}$). From this we can derive the expected value $E[\hat{k}(t)]=\int_0^1\frac{s}{1-s}\,k^*s^{k^*-1}\,ds$. And we conclude that $E[\hat{k}(t)]=\infty$.
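Spelling out the divergence: for $s\ge\frac{1}{2}$ we have $s^{k^*}\ge\left(\frac{1}{2}\right)^{k^*}$, so
$$\int_0^1\frac{k^*s^{k^*}}{1-s}\,ds\;\ge\;k^*\left(\tfrac{1}{2}\right)^{k^*}\int_{1/2}^{1}\frac{ds}{1-s}\;=\;\infty.$$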
How did this happen? Well, it happened because it is likely for $t$ to be really close to 1, which makes $\hat{k}(t)$ explode.
Ok, so the expected value doesn’t match anything. But maybe $k^*$ is the most likely result, which would already be great. Let’s check that too.
The density of $\hat{k}$ is given by the change-of-variables formula $f_{\hat{k}}(\hat{k}(s))=\frac{f_t(s)}{|\hat{k}'(s)|}$. Since $\hat{k}(s)=\frac{s}{1-s}$, we have $\hat{k}'(s)=\frac{1}{(1-s)^2}$, therefore $f_{\hat{k}}(\hat{k}(s))=(1-s)^2k^*s^{k^*-1}$. We can differentiate this to get the argmax: $\bar{s}=\frac{k^*-1}{k^*+1}$, and therefore $\bar{k}=\frac{\bar{s}}{1-\bar{s}}=\frac{k^*-1}{2}$. Surprisingly enough, the most likely result is not $k^*$.
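(Spelling out the differentiation step:
$$\frac{d}{ds}\Big[(1-s)^2s^{k^*-1}\Big]=(1-s)\,s^{k^*-2}\big[(k^*-1)(1-s)-2s\big],$$
which vanishes on $(0,1)$ exactly when $(k^*-1)(1-s)=2s$, i.e. at $\bar{s}=\frac{k^*-1}{k^*+1}$; then $1-\bar{s}=\frac{2}{k^*+1}$ gives $\bar{k}=\frac{k^*-1}{2}$.)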
Just to be clear about what this means: if there are infinitely many planets in the universe hosting sentient species as advanced as us, and on each of them a smart individual uses your estimator $\hat{k}(t)=\frac{t}{1-t}$ to guess how many hard steps they have already gone through, then the average of these estimates is infinite, and the most common result is $\frac{k^*-1}{2}$.
Fixing the estimator
An easy thing is to find the maximum likelihood estimator. Just use $\hat{k}(t)=\frac{1+t}{1-t}$ and, if you check the math again, you will see that the most likely result is $\hat{k}(t)=k^*$. Matching the expected value is a bit more difficult, because as soon as your estimator looks like $\frac{h(t)}{1-t}$, the expected value will be infinite.
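If you want to check this numerically, here is a quick Monte Carlo sketch (my own illustrative code, relying on the fact that a variable with density $k^*s^{k^*-1}$ on $[0,1]$ is the maximum of $k^*$ iid uniform draws):

```python
import numpy as np

# Sample t with density k* s^(k*-1): the maximum of k* iid Uniform(0,1) draws.
rng = np.random.default_rng(0)
k_star = 10
t = rng.random((1_000_000, k_star)).max(axis=1)

for name, est in (("t/(1-t)", t / (1 - t)), ("(1+t)/(1-t)", (1 + t) / (1 - t))):
    hist, edges = np.histogram(est, bins=400, range=(0.0, 40.0))
    i = hist.argmax()
    print(f"{name}: mode ≈ {(edges[i] + edges[i + 1]) / 2:.1f}")
# Expected: t/(1-t) peaks near (k*-1)/2 = 4.5, while (1+t)/(1-t) peaks near
# k* = 10. (Both have infinite expected value, as noted above.)
```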
For us, $t=\frac{4.5}{5.5}$, therefore the maximum likelihood estimator gives us $\hat{k}(t)=\frac{1+t}{1-t}=10$. Therefore, 10 hard steps seems a more reasonable estimate than 4.5.
> The intuition, I assume, is that this is the inverse function of the previous estimator.
So the estimate for the number of hard steps doesn’t make sense in the absence of some prior. Starting with a prior distribution over the number of hard steps, and applying Bayes’ rule based on the time passed and remaining, we will update towards more mass on k = t/(T-t) (basically, we go from P(t|k) to P(k|t)).
By “gives us reason to expect” I didn’t mean “this will be the expected value”, but instead “we should update in this direction”.
> So the estimate for the number of hard steps doesn’t make sense in the absence of some prior. Starting with a prior distribution over the number of hard steps, and applying Bayes’ rule based on the time passed and remaining, we will update towards more mass on k = t/(T-t) (basically, we go from P(t|k) to P(k|t)).
Ok, let’s do this. Since $k$ is an integer, I guess our prior should be a sequence $p_k$. We already know $P(t|k)=kt^{k-1}$. We can derive from this $P(t)=\sum_k P(t|k)\,p_k$, and finally $P(k|t)=\frac{P(t|k)\,p_k}{P(t)}$. In our case, $t=\frac{4.5}{5.5}$.
I guess the most natural prior would be the uniform prior: we fix an integer $N$, and set $p_k=\frac{1}{N}$ for $k\in[1;N]$. From this we can derive the posterior distribution. This is a bit tedious to do by hand, but easy to code. From the posterior distribution we can, for example, extract the expected value of $k$: $E[k|t]=\sum_k k\,P(k|t)$. I computed it for $N\in[1;100]$, and voilà!

[graph of $E[k|t]$ as a function of $N$]
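Here is a minimal version of that computation (a sketch, with illustrative names):

```python
import numpy as np

def expected_k(t, N):
    """E[k|t] under a uniform prior on {1, ..., N}."""
    k = np.arange(1, N + 1)
    likelihood = k * t ** (k - 1)              # P(t|k) = k t^(k-1)
    posterior = likelihood / likelihood.sum()  # the uniform prior cancels out
    return (k * posterior).sum()

t = 4.5 / 5.5
for N in (5, 10, 15, 30, 100):
    print(N, round(expected_k(t, N), 2))
```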
Obviously $E[k|t]$ is strictly increasing in $N$. It also converges toward 10. Actually, for almost all values of $N$, the expected value is very close to 10. To give a criterion: for $N=15$, the expected value is already above 7.25, which implies that it is closer to 10 than to 4.5.
We can use different types of priors. I also tried $p_k=e^{-k/N}$ (with a normalization constant), which is basically a smoother version of the previous one. Instead of stating with certainty “the number of hard steps is at most $N$”, it’s more “the number of hard steps is typically $N$, but any huge number is possible”. This gives basically the same result, except it separates from 4.5 even faster, as soon as $N=13$.
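The same sketch adapts directly to this prior (again, illustrative code; the sum is truncated at a large cutoff, which is harmless since the tail weights are negligible):

```python
import numpy as np

def expected_k_exp_prior(t, N, K=2000):
    """E[k|t] under the prior p_k ∝ exp(-k/N), truncating the sum at K."""
    k = np.arange(1, K + 1)
    weight = np.exp(-k / N) * k * t ** (k - 1)  # prior × likelihood
    return (k * weight).sum() / weight.sum()

print(round(expected_k_exp_prior(4.5 / 5.5, 13), 2))  # ≈ 7.25
```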
My point is not to say that the number of hard steps is 10 in our case. Obviously, I cannot know that. Whatever prior we choose, we will end up with a probability distribution, not a nice clear answer. My point is that if, for the sake of simplicity, we choose to only remember one number / to only share one number, it should probably not be 4.5 (or $k=\frac{t}{T-t}$), but instead 10 (or $k=\frac{t+T}{T-t}$). I bet that, if you actually have a prior, and actually make the Bayesian update, you’ll find the same result.
So again, I wasn’t referring to the expected value of the number of steps, but instead how we should update after learning about the time – that is, I wasn’t talking about E[k|t], but instead P(k|t)/P(k) for various k.
Let’s dig into this. From Bayes, we have: P(k|t)/P(k) = P(t|k)/P(t). As you say, P(t|k) ~ kt^(k-1). We have the pesky P(t) term, but we can note that for any value of t, this will yield a constant, so we can discard it and recognize that now we don’t get an absolute value for the update, but instead just a relative one (we can’t say how large the update is at any individual k, but we can compare the updates for different k). We are now left with P(k|t)/P(k) ~ kt^(k-1), holding t constant. Using the empirical value on Earth of t = 4.5/5.5 ≈ 0.82, we get P(k|t=0.82)/P(k) ~ k*0.82^(k-1).
If we graph this, we get:

[graph of k*0.82^(k-1) against k]

which apparently has its maximum at 5. That is, whatever the expected value for the number of steps is after considering the time, if we do update on the time, the largest update is in favor of there having been 5 steps. Compared to other plausible numbers for k, the update is weak, though – this particular piece of evidence is a <2x update on there having been 5 steps compared to there having been 2 steps or 10 steps; the relative update for 5 steps is only ~5x the size of the update for 20 steps.
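A quick numeric check of those ratios (my own snippet):

```python
# Relative update k * t^(k-1) at t = 4.5/5.5, for k = 1..20.
t = 4.5 / 5.5
update = {k: k * t ** (k - 1) for k in range(1, 21)}
print(max(update, key=update.get))  # 5
print(round(update[5] / update[2], 2),   # ≈ 1.37  (<2x vs. 2 steps)
      round(update[5] / update[10], 2),  # ≈ 1.36  (<2x vs. 10 steps)
      round(update[5] / update[20], 2))  # ≈ 5.07  (~5x vs. 20 steps)
```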
Considering the general case (where we don’t know t), we can find the maximum of the update by setting the derivative of kt^(k-1) with respect to k equal to zero. This derivative is (k ln(t) + 1)t^(k-1), and so we need k ln(t) = -1, or k = -1/ln(t). If we replace t with x/(x+1), such that x corresponds to the naive number of steps as I was calculating before, then that’s k = -1/ln(x/(x+1)). Here’s what we get if we graph that:

[graph of k = -1/ln(x/(x+1)) against x]
This is almost exactly my original guess (though weirdly, ~all values for k are ~0.5 higher than the corresponding values of x).
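One way to see where that ~0.5 comes from is to expand the logarithm for large x:
$$-\frac{1}{\ln\big(\frac{x}{x+1}\big)}=\frac{1}{\ln\big(1+\frac{1}{x}\big)}=\frac{1}{\frac{1}{x}-\frac{1}{2x^2}+O(x^{-3})}=x+\frac{1}{2}+O\Big(\frac{1}{x}\Big),$$
so the curve should sit almost exactly half a step above the naive count.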
I agree with those computations/results. Thank you.