You’re assuming not only that the prior is bimodal, but that it’s “strongly” bimodal, whatever that means.
Stronger than that, even. I’m saying that my distribution over rate-of-problem-solving has a delta spike at zero, mixed with some other distribution at nonzero rates.
Which is indeed how realistic priors should usually look! If I flip a coin 50 times and it comes up heads all 50 times, then I think it’s much more likely that this coin simply has heads on both sides (or some other reason to come up basically-always-heads) than that it has a 1⁄100 or smaller (but importantly nonzero) chance of coming up heads. The prior which corresponds to that kind of reasoning is a delta spike on 0% heads, a delta spike on 0% tails, and then some weight on a continuous distribution between those two.
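As a quick check that this kind of mixture prior really does put most of its posterior weight on the spike after 50 straight heads, here is a minimal numerical sketch; the specific mixture weights (0.1 / 0.1 / 0.8 with a uniform continuous part) are my own illustrative assumptions, not anything from the discussion.

```python
# Posterior for the two-headed-coin example under a mixture prior:
# a point mass at P(heads) = 1, a point mass at P(heads) = 0, and a
# uniform continuous component on (0, 1).  Weights are illustrative.
from scipy.integrate import quad

w_always_heads = 0.1
w_always_tails = 0.1
w_continuous = 0.8

n_flips = n_heads = 50

# Likelihood of "50 heads in 50 flips" under each component of the prior.
lik_always_heads = 1.0   # q = 1 produces all heads with certainty
lik_always_tails = 0.0   # q = 0 cannot produce any heads
lik_continuous, _ = quad(lambda q: q ** n_heads * (1 - q) ** (n_flips - n_heads), 0, 1)

evidence = (w_always_heads * lik_always_heads
            + w_always_tails * lik_always_tails
            + w_continuous * lik_continuous)

posterior_spike = w_always_heads * lik_always_heads / evidence
print(f"P(heads on both sides | 50 heads) ≈ {posterior_spike:.2f}")  # ≈ 0.86
```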
Right, but then it seems like you get back to what I said in my original comment: this gets you to
lim_{T→∞} P(major alignment failure | takeoff duration = T) = p ≫ 0
which I think is quite reasonable, but it doesn’t get you to “the probability is roughly constant as T varies”, because you’re only controlling the tail near zero and not near infinity. If you control both tails then you’re back to where we started, and the difference between a delta spike and a smoothed out version of the delta isn’t that important in this context.
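To make the shape of the disagreement concrete, here is a toy numerical sketch of one model consistent with the discussion: a single problem, a prior over the rate at which it gets solved that has a point mass at zero plus a continuous part, and P(failure | T) = w0 + (1 − w0)·E[exp(−rate·T)]. Both priors below share the same spike weight w0, so they have the same limit p = w0 as T → ∞, but they behave very differently at intermediate T. The model, the distributions, and all parameters are my own illustrative assumptions, not something either party specified.

```python
# Toy model: failure happens iff the problem is never solved during a takeoff
# of duration T; solving is a Poisson process with rate drawn from a prior
# that mixes a point mass at zero (weight w0) with a continuous component.
import numpy as np

rng = np.random.default_rng(0)
w0 = 0.2                                       # weight of the delta spike at rate = 0
T_grid = np.array([0.25, 1, 4, 16, 64, 256])   # takeoff durations, in weeks

def p_failure(T, rates):
    """P(failure | T) = w0 + (1 - w0) * E[exp(-rate * T)] over the continuous part."""
    return w0 + (1 - w0) * np.mean(np.exp(-rates * T))

# Prior A: rates concentrated around ~1/week -> P(failure | T) flattens out fast.
rates_a = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)
# Prior B: rates spread over many orders of magnitude -> P(failure | T) keeps
# drifting down across the whole range of T.
rates_b = rng.lognormal(mean=-3.0, sigma=3.0, size=100_000)

for T in T_grid:
    print(f"T = {T:6.2f} weeks:  prior A -> {p_failure(T, rates_a):.3f},"
          f"  prior B -> {p_failure(T, rates_b):.3f}")
```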
Let T_eq be the first time at which P(major alignment failure | takeoff duration = T) is within ϵ of p. As long as ϵ is small, the probability will be roughly constant with time after T_eq. Thus, the probability is roughly constant as T varies, once we get past some initial period.
(Side note: in order for this to be interesting, we want ϵ small relative to p.)
For instance, we might expect that approximately-anyone who’s going to notice a particular problem at all will notice it in the first week, so T_eq is on the order of a week, and the probability of noticing a problem is approximately constant with respect to time for times much longer than a week.
I agree with that, but I don’t see where the justification for T_eq ≈ 1 week comes from. You can’t get there just from “there are problems that won’t be noticed at any relevant timescale”, and I think the only argument you’ve given so far for why the “intermediate time scales” should be sparsely populated by problems is your first model, which I didn’t find persuasive for the reasons I gave.
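For what it’s worth, the same toy model from the sketch above can be used to compute T_eq directly: whether T_eq comes out to roughly a week or to something vastly longer depends entirely on the assumed continuous part of the rate prior, which is exactly the contested step. Again, all distributions and parameters below are illustrative assumptions.

```python
# Compute T_eq, the first T at which P(failure | T) is within epsilon of the
# limiting value p = w0, for two different continuous rate components.
import numpy as np

rng = np.random.default_rng(0)
w0, epsilon = 0.2, 0.02

# Continuous parts of the two rate priors (rates per week), as in the earlier sketch.
rates_a = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)   # concentrated near 1/week
rates_b = rng.lognormal(mean=-3.0, sigma=3.0, size=100_000)  # spread over many orders of magnitude

def t_eq(rates, t_max=1e6):
    """Smallest T on a log grid (in weeks) with P(failure | T) within epsilon of w0."""
    for T in np.geomspace(0.01, t_max, 2000):
        if (1 - w0) * np.mean(np.exp(-rates * T)) <= epsilon:
            return T
    return float("inf")

print(f"T_eq under prior A: ~{t_eq(rates_a):.0f} weeks")    # on the order of weeks
print(f"T_eq under prior B: ~{t_eq(rates_b):,.0f} weeks")   # vastly longer
```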