Thanks John, that’s just the kind of thing I was hoping for.
The point about the very worst components possibly not following the same distribution as the population I’m studying is a very good one.
I should have given a little more background to the task.
Firstly, the task is to compare a new component with an old one. Data is available for the old one and is being collected for the new one. The new one has a much higher average life but also much higher variance between samples, which raised the question of which component has the better one-in-a-million worst-case performance.
Secondly, the number of data points is very small. The time cost per data point is ~1 week, so I’ll be lucky to have 10 data points in total. Essentially, I’m tasked with trying to approve the component using the fewest possible data points.
(This means that so far I don’t have a favoured model (although Weibull is lower than the other two), and I also don’t have a firm impression of μ and σ.)
I guess all I can achieve with limited data is to say that, for components which fit the typical failure mechanism, the worst one-in-a-million is better for the new component than the old. However, we don’t have any information on Black Swan events caused by a different failure mechanism for either component, and I definitely need to make this clear in any presentation I make on the subject. With so few data points I don’t think I can use rare-event modelling to fill in the gaps.
I will have a look at the ideas about:
1. Taking multiple random samples of a million to get a better estimate of the CI. I’m pretty sure this is comfortably within my limited programming skills (see the sketch after this list).
(My limited programming skills are the main reason that my method is a bit long-winded – doing it step by step was my way of trying to stop myself making a mistake without noticing it. I did make a mistake with my application of the scale parameter for Birnbaum-Saunders at one point, which this method helped me spot.)
2. Looking at how adding extra data points affects my result. I think I can use this to compare the value of information with the cost of information to get a good impression of when we have enough data.
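For point 1, something like the sketch below is what I have in mind (the log-normal and its parameters are just placeholders, not my fitted values):

    # Rough sketch of idea 1: repeatedly simulate a million lifetimes, keep the
    # worst (smallest) of each block, and use the spread of those minima as a
    # confidence interval on the one-in-a-million worst case.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 5.0, 0.4     # placeholder log-scale parameters, not the real fit
    n_blocks = 200           # number of "million-sample" repeats

    worst = np.empty(n_blocks)
    for k in range(n_blocks):
        block = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)
        worst[k] = block.min()

    lo, med, hi = np.percentile(worst, [2.5, 50, 97.5])
    print(f"worst-in-a-million: median {med:.1f}, 95% interval ({lo:.1f}, {hi:.1f})")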
Having ~ten data points makes this way more interesting. That’s exactly the kind of problem that I specialize in.
For the log-normal distribution, it should be possible to do the integral for P(data|model) explicitly. The integral is tractable for the normal distribution—it comes out proportional to a power of the sample variance—so just log-transform the data and use that. If you write down the integral for normal-distributed data explicitly and plug it into wolframalpha or something, it should be able to handle it. That would circumvent needing to sample μ and σ.
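For instance, with a flat prior on μ and a 1/σ prior on σ (that prior is my own default assumption; other choices shift the exponent slightly), the integral for y_i = ln x_i comes out as

    P(\text{data} \mid \text{model}) \;\propto\; \int_0^\infty \int_{-\infty}^{\infty} \frac{1}{\sigma} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y_i-\mu)^2/(2\sigma^2)} \, d\mu \, d\sigma \;\propto\; n^{-1/2} \Big( \sum_{i=1}^{n} (y_i - \bar{y})^2 \Big)^{-(n-1)/2}

(If you compare that against the other models fitted on the original scale, remember the log transform also brings in a Jacobian factor of ∏ 1/x_i.)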
I don’t know if there’s a corresponding closed form for Birnbaum-Saunders; I had never even heard of it before this. The problem is still sufficiently low-dimensional that it would definitely be tractable computationally, but it would probably be a fair bit of work to code.
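A brute-force grid version could be a reasonable starting point, something like this sketch (scipy’s fatiguelife is the Birnbaum-Saunders family; the data values, grid ranges and flat prior are all placeholder assumptions):

    # Rough sketch: P(data | Birnbaum-Saunders) by brute force on a parameter
    # grid, with a flat prior over that grid (all numerical choices illustrative).
    import numpy as np
    from scipy.stats import fatiguelife

    data = np.array([95.0, 110.0, 130.0, 150.0, 175.0, 210.0])  # made-up lifetimes

    shapes = np.linspace(0.05, 1.5, 150)    # Birnbaum-Saunders shape (alpha)
    scales = np.linspace(50.0, 400.0, 150)  # Birnbaum-Saunders scale (beta)

    log_lik = np.array([[fatiguelife.logpdf(data, c, scale=b).sum()
                         for b in scales] for c in shapes])

    # Averaging the likelihood over the grid = integrating against a flat prior,
    # done in log space to avoid underflow with very small likelihood values.
    m = log_lik.max()
    log_marginal = m + np.log(np.exp(log_lik - m).mean())
    print("log P(data | Birnbaum-Saunders) ~", log_marginal)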
(I thoroughly enjoyed this exchange, think I learned some things from it, and wanted to thank you for having it out in the open.)
Birnbaum-Saunders is an interesting one. For the purposes of fatigue analysis, the assumptions which bring about the three models are:
Weibull—numerous failure modes (of similar failure speed) racing to see which causes the component to fail first
Log-normal—Damage per cycle is proportional to current damage
Birnbaum-Saunders—Damage per cycle is normally distributed and independent of previous damage
My engineering gut says that this component is probably somewhere between Log-normal and Birnbaum-Saunders (I think the proportionality will decay as damage increases), which is maybe why I don’t have a clear winner yet.
***
I think I understand now where my original reasoning was incorrect when I was calculating the expected worst in a million. I was just calculating the worst in a million for each model and taking a weighted average of the answers. This meant that bad values from outlying candidate pdfs were massively suppressed.
I’ve done some sampling of the worst in a million by repeatedly creating two random numbers from 0 to 1. I use the first to select a μ,σ combination based on the posterior probability of each pair. I use the second random number as a p-value. I then apply the inverse CDF of that distribution to the p-value to get an x.
Is this an ok sampling method? I’m not sure if I’m missing something or should be using MCMC. I definitely need to read up on this stuff!
The worst in a million is currently dominated by those occasions where the probability distribution is an outlier. In those cases the p-value doesn’t need to be particularly extreme to achieve low x.
I think my initial estimates were based either mainly on uncertainty in p or mainly on uncertainty in μ,σ. The sampling method allows me to account for uncertainty in both, which definitely makes more sense. The model seems to react sensibly when I add potential new data, so I think I can assess much better now how many data points I require.
That sampling method sounds like it should work, assuming it’s all implemented correctly (not sure what method you’re using to sample from the posterior distribution of μ, σ).
Worst case in a million being dominated by parameter uncertainty definitely makes sense, given the small sample size and the rate at which those distributions fall off.
For μ,σ I effectively created a quasi-cumulative distribution with the parameter pairs as the x-axis.
(μ1,σ1), (μ2,σ1), (μ3,σ1), …, (μ1,σ2), (μ2,σ2), (μ3,σ2), …, (μn,σm)
The random number defines the relevant point on the y-axis. From there I get the corresponding μ,σ pair from the x-axis.
If this method works I’ll probably have to code the whole thing instead of using a spreadsheet, as I don’t currently have nearly enough μ,σ values to get a good answer.
At what point is the data used?
I use it to determine the relative probabilities of each μ,σ pair which in turn create the pseudo cdf.
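In code I’m picturing something along these lines (a rough sketch only: log-normal case, with made-up data and an arbitrary flat-prior grid standing in for my real spreadsheet values):

    # Sketch of the scheme: weight each (mu, sigma) pair by how well it fits the
    # data, build the pseudo-cdf over the pairs, then use two random numbers per
    # draw: the first picks a pair, the second is the p-value pushed through the
    # inverse CDF. Log-normal case only; the data and grid are placeholders.
    import numpy as np
    from scipy.stats import lognorm

    rng = np.random.default_rng(1)
    data = np.array([95.0, 110.0, 130.0, 150.0, 175.0, 210.0])  # made-up lifetimes
    y = np.log(data)

    mus = np.linspace(y.mean() - 1.0, y.mean() + 1.0, 60)
    sigmas = np.linspace(0.05, 1.5, 60)
    pairs = np.array([(m, s) for s in sigmas for m in mus])  # mu1,s1  mu2,s1 ... mu1,s2 ...

    # Relative probability of each pair = likelihood of the log data (flat prior assumed)
    log_w = np.array([-len(y) * np.log(s) - np.sum((y - m) ** 2) / (2 * s * s)
                      for m, s in pairs])
    w = np.exp(log_w - log_w.max())
    pseudo_cdf = np.cumsum(w) / w.sum()
    pseudo_cdf[-1] = 1.0  # guard against floating-point round-off

    # One simulated "million components"; the smallest x is one worst-in-a-million draw
    n = 1_000_000
    idx = np.searchsorted(pseudo_cdf, rng.uniform(size=n))  # 1st random number -> (mu, sigma)
    mu_d, sig_d = pairs[idx, 0], pairs[idx, 1]
    p = rng.uniform(size=n)                                 # 2nd random number -> p-value
    x = lognorm.ppf(p, sig_d, scale=np.exp(mu_d))           # inverse CDF -> lifetime x
    print("worst of this million:", x.min())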
Ok, that sounds right.