Birnbaum-Saunders is an interesting one. For the purposes of fatigue analysis, the assumptions which bring about the three models are:
Weibull—numerous failure modes (of similar failure speed) racing to see which causes the component to fail first
Log-normal—Damage per cycle is proportional to current damage
Birnbaum-Saunders—Damage per cycle is normally distributed and independent of previous damage
My engineering gut says that this component is probably somewhere between Log-normal and Birnbaum-Saunders (I think the proportionality will decay as damage increases), which is maybe why I don’t have a clear winner yet.
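If I do end up coding this, all three candidates are available in scipy.stats (fatiguelife is scipy’s name for the Birnbaum-Saunders distribution), so a quick maximum-likelihood comparison might look something like the rough sketch below (the life data here is made up):

```python
import numpy as np
from scipy import stats

# Made-up cycles-to-failure data, standing in for the real (small) sample.
lives = np.array([1.2e5, 1.8e5, 2.1e5, 2.9e5, 3.4e5, 4.7e5])

candidates = {
    "Weibull": stats.weibull_min,
    "Log-normal": stats.lognorm,
    "Birnbaum-Saunders": stats.fatiguelife,
}

for name, dist in candidates.items():
    params = dist.fit(lives, floc=0)              # fix location at zero: two free parameters each
    loglik = np.sum(dist.logpdf(lives, *params))  # maximised log-likelihood for a rough comparison
    print(f"{name:17s} params={params}  log-likelihood={loglik:.1f}")
```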
***
I think I understand now where my original reasoning was incorrect when I was calculating the expected worst in a million. I was just calculating the worst in a million for each model and taking a weighted average of the answers, which meant that bad values from the outlying candidate pdfs were massively suppressed.
I’ve done some sampling of the worst in a million by repeatedly creating two random numbers from 0 to 1. I use the first to select a μ,σ combination based on the posterior for each pair. I use the second random number as a p-value. I then take the inverse cdf (icdf) of that p-value under the selected μ,σ to get an x.
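Roughly, one draw looks like this (a rough sketch, assuming the log-normal model so that μ,σ are the mean and standard deviation of log-life, and assuming the posterior over the μ,σ grid is already tabulated as normalised weights):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng()

def draw_x(pairs, weights):
    """One posterior-predictive draw: pick a (mu, sigma) pair according to its
    posterior weight, then push a uniform p-value through that pair's icdf.
    `weights` must sum to 1."""
    idx = rng.choice(len(pairs), p=weights)   # first random number: which pair
    mu, sigma = pairs[idx]
    p = rng.random()                          # second random number: the p-value
    return stats.lognorm.ppf(p, s=sigma, scale=np.exp(mu))   # icdf gives x

# Worst in a million is then roughly the smallest of a million such draws:
# worst = min(draw_x(pairs, weights) for _ in range(1_000_000))
```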
Is this an ok sampling method? I’m not sure if I’m missing something or should be using MCMC. I definitely need to read up on this stuff!
The worst in a million is currently dominated by those occasions where the sampled probability distribution is an outlier. In those cases the p-value doesn’t need to be particularly extreme to produce a low x.
I think my initial estimates were based either mainly on uncertainty in p or mainly on uncertainty in μ,σ. The sampling method allows me to account for uncertainty in both, which definitely makes more sense. The model seems to react sensibly when I add potential new data, so I think I can now assess much better how many data points I require.
That sampling method sounds like it should work, assuming it’s all implemented correctly (not sure what method you’re using to sample from the posterior distribution of μ, σ).
Worst case in a million being dominated by parameter uncertainty definitely makes sense, given the small sample size and the rate at which those distributions fall off.
For μ,σ I effectively created a quasi-cumulative distribution with the parameter pairs as the x-axis.
(μ1,σ1), (μ2,σ1), (μ3,σ1), … (μ1,σ2), (μ2,σ2), (μ3,σ2), … (μn,σm)
The random number defines the relevant point on the y-axis. From there I get the corresponding μ,σ pair from the x-axis.
If this method works, I’ll probably have to code the whole thing instead of using a spreadsheet, as I don’t have nearly enough μ,σ values to get a good answer currently.
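Coded up, that quasi-cdf might look something like the sketch below (the grid values and weights are placeholders, not my real numbers):

```python
import numpy as np

# Placeholder grid of candidate parameter values; much finer in practice.
mus = np.linspace(11.0, 13.0, 81)       # candidate means of log-life
sigmas = np.linspace(0.05, 0.60, 56)    # candidate standard deviations of log-life

# Flatten the grid in the same order as the spreadsheet x-axis:
# (mu1,s1), (mu2,s1), (mu3,s1), ... (mu1,s2), (mu2,s2), ... (mun,sm)
pairs = [(mu, sigma) for sigma in sigmas for mu in mus]

# weights[i] is the relative probability of pairs[i]; flat placeholder here.
weights = np.ones(len(pairs))

# The quasi-cumulative distribution: running total of the weights, scaled to end at 1.
quasi_cdf = np.cumsum(weights)
quasi_cdf = quasi_cdf / quasi_cdf[-1]

def pick_pair(u):
    """Map a random number on the y-axis back to the corresponding (mu, sigma) pair on the x-axis."""
    return pairs[np.searchsorted(quasi_cdf, u)]
```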
At what point is the data used?
I use it to determine the relative probabilities of each μ,σ pair, which in turn create the pseudo cdf.
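In sketch form it would be something like this (assuming a flat prior over the grid and the log-normal likelihood; the function name is just illustrative):

```python
import numpy as np
from scipy import stats

def relative_probabilities(pairs, data):
    """Relative probability of each (mu, sigma) pair given the observed lives,
    assuming a flat prior over the grid and a log-normal likelihood."""
    w = np.array([
        np.prod(stats.lognorm.pdf(data, s=sigma, scale=np.exp(mu)))
        for mu, sigma in pairs
    ])
    return w / w.sum()   # normalised weights; accumulating them gives the pseudo cdf
```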
Ok, that sounds right.