Well, the problem is that you have uncertainty over the probability of the code crashing with or without fixing the particular bug you’re looking for. What you need, in order to apply Bayes, is:
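A prior model ‘pr(f)’ of the frequency of the code crashing with the bug present, i.e. a distribution over crash frequencies. 1/(f(1-f)) (the Haldane prior) is an ignorance prior that might do the job; it is improper, so it only gives sensible answers once it is combined with the observed runs.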
I’d assume the probability of the code crashing with the bug fixed is 0, even if that’s not strictly true (since there could be other bugs).
A prior probability ‘b’ of the bug being on that line of code. This could depend (for instance) on how you identified that particular line to comment out in the first place.
I’ll call your evidence from running the program E1 (2 crashes out of 7 runs) and E2 (10 runs, no crashes, with the line commented out).
P(bug on that line | E1, E2) = P(E1, E2 | bug on line) P(bug on that line) / P(E1, E2)
P(bug on that line) = b
P(E1, E2) = b * P(E1, E2 | bug on line) + (1-b) * P(E1, E2 | bug elsewhere)
P(E1, E2 | bug on line) = P(E1) P(E2 | bug on line) (since the 2 crashes out of 7 are independent of the bug location)
P(E2 | bug on line) = 1
P(E1, E2 | bug elsewhere) does not factor the same way, because both sets of runs share the same unknown crash frequency ‘f’; they are only independent once ‘f’ is given.
Given a frequency of crashing ‘f’, P(E2 | f, bug elsewhere) = (1-f)^10
P(E1|f) = f^2 * (1-f)^5
So, then you need to integrate over all possible values of ‘f’:
P(E1) = Integral over [0,1] of: P(E1|f) pr(f) df
P(E1, E2 | bug elsewhere) = Integral over [0,1] of: P(E1|f) P(E2 | f, bug elsewhere) pr(f) df
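For anyone who wants numbers, here is a minimal Python sketch of the calculation. It rests on some assumptions of mine: the prior on ‘f’ is taken to be of Beta form (alpha = beta = 0 gives the 1/(f(1-f)) prior, alpha = beta = 1 gives a uniform prior), the two hypotheses are weighted by b and 1-b, and under "bug elsewhere" the integral over ‘f’ is taken over E1 and E2 jointly, since both sets of runs share the same ‘f’. The function names are made up for illustration.

```python
from math import exp, lgamma


def beta_fn(a, b):
    """Euler Beta function B(a, b), computed via log-gamma."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))


def posterior_bug_on_line(b, crashes=2, clean=5, clean_after=10,
                          alpha=0.0, beta=0.0):
    """P(bug on line | E1, E2) with a Beta(alpha, beta)-shaped prior on f.

    The prior's normalising constant is left out because it appears in
    both hypotheses and cancels in the final ratio.
    """
    # Integral of f^crashes * (1-f)^clean * pr(f) df, i.e. P(E1)
    p_e1 = beta_fn(alpha + crashes, beta + clean)
    # Integral of f^crashes * (1-f)^(clean + clean_after) * pr(f) df,
    # i.e. P(E1, E2 | bug elsewhere)
    p_e1_e2_elsewhere = beta_fn(alpha + crashes, beta + clean + clean_after)

    on_line = b * p_e1 * 1.0  # P(E2 | bug on line) = 1
    elsewhere = (1.0 - b) * p_e1_e2_elsewhere
    return on_line / (on_line + elsewhere)


if __name__ == "__main__":
    for prior_b in (0.1, 0.5, 0.9):
        print(prior_b, posterior_bug_on_line(prior_b))
```

With b = 0.5 and the 1/(f(1-f)) prior this comes out to 8/9, about 0.89. Swapping in the uniform prior (alpha = beta = 1) moves it to roughly 0.94, which gives a feel for how much the choice of prior matters.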
That’s everything you need; the rest is just picking those priors and integrating back up the line. Of course, the results are only as good as the priors. A much easier solution is:
“The chance of a crash appears to be about 2/7. If the bug were still there, the chance of getting 10 non-crashes in a row would be (5/7)^10 ~= 3.5%.”
Note that this is not the same as the above. It’s an approximation, but it’s probably going to be just as good as doing it the hard way.
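The quick version is a one-liner. A small Python sketch, assuming the crash frequency is fixed at its point estimate of 2/7 and the runs are independent (the function name is just for illustration):

```python
def p_all_clean(crashes, runs, new_runs):
    """Chance of new_runs crash-free runs if the bug is still present,
    treating the observed crash rate as the true crash frequency."""
    f_hat = crashes / runs          # point estimate of the crash frequency
    return (1.0 - f_hat) ** new_runs


if __name__ == "__main__":
    print(f"{p_all_clean(2, 7, 10):.1%}")  # about 3.5%
```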
Incidentally, you need to be aware that, particularly with intermittent bugs, just because commenting out a line stops the crash (even when you’re 100% sure of the correlation), that doesn’t mean the line itself is the problem. Bugs can be absolutely pathological. For example, if the problem is, say, freeing the same memory twice, then any line that performs a lot of memory allocations will increase the frequency of crashes even if the problem is actually earlier in the code. Also, if the bug is overrunning the end of an array, taking out any line can have a chaotic effect on the optimiser, moving the relative locations of things in memory around and causing the bug to disappear without fixing it (only for it to reappear later). On a simpler level, taking a line out might change the execution path, avoiding the bug without fixing it. There seems to be no end to the ways in which impossible-seeming things can happen in computer code.
This is very true. I simplified the information a bit because I was posting about the math as a matter of intellectual curiosity, not to get help debugging. I have a model of what was causing the crash that I find reasonably convincing, which I outlined in my response to jimrandomh, below. So, while it’s a real-world problem, for purposes of math we can assume that the effect of commenting out the line is an indication of a point bug, as it were. I found the Bayes confusing anyway, so there’s no need to complexify further. :)
Thanks for the math answer; I need to think about it carefully to absorb it fully, but I thought I’d respond to the programming answer first.