So the reason we simulate things is, basically, to tell us things about the detector, for example its efficiency. If you observe 10 events of type X after 100k collisions and you want to know the actual rate, you have to know your reconstruction efficiency for that kind of event. If it’s fifty percent (and that would be high in many cases), then you actually had 20 physical events, plus or minus about 6 (the statistical uncertainty on 10 counts is roughly √10 ≈ 3.2, and dividing by the 50% efficiency doubles that too), and that’s the number you use in calculating whatever parameter you’re trying to measure. So you write Monte Carlo simulations, saying “Ok, the D* goes to D0 and pi+ with 67.4% probability, then the D0 goes to Kspipi with about 3% probability and such-and-such an angular distribution, then the Ks goes to pions pretty exclusively with this lifetime, then the pions are long-lived enough that they hit the detector, and it has such-and-such a response in this area.” In effect we don’t really deal with quantum mechanics at all; we don’t do anything with the collapse. (I’m talking here about experiments; there are theorists who do, for example, lattice (grid) calculations of strong-force interactions and try to predict the value of the proton mass from first principles.) Quantum mechanics only comes in to inform our choice of angular distributions. (Edit: Let me rephrase that. We don’t really simulate the collapse; instead we say, “Ok, there’s a probability p of this, so roll a pseudorandom number between zero and one; if it’s less than p, that’s the outcome we’re going with.” We don’t deal with the transition, as it were, from wave functions to particles.) The actual work is in ‘swimming’ the long-lived decay products through our simulation of the detector. The idea is to produce information in the same format as your real data, for example “voltage spike in channel 627 at timestamp 18”, and then run the same reconstruction software on it as on real data.
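To make the “roll a pseudorandom number” step concrete, here is a minimal Python sketch of how a toy generator might pick a decay mode and a decay time. The decay table, the lumped “other” mode, and the function names (`choose_mode`, `decay_time`) are hypothetical illustrations, not the interface of any real event generator; production generators are far more elaborate.

```python
import random

# Hypothetical decay table: each entry is (branching fraction, daughters).
# The 67.4% D*+ -> D0 pi+ figure is the one quoted above; every other
# mode is lumped into an illustrative "other" entry so the fractions sum to 1.
DECAY_TABLE = {
    "D*+": [(0.674, ("D0", "pi+")), (0.326, ("other",))],
}


def choose_mode(modes, rng):
    """Roll u uniformly in [0, 1) and walk the cumulative branching fractions.

    A mode with fraction f is chosen with probability f: this is the
    "if the number is less than p, that's the outcome" rule applied to
    several possible outcomes at once.
    """
    u = rng.random()
    cumulative = 0.0
    for fraction, daughters in modes:
        cumulative += fraction
        if u < cumulative:
            return daughters
    # Only reached if the fractions sum to slightly less than 1 from rounding.
    return modes[-1][1]


def decay_time(mean_lifetime, rng):
    """Draw a proper decay time from an exponential with the given mean lifetime."""
    return rng.expovariate(1.0 / mean_lifetime)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the toy is reproducible
    n = 100_000
    n_d0 = sum(choose_mode(DECAY_TABLE["D*+"], rng) == ("D0", "pi+") for _ in range(n))
    print(f"fraction of D*+ -> D0 pi+: {n_d0 / n:.3f}")  # should come out close to 0.674
    mean_t = sum(decay_time(1.0, rng) for _ in range(n)) / n
    print(f"mean decay time in units of the lifetime: {mean_t:.3f}")  # close to 1
```

In a real chain each daughter would in turn be looked up and decayed the same way, and the long-lived ones handed to the detector simulation.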
The difference is that you know exactly what was produced, so you can go back and look at the generated distributions and see if, for example, your efficiency drops in particular regions of phase space. Usually it does, for example if one particle is slow, or especially of course if it flies down the beampipe and doesn’t hit the active parts of the detector.
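Because the generated (“truth”) values are known, the efficiency can be measured bin by bin as reconstructed/generated. The sketch below assumes a made-up detector response: a polar-angle acceptance cut stands in for particles going down the beam pipe, and slow tracks get a lower efficiency. All numbers and function names are hypothetical. It also reproduces the efficiency correction from the first paragraph, using a simple √N Poisson error on the observed count.

```python
import bisect
import math
import random


def toy_reconstructed(p, cos_theta, rng):
    """Hypothetical detector response; the numbers are illustrative, not from any real detector."""
    if abs(cos_theta) > 0.9:  # went down the beam pipe: outside the active region
        return False
    if p < 0.2:               # slow track: assume it is often lost
        return rng.random() < 0.3
    return rng.random() < 0.95


def efficiency_by_bin(n_generated, bin_edges, rng):
    """Generate truth-level tracks, run the toy detector, return reconstructed/generated per momentum bin."""
    n_bins = len(bin_edges) - 1
    generated = [0] * n_bins
    reconstructed = [0] * n_bins
    for _ in range(n_generated):
        p = rng.uniform(bin_edges[0], bin_edges[-1])  # truth momentum
        cos_theta = rng.uniform(-1.0, 1.0)             # truth direction
        i = bisect.bisect_right(bin_edges, p) - 1      # bin chosen from the truth value
        if not 0 <= i < n_bins:
            continue
        generated[i] += 1
        if toy_reconstructed(p, cos_theta, rng):
            reconstructed[i] += 1
    return [r / g if g else float("nan") for r, g in zip(reconstructed, generated)]


def corrected_yield(n_observed, efficiency):
    """Efficiency-corrected yield and its statistical error, taking sqrt(N) for the observed count."""
    return n_observed / efficiency, math.sqrt(n_observed) / efficiency
```

With bins `[0.0, 0.2, 1.0, 3.0]` the toy gives roughly 0.27 in the slow bin and about 0.86 elsewhere, which is the kind of phase-space dependence described above. `corrected_yield(10, 0.5)` returns `(20.0, 6.32...)`, the “20 plus or minus about 6” from the first paragraph.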
Calibrating these simulations is a fairly major task that consumes a lot of physicist time and attention. We look at well-understood events; at BaBar, for example, we would occasionally shut off the accelerator, let the detector run, and use the resulting cosmic-ray data for calibration. It helps that only a handful of particle types are long-lived enough to reach the detector: among charged particles, essentially electrons, muons, pions, kaons, and protons, plus photons and neutral hadrons such as neutrons. So we can study how these particles interact with matter and use that information in the simulations.
Another reason for simulating is to do blind studies. For example, suppose you want to measure the rate at which particle X decays to A+B+C. You need some selection criteria to throw away the background. The higher your signal-to-noise ratio, the more accurately you can measure the rate, but only up to a point: there’s a tradeoff, because tighter cuts also throw away signal, and the more events you have, the better the measurement. So you want to find the sweet spot between keeping 0% of the data at 100% purity and keeping 100% of the data at 2% purity. (Purity, incidentally, is usually defined as signal/(signal+background).) But you usually don’t want to study the effects of your selections directly on data, because there’s a risk of biasing yourself, for example in the direction of agreement with a previous measurement of the same quantity. (Millikan’s oil drops are the classic example, although simulations weren’t involved.) So you tune your cuts on Monte Carlo events, and only when you’re happy with them do you go and see whether there’s any actual signal in the data. This sort of thing is one reason physicists are reasonably good about publishing negative results, as in “Search for X”; it would be very embarrassing to work three years on a channel and then be unable to publish because there’s no signal in the data. In such a case the conclusion is “If there had been a signal at such-and-such a level, we would have seen it (with 95% probability); we didn’t; so we conclude that the process, if it occurs, has a rate lower than that level.”
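A minimal sketch of the blind procedure: the cut is chosen using only simulated signal and background, and the data would be looked at only afterwards. Maximizing S/√(S+B) is assumed here as the figure of merit because it is one common choice; real analyses may use others. The upper-limit function is the textbook classical Poisson limit for a counting experiment with no background, a simplification of what a real “Search for X” paper would do. The toy distributions, weights, and function names are all hypothetical.

```python
import math
import random


def figure_of_merit(s, b):
    """S / sqrt(S + B): larger means a smaller expected relative statistical error on the signal."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def optimize_cut(signal_mc, background_mc, cuts, signal_weight, background_weight):
    """Scan a one-sided cut (keep x > cut) using Monte Carlo only; real data are never consulted.

    The weights scale each MC sample to its expected yield in data.
    Returns (best cut, figure of merit, purity) with purity = S / (S + B).
    """
    best = None
    for cut in cuts:
        s = signal_weight * sum(x > cut for x in signal_mc)
        b = background_weight * sum(x > cut for x in background_mc)
        fom = figure_of_merit(s, b)
        purity = s / (s + b) if s + b > 0 else 0.0
        if best is None or fom > best[1]:
            best = (cut, fom, purity)
    return best


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def poisson_upper_limit(n_observed, cl=0.95):
    """Classical upper limit on the mean signal count, assuming zero background.

    Finds mu with P(N <= n_observed | mu) = 1 - cl by bisection (P falls as mu grows).
    """
    lo, hi = 0.0, 50.0 + 10.0 * n_observed
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_observed, mid) > 1.0 - cl:
            lo = mid  # mu still too small to be excluded
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy MC: a discriminating variable x peaks at 2 for signal and 0 for background.
    signal_mc = [rng.gauss(2.0, 1.0) for _ in range(10_000)]
    background_mc = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    cuts = [0.1 * i for i in range(-20, 50)]
    # Assume 100 signal and 10,000 background events expected in data (purity ~1% before cuts).
    cut, fom, purity = optimize_cut(signal_mc, background_mc, cuts, 0.01, 1.0)
    print(f"best cut x > {cut:.1f}, S/sqrt(S+B) = {fom:.2f}, purity = {purity:.2f}")
    print(f"95% CL limit with 0 events observed: {poisson_upper_limit(0):.2f}")
```

With zero events observed, `poisson_upper_limit(0)` gives −ln 0.05 ≈ 3.0 expected signal events; dividing by the efficiency and by the number of parent particles produced turns that into a limit on the rate.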