To be clear, the papers would almost certainly have gone through anyway; the helpful thing was being very comfortable with Bayes rule and immediately noticing, for example, that conditioning on an event of probability 1-o(1) doesn't influence anything by very much.
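To make the quantitative content of that explicit (my own back-of-the-envelope statement, not anything taken from the papers): if $\Pr[E] = 1 - o(1)$, then for any event $A$,

$$\Pr[A] - o(1) \;\le\; \Pr[A \cap E] \;\le\; \Pr[A \mid E] \;=\; \frac{\Pr[A \cap E]}{\Pr[E]} \;\le\; \frac{\Pr[A]}{1 - o(1)} \;=\; (1 + o(1))\,\Pr[A],$$

so conditioning on $E$ shifts any probability by at most an additive $o(1)$ and a multiplicative $1 + o(1)$.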
Another trick I derived from this comfort is to almost never actually condition on small-probability events. Instead, the better move is to modify the random variables you care about so that they fail catastrophically in the small-probability scenario.
For example, in graph theory I might care about controlling a random variable X counting the number of times a certain substructure appears in the random graph G(n,p), but to do so I need to condition away some tail event E, like the appearance of a vertex of extremely high degree. Instead of working with conditional probabilities for the rest of the argument (which might go on to condition away 3 or 4 other tail events), the nicer thing to do is to replace X with a variable X' which agrees with X when E does not occur and is defined to be 0 when E does occur, and reason about X' instead. This is better for multiple reasons, the most important being that the edge appearances in G(n,p) are no longer independent once you condition on the complement of E, whereas X' lives on the original product space where they are.
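To spell out the bookkeeping behind this (a generic sketch in my own notation, not a claim about what the papers do): write $X' = X \cdot \mathbf{1}_{E^c}$. Since X is a count it is nonnegative, so for any threshold $t > 0$,

$$\Pr[X \ge t] \;\le\; \Pr[X' \ge t] + \Pr[E], \qquad \mathbb{E}[X'] \;\le\; \mathbb{E}[X],$$

because $\{X \ge t\} \cap E^c = \{X' \ge t\} \cap E^c$. Any tail or moment bound proved for X' therefore transfers to X at the cost of an additive $\Pr[E] = o(1)$, and every computation with X' takes place in the original G(n,p) space with fully independent edges.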
I think mostly what I got out of the Sequences was removing an air of mystery around Bayes rule. Here by mystery I mean "System 1 mystery": before I read the Sequences, to figure out a conditional probability I would have to sit down and carefully multiply and divide. This post also helped.