0 and 1 aren’t probabilities
Link post
Eliezer wrote about this awhile ago too.
Now for a “radically elementary” take[1].
To simplify, focus only on the case of 0; 1 follows by symmetry. (All values here are non-negative, though with Clifford algebras we may extend to negative probabilities too without much difficulty.)
Instead of 0,[2] take an infinitesimal ε∼0. Then we have some quantity that’s strictly bigger than 0.
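For concreteness, here is the standard definition (the one in Goldblatt, cited below): a positive infinitesimal is nonzero, yet smaller than every positive standard real.

```latex
% A positive infinitesimal: strictly positive, yet below every standard threshold.
\varepsilon \sim 0 \ \text{and}\ \varepsilon > 0
\quad\iff\quad
0 < \varepsilon < r \ \text{ for every standard } r \in \mathbb{R},\ r > 0.
```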
What events are 0 probability?
In the naive setup, these would normally be 0 probability:
Events that are physically possible, but you have no idea how. A lot of the far future falls into this. Also a lot of things beyond your (current) ken.
Events that would normally be given probability exactly 0, like sampling a given point on a unit interval, uniformly at random.
Things you’re “sure” won’t happen.
In this setup, there’s one answer: events that are actually impossible, absurdities, the empty event. Everything else has at least infinitesimal probability. A dart thrown at the unit interval hits any given point with infinitesimal probability, say ε. Now take the event of hitting the number 2. It’s not even in the sample space. So it’s actually impossible and gets probability 0.
What’s the point of this?
What does it get us over the naive real-number setup?
We get a grading.
Example: flipping a coin.
3 outcomes: heads/tails/it lands on its edge. We assign ε to landing on the edge, because it’s possible but was out of scope for the usual coin-flip experiment. I got this example by axiomatically assuming something is impossible (a 3rd outcome for coin flips), then realizing that it’s actually physically possible, though I have no idea how (the ε term). Say that I then realize that the wind being a certain way makes landing on the edge less likely, but the wind acting that way is itself even more unlikely, say a probability of ε². Then I can update my overall “lands on edge” probability to ε − ε², and the other 2 to (1 − (ε − ε²))/2 ∼ 1/2.
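To make the bookkeeping concrete, here is a minimal sketch of that update, with probabilities as formal polynomials in ε stored as coefficient lists (my own illustration; `add` and `scale` are hypothetical helpers, not from any library):

```python
# A probability as a formal polynomial in eps: the list [c0, c1, c2, ...]
# stands for c0 + c1*eps + c2*eps**2 + ...

def add(p, q):
    """Coefficient-wise sum of two eps-polynomials."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def scale(c, p):
    """Multiply an eps-polynomial by a scalar c."""
    return [c * x for x in p]

one, eps, eps2 = [1], [0, 1], [0, 0, 1]

edge  = add(eps, scale(-1, eps2))              # eps - eps^2
heads = scale(0.5, add(one, scale(-1, edge)))  # (1 - (eps - eps^2)) / 2

print(edge)   # [0, 1, -1]        i.e. eps - eps^2
print(heads)  # [0.5, -0.5, 0.5]  i.e. 1/2 - eps/2 + eps^2/2
```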
Notice that the infinitesimals are placeholders, more valuable for their exponent and coefficient than anything else. If we’re “sure” about something, we could give it a value 1 − ε. Even better than surety is having someone on 4chan tell you, which you could assign a value of 1 − ε². The exact value of ε is less important than the grading it induces, a grading that gets a lot of its value from ε being inaccessible to standard numbers by standard operations. Each level of inaccessibility is like a level of confidence.
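That grading is just lexicographic order on the coefficient lists from the sketch above, since each power of ε dwarfs every higher power. A hedged continuation:

```python
def less_than(p, q):
    # Lexicographic comparison of eps-polynomials: the constant term
    # dominates, then the eps term, then eps**2, ... because eps**(k+1)
    # is negligible next to eps**k.
    n = max(len(p), len(q))
    for i in range(n):
        a = p[i] if i < len(p) else 0
        b = q[i] if i < len(q) else 0
        if a != b:
            return a < b
    return False  # the polynomials are equal

sure     = [1, -1]     # 1 - eps   (merely "sure")
fourchan = [1, 0, -1]  # 1 - eps^2 (someone on 4chan told you)
print(less_than(sure, fourchan))  # True: 1 - eps < 1 - eps^2
```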
To get useful results, we rely on spillover. For a formal explanation, see Robert Goldblatt’s Lectures on the Hyperreals:
Overspill/overflow
Underspill/underflow
Principle of permanence.
The basic idea of these principles is “as above ⇔ so below”. Say some internal function has all infinitesimals in its range. Then its range must contain non-infinitesimals too, since the set of all infinitesimals is known to be external, and images of internal sets under internal functions are internal. This is an example of overspill: infinitesimal behavior has spilled over into the appreciable domain. Similar results hold for all levels, big and small.
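For reference, here is the set form of overspill that this argument leans on, paraphrased from Goldblatt (the function version follows by applying it to the image):

```latex
% Overspill on the reals: no internal set can consist of exactly the
% positive infinitesimals, so it must spill over into the appreciable.
A \subseteq {}^{*}\mathbb{R} \text{ internal}, \quad
\{\varepsilon \in {}^{*}\mathbb{R} : 0 < \varepsilon \sim 0\} \subseteq A
\;\Longrightarrow\;
A \text{ contains some appreciable } r > 0.
```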
Intuition: “Big standard” ~ “hyperfinite”.
Reason: if n is standard, so is n+1. So there’s no biggest standard number (you know this already). If H is hyperfinite, then so is H−1. So the right end of the telescope picture, with the dots, identifies n and H.
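In the same spirit, the hypernatural version of this intuition (again paraphrasing Goldblatt):

```latex
% Overflow for hypernaturals: an internal set containing every standard n
% must reach into the unlimited.
A \subseteq {}^{*}\mathbb{N} \text{ internal}, \quad \mathbb{N} \subseteq A
\;\Longrightarrow\;
A \text{ contains some unlimited } H.
```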
Similarly for “arbitrarily large infinitesimal” ~ “arbitrarily small but standard and nonzero”.
By spillover, we can substitute in small BUT STANDARD values wherever a term is supposed to be infinitesimal (or large but standard whenever we see a hyperfinite term) and get a standard answer out. Our whole system of reasoning gracefully degrades into its own finite approximation. Take the coin example. It doesn’t matter that the chance of landing on the edge isn’t infinitesimal but appreciable. If we set ε := 0.001, then the chance of landing on the edge is still ε − ε² = 0.001 − 0.001² = 0.000999. The chance of heads/tails is (1 − 0.000999)/2 = 0.4995005, which makes sense. For much more in this vein, see this paper.
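A quick numeric check of that degradation in plain Python (nothing hyperreal here; ε is just a small float):

```python
eps = 0.001  # small BUT STANDARD stand-in for the infinitesimal

p_edge  = eps - eps**2      # ~0.000999
p_heads = (1 - p_edge) / 2  # ~0.4995005, same for tails

print(p_edge, p_heads)
print(p_edge + 2 * p_heads)  # total probability: 1.0 up to float rounding
```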
Radically elementary in the sense of the hyperreals. Like a lot of my recent posts, I don’t explain anything about the hyperreals, since that beats being paralyzed topsorting a whole textbook =(.
0 is the only number that’s both standard and infinitesimal, and is the smallest infinitesimal. So 0 is still special and shouldn’t be treated lightly by probability. If anything, infinitesimals reveal just how special 0 is. But we can sidestep the issue with this infinite ladder of levels.