As I understand it (but correct me if I am wrong), your claim is that we don’t feel surprise when observing what is commonly thought of as a rare event because we don’t actually observe a rare event: owing to a quirk of our human psychology, we implicitly use a non-maximal event space.
Yes, this is correct. The general principle is “You can observe only what you are paying attention to”. And the human quirk is that, by default, we pay attention to runs of many Heads/Tails in a row.
But you now seem to allow for another probability space which, if true, seems to me a somewhat inelegant part of the theory. Do you claim that our subconscious tracks events in multiple ways simultaneously or am I misunderstanding you?
It’s not a matter of what I allow. The point is that there is nothing in probability theory that forbids us from doing it. Nor is it some radical new idea: a Solomonoff inductor, for instance, is supposed to track all the models at the same time.
Another point is that our minds (not necessarily subconsciously) are in fact able to use multiple mathematical models at the same time, as long as the sample spaces are different. This is an empirical claim, which you may check yourself.
The question of elegance is less important to me; it’s essentially a matter of taste. Personally, I think using two different models, each for its specific task and nothing else, is a more elegant design than trying to cram all the required functionality, and then some, into one bigger model. In any case, I still don’t think just one model would be enough for what you want it to do.
Minor point: your F is not the power set of Ω.
Yes, you are completely correct: I forgot to add the triplets and a couple of pairs. Anyway, let’s explore this kind of modeling:
Therefore X1 is actually a random variable modeling that the first throw is Heads.
So suppose you make a series of n coin tosses. Your sample space is the set of all possible combinations of Heads and Tails of length n, and your event space is its power set. Let’s define the event Hi as the set of all length-n combinations of Heads and Tails with Heads in the i-th place. You toss a coin the first time and get Heads. Has the event H1 just happened?
No, because H1 is realized only when one of its outcomes is realized, and its outcomes are series of coin tosses of length n. So you can only say that H1 happened after all the coin tossing is done.
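A minimal sketch of this model in Python, with n = 3 so the enumeration stays readable (the variable names are my own):

```python
from itertools import product

n = 3  # small n, so the whole enumeration is easy to inspect

# Sample space: all sequences of Heads/Tails of length n.
omega = [''.join(seq) for seq in product('HT', repeat=n)]

# Event H1: all length-n sequences with Heads in the first place.
H1 = {s for s in omega if s[0] == 'H'}

print(len(omega))            # 8 outcomes
print(len(H1) / len(omega))  # P(H1) = 0.5 under a fair coin

# Suppose the first toss lands Heads. Every outcome of H1 is still merely
# possible; none has been realized, because an outcome here is a complete
# length-n sequence. So H1 has not "happened" yet.
still_possible = {s for s in omega if s.startswith('H')}
print(still_possible == H1)  # True: four outcomes remain, none singled out
```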
So if you want to update in process, you need a model for the i-th coin toss, whose sample space is the set of all possible combinations of Heads and Tails of length i and whose event space is its power set. With every coin toss this model changes, so in the end you will have n different models.
Also, I think you will have to use a model for the current coin toss result anyway, so that the switch from i to i+1 can properly be implemented. Maybe there is some clever way around this problem. In any case, human minds seem to work the obvious way: notice that the outcome of the current coin toss is Heads/Tails, append it to the record of all the previous coin tosses of length i, and thus be able to say which outcome in the (i+1)-th model has been realized.
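A sketch of this obvious way, assuming tosses arrive one at a time as H/T symbols (the helper name and the example run HTH are mine):

```python
from itertools import product

def prefix_model(i):
    """Sample space of the model for the first i tosses: all length-i sequences."""
    return [''.join(seq) for seq in product('HT', repeat=i)]

observed = ''
for toss in 'HTH':        # three tosses arriving one at a time
    observed += toss      # note the current result, append it to the record
    space = prefix_model(len(observed))
    assert observed in space  # in the length-i model, exactly one outcome is realized
    print(f'model {len(observed)}: {len(space)} outcomes, realized: {observed}')
```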
And, of course, if you want to compare different assumptions about a coin you will have to track even more models in your mind.
(EDIT: Rereading the post, it seems you’ve addressed this part: if I understand correctly, one can influence their event space by focusing on specific outcomes?)
Yes, your edit is correct. We can change what we are paying attention to and thus observe different events, which mathematically can be described as having different event spaces. There are some potential issues here, like whether you really made yourself pay attention only to the specific combination you’ve selected, and thus are not surprised at all by ten Heads in a row, or whether you just added a new combination to a list of special combinations that already includes all-Heads and all-Tails, thus becoming only about 50% less surprised when observing all Heads.
But this doesn’t matter much in the realm of decision making. If you want to perform some action with only 1/2^n probability, you can commit to a specific outcome of length n, toss a coin n times, and do the action only if that particular outcome is realized.
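A sketch of this commitment protocol (the function name is hypothetical; any fixed length-n sequence works as the commitment):

```python
import random

def act_with_probability(n, action):
    """Perform `action` with probability 1/2**n via commit-then-toss."""
    committed = 'H' * n  # commit to a specific outcome before tossing
    tosses = ''.join(random.choice('HT') for _ in range(n))  # toss n times
    if tosses == committed:  # act only if the committed outcome is realized
        action()

act_with_probability(3, lambda: print('action taken (p = 1/8)'))
```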
Wouldn’t it be much simpler to say that in 1, your previous assumption that the coinflips are independent of what you write on paper became too improbable after observing the coinflips, and that this caused the feeling of surprise?
Strictly speaking, no, because now you have to add a whole new level of multiple alternative hypotheses, each with its own probability space, which you are also tracking in your mind and prioritizing between.
I have a simple rule, “surprise is proportional to the improbability of the event observed”, and then use the already existing difference between events and outcomes to explain why observing any given outcome of a random number generator is not surprising.
You add an extra distinction between “observed events” and “assumption-invalidating observed events”, and I don’t see what it brings to the table. It seems a clear case of an extra entity. You can reduce the three-entity model (assumption-invalidating events, events, outcomes) to a two-entity model (events, outcomes) without losing anything.
It’s not true that any time I observe a low-probability event, one of my assumptions becomes improbable. For example, if I observe HHTHTTHHTTHT, no assumption of mine does, because I didn’t have a previous assumption that I would get coinflips different from HHTHTTHHTTHT.
If you didn’t have an assumption that observing HHTHTTHHTTHT is improbable, then in what sense did you observe an improbable event when you saw the outcome HHTHTTHHTTHT?
Your assumptions can be described as a probability space with a less rich sigma-algebra, in which the outcome HHTHTTHHTTHT isn’t an event in itself. Let’s call it model A. Observing an improbable event in model A equals your assumption becoming improbable, and vice versa.
On the other hand, you are also trying to keep in your mind a probability space whose event space is the power set, and there {HHTHTTHHTTHT} is an event with low probability. This is model B.
What you are saying is that if you observed an outcome that corresponds to a low-probability event in model B, it doesn’t mean that you’ve observed a low-probability event in model A. And I completely agree. What I’m saying is that you do not need to talk about model B in the first place, as it doesn’t actually correspond to what you are able to observe and just adds extra confusion.
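To make the contrast concrete, here is a sketch in which the particular coarse sigma-algebra for model A (the one generated by the number of Heads) is my own illustrative choice, not something fixed by the argument:

```python
from math import comb

seq = 'HHTHTTHHTTHT'
n = len(seq)  # 12

# Model B: the event space is the power set, so the singleton {seq} is an
# event, and under a fair coin it is very improbable.
p_B = 1 / 2**n
print(f'model B: P(this exact sequence) = {p_B:.6f}')  # ~0.000244

# Model A: a less rich sigma-algebra, here the one generated by "number of
# Heads". The finest event containing our outcome is "exactly k Heads in
# n tosses", which is not improbable at all.
k = seq.count('H')  # 6
p_A = comb(n, k) / 2**n
print(f'model A: P(exactly {k} Heads) = {p_A:.3f}')  # ~0.226
```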
To me, your explanation leaves some things unexplained. For example: in what situations will our human psychology use which non-maximal event spaces? What is the evolutionary reason for this quirk? Isn’t being surprised in the all-Heads case rational in an objective sense? Should we expect an alien species to be surprised or not?
Naturally, it depends on our assumptions, on what we are paying attention to. A person who is tracking a specific outcome and sees it realized observes a much less probable event than a person who is tracking a dozen different outcomes, this one included (for ten tosses, 1/2^10 ≈ 0.1% versus 12/2^10 ≈ 1.2%).
There are some built-in intuitions about what feels more or less random, and it’s possible to speculate about the evolutionary reasons for them and for our ability to modify what we are paying attention to. There are, indeed, more things to be said on these topics. But they are beside the point of what I wanted to communicate in this post: probability theory and one of its apparent paradoxes, which is quite relevant to the problems of anthropic reasoning I’m trying to solve. The idea that our brain is a pattern-seeking machine is already quite popular, and I doubt I have much new to add here.