I don’t know.. not using the whole powerset when Ω is finite kinda rubs me the wrong way. (EDIT: correction: what clashes with my aesthetic sense isn’t that it’s not the whole powerset, rather that I instinctively want to have random variables denoting any coinflip when presented with a list of coinflips yet I can’t have that if the set of events is not the powerset because in that case those wouldn’t be measurable functions. I think the following expands on this same intuition without the measure-theoretic formalism.)
Consider the situation where I’m flipping the coin and I keep getting heads; I imagine I get more and more surprised as I’m flipping.
Consider now that I am at the moment when I’ve already flipped n coins but haven’t yet flipped the (n+1)th. I’m thinking about the next flip: to model the situation in my mind, there clearly should be an event where the (n+1)th coin is heads and another event where the (n+1)th coin is tails. Furthermore, these events should have equal (possibly conditional) probabilities, yet I will be much more surprised if I get heads again.
This makes me think that the key isn’t that I didn’t actually observe a low-probability event. (In my opinion it does not make sense to model the situation above with a σ-algebra where the (n+1)th coin being tails is grouped together with the (n+1)th coin being heads, because then I wouldn’t be able to calculate separate probabilities for those events.) Rather, the key is that I feel surprise when one of my assumptions about the world has become too improbable compared to an alternative: in this case, the assumption that the coin is unbiased. After observing lots of heads, the probability that the coin is biased in favor of heads becomes much greater than the probability that it is unbiased, even if we started out with a high prior that it’s unbiased.
Yes, this is the right view.

In real life we never know for sure that coin tosses are independent and unbiased. If we flip a coin 50 times and get 50 heads, we are not actually surprised at the level of an event with probability 2^−50 (about 10^−15). We are instead surprised at the level of our subjective probability that the coin is grossly biased (for example, it might have a head on both sides), which is likely much greater than that.
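To make the comparison concrete, here is a minimal Bayesian sketch in Python (the 1e-6 prior for a double-headed coin is my own toy number, not something from the comment above):

```python
from math import log10

p_biased_prior = 1e-6             # toy prior: the coin is double-headed
p_fair_prior = 1 - p_biased_prior

p_data_fair = 0.5 ** 50           # P(50 heads | fair) = 2**-50, about 8.9e-16
p_data_biased = 1.0               # P(50 heads | double-headed)

posterior_biased = (p_data_biased * p_biased_prior) / (
    p_data_biased * p_biased_prior + p_data_fair * p_fair_prior
)
print(f"P(biased | 50 heads) = {posterior_biased:.9f}")        # about 0.999999999
print(f"log10 P(50 heads | fair) = {log10(p_data_fair):.1f}")  # about -15.1
```

Even a one-in-a-million prior on bias ends up dominating the posterior, so the felt surprise tracks that prior, not the 2^−50 figure.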
But in any case, it is not rare for rare events to occur, for the simple reason that the total probability of a set of mutually exclusive rare events need not be low. That is the case with 50 coin tosses that we do assume are unbiased and independent: any given result is very rare, but of course the total probability over all possible results is one. There’s nothing puzzling about this.
Trying to avoid rare events by choosing a restrictive sigma algebra is not a viable approach. In the sigma algebra for 50 coin tosses, we would surely want to include events for “1st toss is a head”, “2nd toss is a head”, …, “50th toss is a head”, which are all not rare, and are the sort of event one might want to refer to in practice. But sigma algebras are closed under complement and intersection, so if these events are in the sigma algebra, then so are all the events like “1st toss is a head, 2nd toss is a tail, 3rd toss is a head, …, 50th toss is a tail”, which all have probability 2^−50.
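This is easy to verify by brute force on a smaller example. Below is a sketch of mine, shrunk from 50 tosses to 3 so the closure is computable: starting from only the “i-th toss is a head” events and closing under complement and pairwise intersection already forces in every singleton sequence.

```python
from itertools import product

omega = frozenset(product("HT", repeat=3))
generators = [frozenset(w for w in omega if w[i] == "H") for i in range(3)]

# Close the generating events under complement and pairwise intersection.
events = set(generators)
changed = True
while changed:
    changed = False
    for e in list(events):
        for f in list(events):
            for new in (frozenset(omega - e), frozenset(e & f)):
                if new not in events:
                    events.add(new)
                    changed = True

singletons = [e for e in events if len(e) == 1]
print(f"{len(singletons)} singleton events generated")  # all 8 of them
# With 50 tosses the same closure forces events of probability 2**-50.
```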
I instinctively want to have random variables denoting any coinflip when presented with a list of coinflips yet I can’t have that if the set of events is not the powerset because in that case those wouldn’t be measurable functions.
No problem, you just explicitly use two different mathematical models at the same time, modelling different aspects of your problem: one for the whole series of coin tosses and another for the i-th coin toss.

Ω = {HH, TT, HT, TH}, F = {∅, {HT, TH}, {HH, TT}, {HH, TT, HT, TH}}

Ω_i = {H, T}, F_i = {∅, {H}, {T}, {H, T}}

Notice that using a powerset

F = {∅, {HT}, {TH}, {HT, TH}, {HH, TT}, {HH, TT, HT, TH}}

doesn’t allow you to express individual coin tosses anyway—you need a different sample space for it.
Consider the situation where I’m flipping the coin and I keep getting heads; I imagine I get more and more surprised as I’m flipping.
Likewise, consider situations where:
1. You’ve written down a specific non-trivial long combination of Heads and Tails, and then, as you flip the coin, this particular combination is being produced. The same logic as for the n-th flip applies: you are not much surprised by each individual coin toss outcome, but you are more and more surprised that the outcomes keep matching the specific sequence you wrote down beforehand.
2. Same as 1., but the coin produces a combination different from the one you’ve written down, and thus you are surprised neither by each individual outcome nor by the total result.
3. Same as 1., but you haven’t written down any combination in advance. Once again, you are not surprised.
In 1. you’ve observed a rare event and are surprised because of it. In 2. and 3. you didn’t, and thus you are not, even though you may have observed the very same outcome (the same sequence of Heads and Tails) in all of 1., 2., and 3. The events that you’ve observed are quite different. And if you do a simple sanity check, you will notice that, indeed, it’s very easy to replicate situations 2. and 3., but very hard to replicate 1.
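That sanity check is easy to run. A quick simulation (a sketch; the length 10 and the 100,000 trials are arbitrary choices of mine):

```python
import random

def flips(n):
    return "".join(random.choice("HT") for _ in range(n))

n, trials = 10, 100_000
target = flips(n)  # situation 1: a specific combination written down in advance

hits = sum(flips(n) == target for _ in range(trials))
print(f"matched the precommitted sequence: {hits} of {trials}")  # about trials / 2**10, i.e. ~98
print(f"produced some sequence or other:   {trials} of {trials}")  # situations 2 and 3, every time
```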
Situation 1. is similar to observing many Heads in a row. The difference is that your brain is wired to track many Heads/many Tails by default, whereas being able to track a specific non-trivial combination requires an active precommitment.
Rather, the key is that I feel surprise when one of my assumptions about the world has become too improbable
This is not an alternative explanation. This is restating the same fact in different terms. If your assumption about the world has become too improbable, it means that you’ve accumulated enough evidence against this assumption. The strength of the evidence against an assumption is literally how improbable the observed event is according to that assumption. It’s not one way or the other. It’s always both.
As I understand it (but correct me if I am wrong), your claim is that we don’t feel surprise when observing what is commonly thought of as a rare event because we don’t actually observe a rare event: due to a quirk of our human psychology, we implicitly use a non-maximal event space. But you now seem to allow for another probability space, which, if true, seems to me a somewhat inelegant part of the theory. Do you claim that our subconscious tracks events in multiple ways simultaneously, or am I misunderstanding you?
Relatedly, the power set does allow me to express individual coin tosses. Let X_1 be the following function on Ω:

X_1(ω) = 1 if ω ∈ {HH, HT}, and 0 otherwise.

In this case X_1 is measurable, because X_1^−1[{1}] = {HH, HT} ∈ P(Ω) (minor point: Your F is not the powerset of Ω), and the same holds for X_1^−1[{0}]. Therefore X_1 is actually a random variable modeling that the first throw is heads.
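For what it’s worth, the measurability claim is mechanically checkable (a small sketch; the helper names are mine):

```python
from itertools import chain, combinations

omega = {"HH", "HT", "TH", "TT"}

def powerset(s):
    s = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

F = powerset(omega)  # the full power set as the event space

def X1(w):
    return 1 if w in {"HH", "HT"} else 0

# X1 is measurable iff every preimage X1^-1[{v}] is an event in F.
for v in (0, 1):
    preimage = frozenset(w for w in omega if X1(w) == v)
    assert preimage in F, f"preimage of {v} is not an event"
print("X1 is measurable with respect to the power set")
```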
Regarding your examples, I’m not sure I’m understanding you: Is your claim that the event space is different in the three cases, leading to different probabilities for the events observed? I thought your theory said that our human psychology works with non-maximal event spaces, but it seems it also works with different event spaces in different situations? (EDIT: Rereading the post, it seems you’ve addressed this part: if I understand correctly, one can influence their event space by way of focusing on specific outcomes?)
Wouldn’t it be much simpler to say that in 1., your previous assumption that the coinflips are independent of what you write on paper became too improbable after observing the coinflips, and that this caused the feeling of surprise?
I’m afraid I don’t understand your last paragraph; to me it clearly seems an alternative explanation. Please elaborate. It’s not true that any time I observe a low-probability event, one of my assumptions becomes improbable. For example, if I observe HHTHTTHHTTHT, no assumption of mine does, because I didn’t have a previous assumption that I would get coinflips different from HHTHTTHHTTHT. An assumption is not just any statement/proposition/event; it’s a belief about the world which is actually assumed beforehand.
To me your explanation leaves some things unexplained. For example: In what situations will our human psychology use which non-maximal event spaces? What is the evolutionary reason for this quirk? Isn’t being surprised in the all-heads case rational in an objective sense? Should we expect an alien species to be surprised or not?
For my proposed explanation these are easy questions to answer: we are not surprised because of non-maximal event spaces; rather, we are surprised if one of our assumptions loses a lot of probability. The evolutionary reason is that the feeling of surprise caused us to investigate, and in cases where one of our assumptions has become too improbable, we actually should investigate the alternatives. Yes, being surprised in these cases is objectively rational, and we should expect an alien species to do the same on an all-heads throw and not on some random string of H/T.
As I understand it (but correct me if I am wrong), your claim is that we don’t feel surprise when observing what is commonly thought of as a rare event because we don’t actually observe a rare event: due to a quirk of our human psychology, we implicitly use a non-maximal event space.
Yes, this is correct. The general principle is “You can observe only what you are paying attention to”. And the human quirk is, by default, paying attention to many Heads/Tails in a row.
But you now seem to allow for another probability space, which, if true, seems to me a somewhat inelegant part of the theory. Do you claim that our subconscious tracks events in multiple ways simultaneously, or am I misunderstanding you?
It’s not I who allow stuff. The point is that there is nothing in probability theory that forbids us from doing it. It’s not some new radical idea, either. A Solomonoff inductor is supposed to track all the models at the same time, for instance.
Another point is that, in fact, our minds (not necessarily subconscious) are indeed able to use multiple mathematical models at the same time, as long as the sample spaces are different. This is an empirical claim, which you may check yourself.
The question of elegance is less important to me; it’s a matter of taste, essentially. Personally, I think using two different models for their specific tasks and nothing else is a more elegant design than trying to stick all the required functionality, and then some, into one bigger model. Actually, I still don’t think that just one model would be enough for what you want it to do anyway.
minor point: Your F is not the powerset of Ω
Yes, you are completely correct, I forgot to add triplets and a couple of pairs. Anyway, let’s explore this kind of modeling:
Therefore X_1 is actually a random variable modeling that the first throw is heads.
So suppose you make a series of n coin tosses; your sample space is the set of all possible combinations of Heads and Tails of length n, and your event space is its power set. Let’s define the event H_i as the set of all combinations of Heads and Tails of length n where Heads is in the i-th place. You toss a coin the first time and get Heads. Has the event H_1 just happened?
No, because H_1 is realized only when one of its outcomes is realized, and its outcomes are series of coin tosses of length n. So you can only say that H_1 happened after all the coin tossing is done.
So if you want to update in the process, you need a model for the i-th coin toss, whose sample space is the set of all combinations of Heads and Tails of length i and whose event space is its power set. And then with every coin toss this model changes. So in the end you will have n different models.
Also, I think you will have to use a model for the current coin toss result anyway, so that the switch from i to i+1 can be properly implemented. Maybe there is some clever way around this problem. In any case, human minds seem to work the obvious way: notice that the outcome of the current coin toss is Heads/Tails, add it to the list of all the previous coin tosses of length i, and thus be able to say which outcome in the (i+1)-th model has been realized.
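The “obvious way” can be sketched directly (my code; H(i, n) is the event “Heads in the i-th place” inside the length-n model):

```python
from itertools import product

def H(i, n):
    """Event 'Heads in the i-th place' inside the length-n model (1-indexed)."""
    return {seq for seq in product("HT", repeat=n) if seq[i - 1] == "H"}

observed = []                 # the outcome realized so far, one toss at a time
for toss in ["H", "H", "T"]:  # example toss results
    observed.append(toss)
    i = len(observed)
    # In the length-i model the realized outcome is the full prefix,
    # so only now can we say which H_1, ..., H_i happened in *this* model:
    realized = [k for k in range(1, i + 1) if tuple(observed) in H(k, i)]
    print(f"after toss {i}: H_k realized for k = {realized}")
```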
And, of course, if you want to compare different assumptions about a coin you will have to track even more models in your mind.
(EDIT: Rereading the post, it seems you’ve addressed this part: if I understand correctly, one can influence their event space by way of focusing on specific outcomes?)
Yes, your edit is correct. We can change what we are paying attention to and thus observe different events, which mathematically can be described as having different event spaces. There are some potential issues here, like whether you really made yourself pay attention only to the specific combination you’ve selected, and thus are not surprised at all by ten Heads in a row, or whether you are just adding a new combination to the list of special combinations which already includes all Heads and all Tails, thus becoming only about 50% less surprised when observing all Heads.
But this doesn’t matter much in the realm of decision making. If you want to do some action with probability only 1/2^n, you can commit to a specific outcome of length n, toss a coin n times, and do the action only if this particular outcome is realized.
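As a sketch (using Python’s secrets module in place of a physical coin; the function name is mine):

```python
import secrets

def act_with_probability(n, action):
    """Perform `action` with probability 1/2**n by precommitting to one outcome."""
    committed = [secrets.choice("HT") for _ in range(n)]  # written down in advance
    flips = [secrets.choice("HT") for _ in range(n)]      # the actual coin tosses
    if flips == committed:
        action()

# Fires on average once per 1024 calls.
act_with_probability(10, lambda: print("rare action!"))
```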
Wouldn’t it be much simpler to say that in 1., your previous assumption that the coinflips are independent of what you write on paper became too improbable after observing the coinflips, and that this caused the feeling of surprise?
Strictly speaking, no, because now you have to add a whole new level of multiple alternative hypotheses, each with its own probability space, which you are also tracking in your mind and prioritizing between.
I have a simple rule, “surprise is proportional to the improbability of the event observed”, and then I use the already existing distinction between events and outcomes to explain why observing every outcome of a random number generator is not surprising.
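(For concreteness: one standard way to make that rule quantitative, which is my gloss rather than anything stated in the post, is the information-theoretic surprisal S(E) = −log₂ P(E). Under it, the precommitted 10-flip sequence of situation 1. carries 10 bits of surprise, while the event “some 10-flip sequence occurred” has probability 1 and carries 0 bits.)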
You add an extra distinction between “observed events” and “assumption-invalidating observed events”, and I don’t see what it brings to the table. It seems a clear case of an extra entity. You can just reduce the three-entity model (assumption-invalidating events, events, outcomes) to a two-entity model (events, outcomes) without losing anything.
It’s not true that any time I observe a low-probability event, one of my assumptions becomes improbable. For example, if I observe HHTHTTHHTTHT, no assumption of mine does, because I didn’t have a previous assumption that I would get coinflips different from HHTHTTHHTTHT.
If you didn’t have an assumption that observing HHTHTTHHTTHT is improbable, then in what sense did you observe an improbable event when you saw the outcome HHTHTTHHTTHT?
Your assumptions can be described as a probability space with a less rich sigma-algebra, in which the outcome HHTHTTHHTTHT isn’t an event in itself. Let’s call it model A. Observing an improbable event in model A equals your assumption becoming improbable, and vice versa.
On the other hand, you are also trying to keep a probability space with a power set in your mind. And there, {HHTHTTHHTTHT} is an event with low probability. This is model B.
What you are saying is that if you observed an outcome that corresponds to a low-probability event in model B, it doesn’t mean that you’ve observed a low-probability event in model A. And I completely agree. What I’m saying is that you do not need to talk about model B in the first place, as it doesn’t actually correspond to what you are able to observe and just adds extra confusion.
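A toy contrast between the two models (my sketch; the coarse algebra here tracks only “all the same side”, as a stand-in for whatever patterns a mind actually attends to):

```python
from itertools import product

n = 12
omega = {"".join(s) for s in product("HT", repeat=n)}
outcome = "HHTHTTHHTTHT"

# Model A: a coarse sigma-algebra whose only non-trivial event is "all same side".
special = {"H" * n, "T" * n}
model_A = [set(), special, omega - special, omega]

# Model B: the power set, where every singleton such as {outcome} is an event.
p = lambda event: len(event) / len(omega)  # fair-coin (uniform) measure

smallest_A = min((e for e in model_A if outcome in e), key=len)
print(f"model A: smallest event containing the outcome has P = {p(smallest_A):.4f}")
print(f"model B: the singleton event has P = {p({outcome}):.6f}")  # 2**-12
```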
To me your explanation leaves some things unexplained. For example: In what situations will our human psychology use which non-maximal event spaces? What is the evolutionary reason for this quirk? Isn’t being surprised in the all-heads case rational in an objective sense? Should we expect an alien species to be surprised or not?
Naturally, it depends on our assumptions, on what we are paying attention to. A person who is tracking a specific outcome and sees it realized observes a much less probable event than a person who is tracking a dozen different outcomes, this one included.
There are some built-in intuitions about what feels more or less random, and it’s possible to speculate about the evolutionary reasons for them and for our ability to modify what we are paying attention to. There are, indeed, more things to be said on these topics. But they are beside the point of what I wanted to communicate in this post: probability theory and one of its apparent paradoxes, quite relevant to anthropic reasoning, which I’m trying to solve. The idea that our brain is a pattern-seeking machine is already quite popular, and I doubt that I have much new to add here.
Yes, rather than resolving the surprise of “the exact sequence HHTHTTHTTH” by declaring that it shouldn’t be part of the set of events, I would prefer to resolve it via something like:
It should be part of the set of events I’m allowed to consider just like any other subset of all 10-flip sequences.
We do observe events (or outcomes, if construed as singleton events) all the time that we would have predicted to be exceedingly improbable (while they may be improbable individually, a union of them need not be).
Observing some particular unlikely event like “the exact sequence HHTHTTHTTH occurs” should in fact raise my relative belief in any hypothesis by a large factor if that hypothesis would have uniquely predicted that to occur, as compared to others that would have made a far less specific prediction (up to a factor of at most 2^10, unless the other hypothesis considered that sequence to be unlikelier than uniform; see the worked numbers after this list).
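Worked numbers for that boost (the prior odds below are my own toy figure):

```python
p_seq_given_fair = 0.5 ** 10  # fair coin: P(HHTHTTHTTH) = 2**-10
p_seq_given_exact = 1.0       # a hypothesis that uniquely predicts that sequence
bayes_factor = p_seq_given_exact / p_seq_given_fair
print(f"likelihood ratio: {bayes_factor:.0f}")  # 1024 = 2**10

prior_odds = 1e-9  # toy prior odds for the sequence-predicting hypothesis
posterior_odds = prior_odds * bayes_factor
print(f"posterior odds: {posterior_odds:.2e}")  # about 1e-6: still negligible
```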
Even if all this is true, I still do not and should not feel surprised in such a case, because I think surprise has more to do with the amount by which something shifts the beliefs that my brain intuits to be important for various reasons. It has little to do with the likelihood of the events I observe, other than how it affects those beliefs. I didn’t have any prior reason to assign meaningful weight to hypotheses about the coin that would predict that exact sequence and no others, so even after scaling them by a large factor, my overall beliefs about the coin and the distribution of likely future flips remain very similar to before; therefore I feel little surprise.
By contrast, I might feel a little more surprise seeing “HHHHHHHHHH”. And again, the reason is not really the likelihood or unlikelihood of that sequence, and it has little to do with which sequences I’m told I can define to be a mathematical event. Rather, I think it’s closer to this: “this coin is biased toward heads” and “this coin always flips heads” are competing hypotheses to “this coin is fair” that, while initially extremely unlikely, would not be outlandish to consider, and if true they would affect my conception of the coin and my predictions of its future flips. So this time the large relative boost comes closer to shifting my beliefs in a way that impacts how I think about the coin and make future predictions; therefore I feel more surprise.
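The same arithmetic shows why the all-heads case feels different (again with my own toy priors): the 2^10 boost is identical, but it acts on a prior that isn’t astronomically small.

```python
boost = 2 ** 10  # likelihood ratio versus a fair coin, the same in both cases

odds_exact_sequence = 1e-12 * boost  # a hypothesis predicting HHTHTTHTTH exactly
odds_always_heads = 1e-4 * boost     # double-headed coins actually exist

print(f"posterior odds, exact-sequence hypothesis: {odds_exact_sequence:.1e}")  # ~1.0e-09
print(f"posterior odds, always-heads hypothesis:   {odds_always_heads:.2f}")    # ~0.10
```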