I also noticed I was confused. Feels like we’re at least disentangling cases and making better distinctions here. BTW, just realised a problem with my triangular prism example: theoretically no rectangular side can face up parallel to the floor; at any time, two of them sit at 60° angles.
But on the other hand, x is not sufficient to spot when we have a new type of die (see previous point), and if we knew more about the dice we could make better estimates, which makes me think that it is epistemic uncertainty.
This is interesting. It seems to ask the question ‘Is a change in a quality of x, like colour, actually causal to the outcomes y?’ The difficulty is that you can never be fully certain empirically; you can only get closer to [change in roll probability] = 0 in the limit [number of rolls → ∞].
To disentangle the confusion I looked around at a few different definitions of the concepts. Most were the same kind of vague statement:
Aleatoric uncertainty comes from inherent stochasticity and does not reduce with more data.
Epistemic uncertainty comes from a lack of knowledge and/or data, and can be reduced by improving the model with more knowledge and/or data.
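To make the two definitions concrete, here is a minimal simulation sketch. The setup is hypothetical (a die with an unknown 30% chance of rolling a six; the number is made up): the estimate of the bias improves with more rolls (epistemic), while the randomness of the next roll never goes away (aleatoric).

```python
import random

random.seed(0)

# Hypothetical setup: a die whose (unknown to the modeler) chance of a six is 0.30.
TRUE_P_SIX = 0.30

def estimate_p_six(n_rolls):
    """Frequency estimate of P(six) from n_rolls simulated rolls."""
    sixes = sum(random.random() < TRUE_P_SIX for _ in range(n_rolls))
    return sixes / n_rolls

# Epistemic uncertainty: our estimate of the die's bias improves with data.
estimates = {n: estimate_p_six(n) for n in (10, 1000, 100_000)}
for n, est in estimates.items():
    print(f"{n:>6} rolls -> estimate {est:.3f}")

# Aleatoric uncertainty: even if we knew TRUE_P_SIX exactly, the next roll's
# outcome would still be random, with variance p*(1-p) ~= 0.21 -- no amount
# of extra rolls reduces that.
```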
However, I found some useful tidbits:
Uncertainties are characterized as epistemic, if the modeler sees a possibility to reduce them by gathering more data or by refining models. Uncertainties are categorized as aleatory if the modeler does not foresee the possibility of reducing them. [Aleatory or epistemic? Does it matter?]
Which sources of uncertainty, variables, or probabilities are labelled epistemic and which are labelled aleatory depends upon the mission of the study. [...] One cannot make the distinction between aleatory and epistemic uncertainties purely through physical properties or the experts’ judgments. The same quantity in one study may be treated as having aleatory uncertainty while in another study the uncertainty may be treated as epistemic. [Aleatory and epistemic uncertainty in probability elicitation with an example from hazardous waste management]
[E]pistemic uncertainty means not being certain what the relevant probability distribution is, and aleatoric uncertainty means not being certain what a random sample drawn from a probability distribution will be. [Uncertainty quantification]
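That last quote can be turned into a small worked example. Assuming a Bayesian Beta-Binomial model of the die (my choice of setup, not something from the quote), the predictive variance of the next roll splits exactly into an aleatoric term E[p(1−p)] and an epistemic term Var[p], and only the epistemic term shrinks as rolls accumulate:

```python
def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def split_predictive_variance(a, b):
    """Split Var[next outcome] under a Beta(a, b) posterior over p into an
    aleatoric part E[p(1-p)] and an epistemic part Var[p].
    The two parts sum to m*(1-m), the total predictive variance."""
    mean, var_p = beta_mean_var(a, b)
    epistemic = var_p                       # uncertainty about p itself
    aleatoric = mean * (1 - mean) - var_p   # irreducible outcome noise
    return aleatoric, epistemic

# Same observed frequency of sixes (30%), growing number of rolls:
for n in (10, 100, 10_000):
    sixes = int(0.3 * n)
    alea, epis = split_predictive_variance(1 + sixes, 1 + n - sixes)
    print(f"{n:>6} rolls: aleatoric {alea:.4f}, epistemic {epis:.6f}")
```

The epistemic term goes to zero as n grows, while the aleatoric term settles at roughly 0.3 × 0.7 = 0.21, matching the “not certain what the distribution is” vs “not certain what the sample will be” reading.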
With this, my updated view is that our confusion probably comes from a free parameter in where to draw the line between aleatoric and epistemic uncertainty.
This seems reasonable: more information can always lead to better estimates (at least down to considering wavefunctions, I suppose), but in most cases having and using that kind of information is infeasible, so letting the aleatoric/epistemic distinction depend on the problem at hand makes sense.
This seems to ask the question ‘Is a change in the quality of x like colour actually causal to outcomes y?’
Yes, I think you are right. Usually when modeling you can learn correlations that are useful for prediction, but if the correlations are spurious they might disappear when the distribution changes. So to know whether p(y|x) changes from observing x alone, we would probably need all causal relationships to y to be captured in x?
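A quick sketch of that spurious-correlation point, with an entirely hypothetical setup: colour is correlated with, but not causal to, a hidden “loaded” property. p(six | red) looks stable in training but moves as soon as the colour–loadedness association shifts, even though the causal mechanism never changes.

```python
import random

random.seed(1)

def roll_six(loaded):
    # Causal mechanism: only 'loaded' affects the chance of a six.
    return random.random() < (0.5 if loaded else 1 / 6)

def p_six_given_red(p_loaded_given_red, trials=100_000):
    """Estimate P(six | die is red) when redness merely correlates with
    the hidden 'loaded' property at the given rate."""
    sixes = sum(roll_six(random.random() < p_loaded_given_red)
                for _ in range(trials))
    return sixes / trials

# Training distribution: red dice are usually the loaded ones, so colour
# predicts sixes well even though colour is not causal.
p_train = p_six_given_red(0.9)   # ~= 0.9*0.5 + 0.1*(1/6) ~= 0.47

# Shifted distribution: the colour-loadedness association weakens, and
# p(y | x) moves although no causal relationship changed.
p_shift = p_six_given_red(0.1)   # ~= 0.1*0.5 + 0.9*(1/6) ~= 0.20

print(f"P(six | red), training: {p_train:.3f}")
print(f"P(six | red), shifted:  {p_shift:.3f}")
```

If x instead captured the loaded property itself (the actual cause), p(y|x) would be invariant under this shift, which is the intuition behind needing the causal parents of y in x.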
This is clarifying, thank you!
Good catch