I need a principled approach to dealing with truncated, mixed, and long-tailed distributions.
I don’t know about principled, but here are some of my heuristics:
there are no real-life distributions that are NOT mixtures
if there’s a more natural scale to view data in, try viewing it in that scale. A scale is natural if linear operations seem to work in that scale.
if you have a hypothesis for an underlying generation process, even if you think it’s only one part of the picture, subtract the resulting theoretical distribution from your observed data and look for strong patterns remaining in the residuals
noise is noise, but an anomaly (“bump”) that consistently points in one direction and persists after reasonable binning is probably not noise
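A rough sketch of the last three heuristics, in case it helps. Everything here is made up for illustration: the helper names `binned_residuals` and `longest_same_sign_run`, the simulated data, the log scale as the "natural" scale, and the standard normal as the hypothesized process. It bins the data, subtracts the counts the hypothesis predicts, and then looks for a run of adjacent bins whose residuals all share a sign.

```python
import math
import random
from statistics import NormalDist


def binned_residuals(values, model, edges):
    """Observed minus model-expected counts for each bin [edges[i], edges[i+1])."""
    n = len(values)
    residuals = []
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(lo <= v < hi for v in values)
        expected = n * (model.cdf(hi) - model.cdf(lo))
        residuals.append(observed - expected)
    return residuals


def longest_same_sign_run(residuals):
    """Return (start_index, length) of the longest run of same-sign residuals.

    A residual of exactly zero is grouped with the negative ones.
    """
    best_start, best_len = 0, 0
    start = 0
    for i in range(1, len(residuals) + 1):
        run_ended = i == len(residuals) or (residuals[i] > 0) != (residuals[start] > 0)
        if run_ended:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i
    return best_start, best_len


# Simulated data (an assumption): a lognormal bulk plus a small second component.
rng = random.Random(0)
bulk = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
extra = [rng.lognormvariate(2.0, 0.2) for _ in range(500)]

# Move to the scale where the hypothesis is simple: logs of a lognormal are normal.
logs = [math.log(x) for x in bulk + extra]

# Hypothesized process for the bulk only: standard normal on the log scale.
hypothesis = NormalDist(0.0, 1.0)
edges = [-4 + 0.5 * i for i in range(17)]  # bins from -4 to 4, width 0.5

residuals = binned_residuals(logs, hypothesis, edges)
start, length = longest_same_sign_run(residuals)
print("longest same-sign run starts at bin", start, "and spans", length, "bins")
```

Rerunning with a different bin width is a cheap check on the "reasonable binning" part: a real bump should stay in roughly the same place, while noise tends to move around.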
Roll 1d8 for the first digit (treating 8 as 0), roll 1d10 for the second; same principle as the commonly-used d100.
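For anyone who wants to check that this comes out uniform, here is a minimal simulation. It assumes the usual d100 reading where a 10 on the d10 counts as 0, so the results run from 0 to 79; the function name `roll_d80` is just a label for this sketch.

```python
import random
from collections import Counter


def roll_d80(rng):
    """Combine a d8 (tens digit) and a d10 (ones digit) into one number in 0..79."""
    tens = rng.randint(1, 8) % 8    # an 8 on the d8 reads as 0
    ones = rng.randint(1, 10) % 10  # assumed: a 10 on the d10 reads as 0
    return 10 * tens + ones


rng = random.Random(0)
counts = Counter(roll_d80(rng) for _ in range(80000))
print(len(counts), "distinct outcomes; least and most common counts:",
      min(counts.values()), max(counts.values()))
```

Each of the 8 × 10 die combinations maps to a different number, so every outcome has probability 1/80. If you want 1 to 80 instead, reading a roll of 00 as 80 would do it, though that is my assumption and not something the comment above specifies.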
Literally laughed out loud at that.
I continue to love reading these. Looking forward to the next one!