I need a principled approach to dealing with truncated, mixed, and long-tailed distributions.
I don’t know about principled, but here are some of my heuristics:
no real-life distribution is NOT a mixture, so assume you are looking at one
if there’s a more natural scale to view the data in, try viewing it in that scale. A scale is natural if linear operations (adding, averaging, fitting straight lines) seem to work in it; for many long-tailed quantities that is the log scale.
if you have a hypothesis for an underlying generation process, even if you think it’s only one part of the picture, subtract the theoretical distribution it implies from your observed data and look for strong patterns that remain in the residuals
noise is noise, but an anomaly (“bump”) that keeps the same direction and survives reasonable changes in binning is probably not noise
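To make the last three heuristics concrete, here is a rough sketch rather than a recipe. It assumes strictly positive data whose natural scale is the log scale, and it uses a single lognormal as the hypothesized generation process; both are illustrative assumptions, as are the function names and the synthetic data. It subtracts the expected counts from the observed ones and checks whether the largest positive residual stays in the same place when the bin count changes.

```python
import math
import random
import statistics


def log_scale_residuals(data, n_bins):
    """Return (bin_center, residual) pairs, with bins laid out in log scale.

    Assumption for this sketch: the hypothesized process is one lognormal,
    fitted robustly with the median and the MAD of the logged data.
    """
    logs = [math.log(x) for x in data]  # requires strictly positive data
    mu = statistics.median(logs)
    mad = statistics.median([abs(v - mu) for v in logs])
    sigma = 1.4826 * mad  # MAD rescaled to estimate a normal std. deviation
    model = statistics.NormalDist(mu, sigma)

    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in logs:
        # Clamp so the maximum value falls in the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1

    residuals = []
    for i, observed in enumerate(counts):
        left = lo + i * width
        expected = len(logs) * (model.cdf(left + width) - model.cdf(left))
        # Pearson-style residual; the max(..., 1.0) guard keeps nearly empty
        # tail bins from dominating (a choice made for this sketch).
        resid = (observed - expected) / math.sqrt(max(expected, 1.0))
        residuals.append((left + width / 2, resid))
    return residuals


def strongest_bump(data, n_bins):
    """Return the (bin_center, residual) pair with the largest excess."""
    return max(log_scale_residuals(data, n_bins), key=lambda pair: pair[1])


# Synthetic mixture: a broad lognormal plus a narrow component near log(x) = 2.5.
rng = random.Random(0)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
data += [rng.lognormvariate(2.5, 0.15) for _ in range(400)]

# A real bump should show up at about the same location for every binning.
peaks = {n: strongest_bump(data, n) for n in (20, 30, 40)}
for n, (center, resid) in peaks.items():
    print(f"{n} bins: largest excess near log(x) = {center:.2f}, residual = {resid:.1f}")
```

If the location of the largest excess jumps around as the bin count changes, that suggests noise; if it stays put, it is worth asking what second process could be producing it.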