I have a similar but more geometric way of thinking about it. I think of the distribution of properties as a topography of many mountains and valleys. Hierarchical clustering then shows up as mountains with multiple peaks, and for each cluster we get the structure of a lower-dimensional manifold by looking only at the directions in which the mountain is relatively wide and flat.
Of course, the underlying geometry, and as a result the distribution density, are themselves subjective and dependent on what we care about: a pixel-by-pixel or atom-by-atom comparison would not yield similarity between trees, even of the same species.
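To make the "wide and flat directions" idea a bit more concrete, here is a rough sketch of one way it could be approximated. It assumes the properties are already embedded as Euclidean coordinates (which is exactly the subjective choice mentioned above) and uses the covariance of a single cluster as a stand-in for the shape of its mountain. The function name `wide_directions` and the cutoff `rel_threshold` are made up for illustration, and the data is synthetic.

```python
import numpy as np

def wide_directions(points, rel_threshold=0.1):
    """Return the principal directions along which a cluster is 'wide'.

    rel_threshold is a hypothetical cutoff: a direction counts as wide if
    its variance is at least this fraction of the largest variance.
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals >= rel_threshold * eigvals[-1]
    return eigvecs[:, keep], eigvals[keep]

rng = np.random.default_rng(0)
# One synthetic "mountain": wide along two axes, very narrow along the third.
cluster = rng.normal(size=(2000, 3)) * np.array([5.0, 3.0, 0.1])

directions, variances = wide_directions(cluster)
print("wide directions kept:", directions.shape[1])
print("variances along them:", variances)
```

The kept directions span a local, linear approximation of the cluster's lower-dimensional structure. A curved manifold would need this done patch by patch, and a different notion of distance would give a different answer.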