there is a 1⁄29 chance the second item will be next to it.
‘Next to it’, perhaps, but wouldn’t the alternative be placing it on an entirely different branch, which would make it less similar, since it would not be in the same cluster? movie-fearandloathing may be ‘next to’ fanfiction-remiscent-afterthought-threecharacters in the clustering, but it is not nearly as similar to it as movie-1492conquestparadise is… so I think that analysis is less right than my own simple one.
By “next to it” I meant paired with it, sorry. Not all items have another item paired with them, which is where the correction factor of 2⁄3 comes from.
Ah, I see. I’m not sure how I should deal with the unpaired items or the groups of more than two nodes; I didn’t take them into account in advance, and anything based on looking at the tree that was actually generated feels ad hoc. So if the probability of the pairing under random chance is overestimated, then the strength of the pairing is being underestimated, right? That is, the likelihood ratio is weaker than it ‘should’ be. I’m fine with leaving that alone: as I said, wherever possible I tried to make my conclusions as weak as possible.
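The point about the likelihood ratio can be written out directly. Here $p_1$ is the probability of the observed pairing under the similarity hypothesis, $p_0$ its true probability under random chance, and $\hat p_0$ an overestimate of $p_0$ (all three symbols are introduced only for illustration):

```latex
\mathrm{LR} = \frac{p_1}{p_0}, \qquad
\hat p_0 > p_0 \;\Longrightarrow\; \frac{p_1}{\hat p_0} < \frac{p_1}{p_0}
```

With an illustrative $p_1 = 1$, using $\hat p_0 = 1/29$ gives a ratio of 29, while a smaller true $p_0$ such as $1/57$ would give 57, so overestimating the chance baseline only makes the stated evidence more conservative.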
What do the pairings even mean, exactly? I would expect two nodes to be paired iff they are closer to each other than to any other node. If this is the case, then under a random-distance model with n nodes the probability that two specific nodes are paired is 1/(2n-3).
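A quick Monte Carlo check of the 1/(2n-3) figure, as a minimal sketch that assumes the "random-distance model" means every pairwise distance is drawn i.i.d. from Uniform(0, 1) (the uniform choice is an assumption; any continuous distribution gives the same answer). Nodes 0 and 1 are mutual nearest neighbours exactly when d(0,1) is the smallest of the 2n-3 distinct distances touching either node ((n-1) + (n-1) edges, with the shared edge counted once), and by symmetry each of those is equally likely to be the smallest. The function names are hypothetical.

```python
import random


def random_distance_matrix(n, rng):
    """Symmetric n x n matrix with i.i.d. Uniform(0, 1) off-diagonal entries."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = rng.random()
    return d


def nearest_neighbour(d, i):
    """Index of the node closest to node i (ties have probability zero here)."""
    return min((j for j in range(len(d)) if j != i), key=lambda j: d[i][j])


def estimate_pair_probability(n, trials=50_000, seed=1):
    """Monte Carlo estimate of P(nodes 0 and 1 are mutual nearest neighbours)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d = random_distance_matrix(n, rng)
        if nearest_neighbour(d, 0) == 1 and nearest_neighbour(d, 1) == 0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (5, 10, 30):
        est = estimate_pair_probability(n, trials=10_000)
        print(f"n={n:2d}  simulated={est:.4f}  1/(2n-3)={1 / (2 * n - 3):.4f}")
```

The simulated column should track 1/(2n-3) to within sampling noise. Note that this is noticeably smaller than 1/(n-1), the chance that a given other node is merely the nearest neighbour of one fixed node.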
As far as I know, it means that they are closer, yes.
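If the tree was built with single linkage (the thread does not say which linkage method was used, so this is an assumption), then "paired" in the tree is exactly "mutual nearest neighbours": two singletons are merged directly with each other only if their distance is the smallest among all distances touching either of them. The sketch below uses a hypothetical naive single-linkage implementation and hypothetical helper names to check that equivalence on random data.

```python
import random
from itertools import combinations


def cluster_distance(d, p, q):
    """Single linkage: distance between clusters = closest pair of members."""
    return min(d[i][j] for i in p for j in q)


def single_linkage_merges(d):
    """Naive agglomerative clustering; returns merges in order as frozenset pairs."""
    clusters = [frozenset([i]) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(d, clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def leaf_pairs(merges):
    """Items merged directly with each other while both were still singletons."""
    return {a | b for a, b in merges if len(a) == 1 and len(b) == 1}


def mutual_nn_pairs(d):
    """Pairs (i, j) where each is the other's nearest neighbour."""
    n = len(d)
    nn = [min((j for j in range(n) if j != i), key=lambda j: d[i][j]) for i in range(n)]
    return {frozenset((i, nn[i])) for i in range(n) if nn[nn[i]] == i}


def make_random_distances(n, seed):
    """Symmetric matrix of i.i.d. Uniform(0, 1) distances (illustrative data only)."""
    rng = random.Random(seed)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = rng.random()
    return d


if __name__ == "__main__":
    d = make_random_distances(12, seed=42)
    pairs = leaf_pairs(single_linkage_merges(d))
    print(sorted(sorted(p) for p in pairs))
    print(pairs == mutual_nn_pairs(d))  # expected: True for single linkage
```

For average or complete linkage this equivalence need not hold in general, so it is worth checking which method the clustering tool actually used before relying on the 1/(2n-3) baseline.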