Very very helpful! The clustering is obviously a function of the corpus. From your narrative, it seems like you only added the missing arXiv files after clustering. Is it possible the clusters would look different with those in?
Hey Ben! :) Thanks for the comment and the careful reading!
Yes, we only added the missing arXiv papers after clustering, but we then repeated the dimensionality reduction and showed that the original clustering still holds up even with the new papers (Figure 4 bottom right). I think that’s pretty neat (especially since the dimensionality reduction doesn’t “know” about the clustering), but of course the clusters might look slightly different if we also re-ran k-means on the extended dataset.
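For anyone who wants to try that last check, here is a minimal sketch of the idea, not the code we actually ran. It clusters a toy "original" corpus, re-runs k-means after adding a few extra points, and measures how often the two runs agree on the original points. Everything in it is an assumption made for illustration: the synthetic 2-D points stand in for paper embeddings, `kmeans` is a bare-bones Lloyd's algorithm with a deterministic farthest-point initialisation, and `rand_index` is just one possible agreement measure.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=20):
    """Bare-bones Lloyd's k-means; returns one cluster label per point."""
    # Farthest-point initialisation: deterministic, chosen here for reproducibility.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both put it apart)."""
    agree = total = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            total += 1
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / total


# Hypothetical stand-in data: three well-separated groups of "papers".
rng = random.Random(0)


def blob(cx, cy, n):
    return [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)) for _ in range(n)]


original = blob(0, 0, 30) + blob(10, 0, 30) + blob(0, 10, 30)
added = blob(0, 0, 5) + blob(10, 0, 5) + blob(0, 10, 5)  # the "missing" papers

labels_before = kmeans(original, k=3)
# Re-cluster the extended corpus, then keep only the labels of the original points.
labels_after = kmeans(original + added, k=3)[: len(original)]

agreement = rand_index(labels_before, labels_after)
print(f"Rand index on the original papers: {agreement:.3f}")
```

A value close to 1 would mean the added papers barely change how the original ones are grouped. The comparison is pairwise, so it does not matter if the two runs number the clusters differently.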