Nathan Helm-Burger comments on HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix

Nathan Helm-Burger 19 Oct 2024 16:39 UTC
I have used HDBSCAN in a variety of settings in my data science career. The noise-aware aspect (HDBSCAN labels low-density points as noise instead of forcing them into a cluster) is definitely a mixed blessing. I often find the best results come from running several clustering algorithms and combining their outputs into an ensemble (e.g. treating the output of each clustering algorithm as a dimension in a similarity vector). Did you also experiment with other clustering algorithms?
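A minimal sketch of one way to read the "similarity vector" idea, assuming it means a co-association (evidence accumulation) ensemble: each algorithm votes on whether two points share a cluster, and the fraction of agreeing algorithms becomes their similarity. The function names `co_association` and `consensus_clusters` are hypothetical, the 0.5 threshold is an arbitrary majority cut, and treating HDBSCAN's noise label `-1` as "never agrees" is an assumption. The label arrays are toy inputs; in practice they would come from, e.g., scikit-learn's `HDBSCAN`, `KMeans` and `AgglomerativeClustering` fit on the decoder vectors.

```python
import numpy as np

def co_association(label_sets):
    """Fraction of clusterings that put each pair of points in the same cluster.

    label_sets: list of equal-length integer label arrays, one per algorithm.
    Label -1 (HDBSCAN's noise label) never counts as agreement (an assumption).
    """
    labels = np.asarray(label_sets)          # shape (n_algorithms, n_points)
    n_alg, n = labels.shape
    sim = np.zeros((n, n))
    for row in labels:
        same = row[:, None] == row[None, :]  # pairwise "same cluster" mask
        not_noise = (row[:, None] != -1) & (row[None, :] != -1)
        sim += same & not_noise              # one vote per agreeing algorithm
    sim /= n_alg
    np.fill_diagonal(sim, 1.0)               # a point always agrees with itself
    return sim

def consensus_clusters(sim, threshold=0.5):
    """Link points whose agreement is >= threshold; return connected components."""
    n = sim.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                         # depth-first walk of the component
            i = stack.pop()
            for j in np.flatnonzero(sim[i] >= threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy labels for 6 points from three hypothetical runs.
hdbscan_labels = [0, 0, 0, 1, 1, -1]   # point 5 flagged as noise
kmeans_labels = [0, 0, 1, 1, 1, 1]
agglo_labels = [0, 0, 0, 1, 1, 1]

sim = co_association([hdbscan_labels, kmeans_labels, agglo_labels])
consensus = consensus_clusters(sim)
print(consensus.tolist())
```

Here the point HDBSCAN left as noise still gets placed by the two algorithms that agree on it, which is one way an ensemble can soften the noise-aware behaviour without discarding it entirely.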

Additionally, UMAP is outdated; please use PaCMAP instead: https://www.lesswrong.com/posts/C8LZ3DW697xcpPaqC/the-geometry-of-feelings-and-nonsense-in-large-language?commentId=Deddnyr7zJMwmNLBS
- Jaehyuk Lim 21 Oct 2024 13:47 UTC
  Hey, thanks for the reply. Yes, we tried k-means and agglomerative clustering, with mixed results.
  We’ll try PaCMAP instead and see if it is better!