Cool work!
Can I ask a couple of questions about the DR+clustering approach?
If I understand correctly, you do the clustering in a 2D space obtained with UMAP (ignore this if I am wrong). Are you sure you are not losing important information with such a low dimension? I ask because you show that one dimension is strongly correlated with style (academic vs forum/blog) and the second may be somewhat correlated with time. I remember there is an argument for using n-1 dimensions when looking for n clusters, although that argument probably assumed linear DR techniques and might not apply to UMAP. Either way, it would be interesting to check whether using a higher n_components (3 to 5) reproduces the same clustering or generates some new insight.
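For example, something along these lines would make the check concrete (a minimal sketch assuming umap-learn and scikit-learn; the data, cluster count, and seeds are all placeholders, not your actual setup):

```python
import numpy as np
import umap
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X = np.random.rand(1000, 768)  # stand-in for the real document embeddings

# Cluster on UMAP embeddings of increasing dimension.
labels = {}
for d in (2, 3, 4, 5):
    emb = umap.UMAP(n_components=d, random_state=0).fit_transform(X)
    labels[d] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb)

# How stable is the 2D clustering as the embedding dimension grows?
for d in (3, 4, 5):
    print(f"ARI(2D vs {d}D):", adjusted_rand_score(labels[2], labels[d]))
```

If the adjusted Rand index stays high across dimensions, the 2D clustering is probably not throwing much away; if it drops, the extra dimensions are carrying cluster-relevant structure.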
Another thing you could check is using GMM instead of k-means. My (limited) experience is that if the embedding dimension is low you get better results this way. But, again, I was clustering downstream of linear DR.
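A minimal sketch of that swap, again with placeholder data and cluster count (one nice side effect is that GaussianMixture also gives soft membership probabilities, which k-means doesn't):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

emb_2d = np.random.rand(1000, 2)  # stand-in for the low-dimensional embedding

gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
hard_labels = gmm.fit_predict(emb_2d)   # hard assignments, directly comparable to k-means
soft_probs = gmm.predict_proba(emb_2d)  # per-point cluster membership probabilities
```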
Thank you for the comment and the questions! :)
This was not clear from how we wrote the paper, but we actually do the clustering in the full 768-dimensional space! If you look closely at the clustering plot you can see that the clusters slightly overlap; that would be impossible with k-means in 2D, since in that setting membership is determined by distance from the nearest 2D centroid.
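In sketch form, the pipeline is roughly the following (illustrative placeholder data and cluster count, not our exact code):

```python
import matplotlib.pyplot as plt
import numpy as np
import umap
from sklearn.cluster import KMeans

X = np.random.rand(1000, 768)  # stand-in for the real 768-d document embeddings

# Cluster in the full 768-dimensional space...
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# ...and use UMAP only to produce a 2D view for plotting.
emb_2d = umap.UMAP(n_components=2, random_state=0).fit_transform(X)

# Clusters can overlap in this 2D view, because membership was decided in 768-d.
plt.scatter(emb_2d[:, 0], emb_2d[:, 1], c=labels, s=5)
plt.show()
```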
Ahh, sorry! Going back to reread it, it was actually pretty clear from the text. I was tricked by the figure, where the embedding is presented first. Again, good job! :)