Very cool! Could you share your code at all? I’d love to explore this a little.
I adore the broccoli tree. I would be very happy to convert the dataset you used to make those pngs into an interactive network visualization and share it with you as an index.html. It would take all of an hour.
I do kind of agree with the other comments that, having noticed something, finding more of that stuff in that area is not so surprising. I think it would be good to get more context and explore the region more before concluding that that particular set of generations is significant.
However, I do think there is something to the man’s penis. It’s interesting that it collapses so quickly to something so specific in that particular branch. Not sure if I have any other comments on it for now though.
This is the right kind of cartography for 2024.
More of those definition trees can be seen in this appendix to my last post:
https://www.lesswrong.com/posts/hincdPwgBTfdnBzFf/mapping-the-semantic-void-ii-above-below-and-between-token#Appendix_A__Dive_ascent_data
I’ve thrown together a repo here (from some messy Colab sheets):
https://github.com/mwatkins1970/GPT_definition_trees
Hopefully this makes sense. You specify a token or non-token embedding and one script generates a .json file with nested tree structure. Another script then renders that as a PNG. You just need to first have loaded GPT-J’s model, embeddings tensor and tokenizer, and specify a save directory. Let me know if you have any trouble with this.
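To give a rough idea of what a nested tree in a .json file can look like, here is a minimal sketch. It is not the code from the repo. It assumes the tree branches on next-token probabilities and prunes any branch whose cumulative probability falls below a cutoff. The `TOY_MODEL` table, the `next_tokens` helper, the `cutoff` value and the `text`/`prob`/`children` field names are all made up for illustration, and a small lookup table stands in for GPT-J so that it runs without loading the model.

```python
import json

# Toy stand-in for a language model: maps a text prefix to candidate
# next tokens with probabilities. In the real workflow these would come
# from GPT-J; the values here are invented for illustration only.
TOY_MODEL = {
    "": [(" a", 0.6), (" the", 0.4)],
    " a": [(" person", 0.7), (" thing", 0.3)],
    " the": [(" act", 1.0)],
}


def next_tokens(prefix):
    """Hypothetical helper: return (token, probability) pairs for a prefix."""
    return TOY_MODEL.get(prefix, [])


def build_tree(prefix="", cumulative_prob=1.0, cutoff=0.2, max_depth=3):
    """Recursively expand branches whose cumulative probability is >= cutoff."""
    node = {"text": prefix, "prob": round(cumulative_prob, 4), "children": []}
    if max_depth == 0:
        return node
    for token, p in next_tokens(prefix):
        branch_prob = cumulative_prob * p
        if branch_prob >= cutoff:  # prune unlikely branches
            node["children"].append(
                build_tree(prefix + token, branch_prob, cutoff, max_depth - 1)
            )
    return node


tree = build_tree()
tree_json = json.dumps(tree, indent=2)  # nested structure, ready to write to disk
```

A separate rendering step would then only need to read that JSON back in and walk the `children` lists, which is why splitting generation and rendering into two scripts is convenient.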