Hopefully this makes sense. You specify a token or non-token embedding and one script generates a .json file with nested tree structure. Another script then renders that as a PNG. You just need to first have loaded GPT-J’s model, embeddings tensor and tokenizer, and specify a save directory. Let me know if you have any trouble with this.
More of those definition trees can be seen in this appendix to my last post:
https://www.lesswrong.com/posts/hincdPwgBTfdnBzFf/mapping-the-semantic-void-ii-above-below-and-between-token#Appendix_A__Dive_ascent_data
I’ve thrown together a repo here (from some messy Colab sheets):
https://github.com/mwatkins1970/GPT_definition_trees
Hopefully this makes sense. You specify a token or non-token embedding and one script generates a .json file with nested tree structure. Another script then renders that as a PNG. You just need to first have loaded GPT-J’s model, embeddings tensor and tokenizer, and specify a save directory. Let me know if you have any trouble with this.