This is awesome! Thanks for sharing. There are some fields where I want to read related papers, and this is a step up from just going through the citations list. Very cool work, and I like how there is also a list view which is much less cluttered.
I just tried to generate a graph for a friend’s paper on arXiv, but it told me that the back-end was overloaded, so hopefully it’ll be working soon.
I have a few general questions about the site:
Are either the front-end or the graphs themselves open-source?
Are the graphs being generated ahead of time or on the fly?
How did you parse the citation lists for papers from different journals? Even on arXiv, there seem to be at least a few different citation formats.
What are some surprising things you’ve learned from analyzing the graphs you’ve already generated?
How do you determine how many nodes to show on the screen?
Hey, glad to see you like the concept! We’re actively working on improving the performance.
1. Everything is proprietary for now. After consideration we decided that this project is not well suited for open sourcing at this time.
2. Graphs are generated on the fly, but only the first time. We cache the results, so when another user requests the same graph later, they get it instantly. Requests for graphs that are close in paper-space also run faster.
3. We rely on external sources (like the Open Corpus by Semantic Scholar) for the citations database. Unfortunately, no database is perfect yet and sometimes citations are badly parsed.
4. First, we found this tool very fun for exploring paper-space in new domains. I sometimes just enter a keyword like “psychology” and start exploring. This gives me a nice overview of the kinds of titles and branches in fields of science that are new to me.
Second, I was surprised by how easy it is to recognize papers that bridge multiple disciplines. Take a look at our example graph “deepfruits”: there are two obvious clusters. One shows deep learning papers, mostly about detection. The other shows papers describing how those techniques were applied in agriculture.
5. We experimented early on and arrived at the conclusion that more than ~50 papers on the screen is too much clutter, and it’s better to traverse paper-space by building more graphs. Avoiding specifics on purpose :)
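The build-on-first-request behavior described in point 2 can be sketched roughly as follows. This is purely illustrative; the function names, the in-memory dict, and the toy graph structure are all assumptions, not the site's actual implementation (a real service would use a persistent shared cache):

```python
import time

_cache = {}  # paper_id -> graph; stands in for a persistent cache

def build_graph(paper_id):
    """Placeholder for the expensive citation-graph build."""
    time.sleep(0.1)  # stand-in for the real crawling/ranking work
    return {"root": paper_id, "nodes": [paper_id]}

def get_graph(paper_id):
    """Return the cached graph if present; otherwise build and cache it."""
    if paper_id not in _cache:
        _cache[paper_id] = build_graph(paper_id)
    return _cache[paper_id]

# The first request pays the build cost; repeats are served from the cache.
first = get_graph("1610.03677")
repeat = get_graph("1610.03677")
assert repeat is first  # same cached object, no rebuild
```

This matches the behavior the authors describe: only the very first request for a given paper is slow, and every later request for that graph returns immediately.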
Awesome, thanks for the answers!
One other feature I’d really like is the ability to save the papers (and then export) I find through this tool, which would probably require an account for persistence.
Are there plans for something like this in the works?
Yes, these are probably our most requested features and are high on our list of features to add.
I also like the tool and expect to use it from time to time, so thanks for building and sharing it.
I also have to share my performance-related experience: yesterday several of my attempts returned the “system overloaded” response.
This morning my test shows 45% progress after about 10 minutes, so I suspect your progress indicator might be a bit optimistic. If generating the results is expected to take more than a few minutes, consider offering some form of notification once the graph is complete.
Hey, glad you like the concept!
Sorry for the malfunction you experienced; it probably happened while we were overloaded. We’ve since increased the server count and limited the number of graphs users can build in parallel.
An insider tip: you only wait for graphs that have never been built before. If you return to a graph you’ve already built, it loads instantly.