I’d be interested in PageRank on LW.
Or, if not that, just the most-linked-to posts. (It’d be cool if one of the Trike Apps people could run a regex, e.g. “lesswrong.com/[^ ]*”, over the database (reddit_data_comment[key=data] and reddit_data_article[key=article], I believe) and publish a .txt dump of the matches somewhere.)
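As a rough illustration of that idea, here is a minimal Python sketch that applies the suggested pattern to a list of text bodies and counts the targets. It assumes the comment and article text has already been pulled out of the database as plain strings; the function name `most_linked` and the sample strings are made up for this example.

```python
import re
from collections import Counter

# The pattern suggested above, with the dot escaped so it matches a literal "."
LINK_RE = re.compile(r"lesswrong\.com/[^ ]*")

def most_linked(texts, top=10):
    """Count how often each lesswrong.com URL appears across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(LINK_RE.findall(text))
    return counts.most_common(top)

# Hypothetical comment bodies, made up for illustration
sample = [
    "See lesswrong.com/lw/aa/ and also lesswrong.com/lw/bb/ for background.",
    "I disagree with lesswrong.com/lw/aa/ on this.",
]
print(most_linked(sample))
```

Note that “[^ ]*” stops only at a space, so a link followed directly by punctuation or a newline would pick up extra characters; a real run would probably want a stricter pattern.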
I’ll be in the TrikeApps office about a week from now; I’ll do my best to remember this and have something workable ready to offer them. I can’t promise they’ll be excited about data-mining LessWrong, though.
Thanks! (I’m not getting my hopes up for it.)
I’ve knocked something out quickly. I’ve got no idea how fast it will be over the ~250,000 comments (replacing “for … in cursor” with a paged retrieve would probably improve performance).
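For reference, a paged retrieve could look something like the sketch below. It assumes a Python DB-API 2.0 cursor and uses its standard `fetchmany` call; `sqlite3` and the `comments` table are stand-ins so the example runs on its own, not the real LessWrong schema or database driver. Whether this is actually faster than iterating the cursor directly depends on the driver, so it would need measuring.

```python
import sqlite3

def iter_rows_paged(cursor, page_size=1000):
    """Yield rows from an executed DB-API cursor, fetching page_size rows at a time."""
    while True:
        rows = cursor.fetchmany(page_size)
        if not rows:
            break
        for row in rows:
            yield row

# Stand-in table; the name and columns are hypothetical, not the real schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments (body) VALUES (?)",
    [("comment %d" % i,) for i in range(2500)],
)

cur = conn.cursor()
cur.execute("SELECT id, body FROM comments ORDER BY id")
total = sum(1 for _ in iter_rows_paged(cur, page_size=1000))
print(total)
```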
I believe it will only keep public posts (so no drafts or deleted posts). I’m less sure about the comments: I don’t know whether comments on deleted articles are kept, or whether there is such a thing as a “private” comment that I’m not filtering properly.
That script is a “best case” situation, since it records the origin along with the target of each link (and the date/karma too). If that data were to be published, I’ll try some analysis (and maybe even a proper article!).
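Since each record pairs an origin with a target, the dump would amount to a directed graph, which is what PageRank needs. Below is a minimal sketch of the standard power-iteration form, assuming the data arrives as a list of (origin, target) pairs; the damping factor of 0.85, the iteration count, and the sample edges are illustrative choices rather than anything taken from the actual script or data.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (origin, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: set() for n in nodes}
    for origin, target in edges:
        out_links[origin].add(target)  # duplicate links count once

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for origin in nodes:
            for target in out_links[origin]:
                new_rank[target] += damping * rank[origin] / len(out_links[origin])
        rank = new_rank
    return rank

# Hypothetical (origin, target) pairs, made up for illustration
edges = [("post_a", "post_c"), ("post_b", "post_c"), ("post_c", "post_a")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))
```

The date and karma fields are ignored here; they could be used to weight or filter links, but that would be a separate design decision.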