I’ve knocked something together quickly. I have no idea how fast it will be over the ~250000 comments (there is probably some performance to be gained by replacing “for … in cursor” with a paged retrieve).
I believe it will only keep public posts (so no drafts or deleted posts). I’m less sure about the comments: I don’t know whether comments on deleted articles are kept, or whether there is such a thing as a “private” comment that I’m not filtering out properly.
That script is a “best case” situation, since it records the origin along with the target of each link (and the date/karma too). If that data were to be published, I’ll try some analysis (and maybe even a proper article!).
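For what it’s worth, here is roughly what I mean by a paged retrieve. This is only a sketch, and not what the script currently does: it assumes a standard Python DB-API cursor, and the `comments` table, its columns, and the `iter_paged` helper are made-up stand-ins (an in-memory SQLite database here, so that it runs on its own).

```python
import sqlite3


def iter_paged(cursor, page_size=1000):
    """Yield rows from an already-executed DB-API cursor, one page at a time."""
    while True:
        rows = cursor.fetchmany(page_size)  # at most page_size rows per call
        if not rows:
            break
        for row in rows:
            yield row


# Hypothetical stand-in for the real comments table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments (body) VALUES (?)",
    [("comment %d" % i,) for i in range(2500)],
)

cur = conn.execute("SELECT id, body FROM comments ORDER BY id")
total = sum(1 for _ in iter_paged(cur, page_size=1000))
```

Whether this is actually faster than plain iteration depends on the database driver, so it would need timing on the real data.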
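To give an idea of the shape of that data, here is a rough sketch of one record per link. The field names, the `extract_links` helper, and the regex are illustrative assumptions rather than the script’s actual code, and a real HTML parser would be more robust than a regex.

```python
import re
from collections import namedtuple

# One record per link: where it was posted, where it points, plus date and karma.
LinkRecord = namedtuple("LinkRecord", "origin target date karma")

# Naive pattern: only matches double-quoted href attributes.
HREF_RE = re.compile(r'href="([^"]+)"')


def extract_links(origin, html, date, karma):
    """Return one LinkRecord for each href found in the given HTML body."""
    return [
        LinkRecord(origin, target, date, karma)
        for target in HREF_RE.findall(html)
    ]


# Made-up example input.
records = extract_links(
    origin="http://example.com/post/1",
    html='<p>See <a href="http://example.com/post/2">this</a> and '
         '<a href="http://example.org/">that</a>.</p>',
    date="2012-01-01",
    karma=5,
)
```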
Thanks! (I’m not getting my hopes up for it.)