Is the LW database structure available? If yes, you could prepare some SELECT queries and ask admins to run them for you and send you the result.
Anonymization: Replace user ids with “f(id+c)” where “f” is a hash function and “c” is a constant that will be modified by the admin before running you script. Replace times of karma clicks with “ym(time+r)” where “r” is a random value between 0 and 30 days, and “ym” is a function that returns only month and year. Select only data from recent year and only from users who are were active during the whole year (made at least one vote in the first and last months of the time period). Would such data be still useful to you?
My day job is DB admin and development. In the unlikely event of LW back-end admin-types being comfortable running a query sent in by some dude off the site, I wouldn’t be comfortable giving it to them. The effort of due diligence on a foreign script is probably greater than that required to put it together.
The data I want correspond to:
the IDs (i.e. primary key, not the username) of all the users
the IDs (PK) and authorship (user ID) of all posts and comments in a contiguous ~3 month period
the adjacency of users and posts as upvotes and downvotes over this period (I assume this is a single junction table)
If I were providing this data, I would also scramble the IDs in some fashion while maintaining the underlying relationships, as consecutive IDs could provide some small clue as to the identity and chronology of users or posts. While this is pretty straightforward, the mechanism for such scrambling should not be known to recipients of the data.
Is the LW database structure available? If yes, you could prepare some SELECT queries and ask admins to run them for you and send you the result.
Anonymization: Replace user ids with “f(id+c)” where “f” is a hash function and “c” is a constant that will be modified by the admin before running you script. Replace times of karma clicks with “ym(time+r)” where “r” is a random value between 0 and 30 days, and “ym” is a function that returns only month and year. Select only data from recent year and only from users who are were active during the whole year (made at least one vote in the first and last months of the time period). Would such data be still useful to you?
My day job is DB admin and development. In the unlikely event of LW back-end admin-types being comfortable running a query sent in by some dude off the site, I wouldn’t be comfortable giving it to them. The effort of due diligence on a foreign script is probably greater than that required to put it together.
The data I want correspond to:
the IDs (i.e. primary key, not the username) of all the users
the IDs (PK) and authorship (user ID) of all posts and comments in a contiguous ~3 month period
the adjacency of users and posts as upvotes and downvotes over this period (I assume this is a single junction table)
If I were providing this data, I would also scramble the IDs in some fashion while maintaining the underlying relationships, as consecutive IDs could provide some small clue as to the identity and chronology of users or posts. While this is pretty straightforward, the mechanism for such scrambling should not be known to recipients of the data.