why do you hate science and the future of humanity
Because we promised to respect the participants’ privacy. That includes (e.g.) not posting their income on the internet alongside other information that might be used to identify them.
Our current plan is to share the data with a few stats folks who also agree to protect the participants’ privacy. I’ve exchanged emails with Ilya about this, and we’re looking for others.
The LW survey has a field where respondents can choose whether their responses are made freely available. I would encourage you to include such a field in future studies.
That has an added benefit: if someone external data-mines the public data and concludes that a specific pattern exists, you can check whether the pattern also holds among the people who chose to keep their data private.
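As a rough sketch of that check (the rows, the two variables, and the `pearson` helper are all made up for illustration, and are not from any actual survey), one could compute the same statistic separately on the opted-in and opted-out respondents and compare:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical rows: (shares_publicly, x, y). Values are invented.
rows = [
    (True, 1, 2), (True, 2, 4), (True, 3, 5), (True, 4, 9),
    (False, 1, 1), (False, 2, 3), (False, 3, 7), (False, 4, 8),
]

public = [(x, y) for share, x, y in rows if share]
private = [(x, y) for share, x, y in rows if not share]

# Pattern found by an outsider on the public subset...
r_public = pearson(*zip(*public))
# ...re-checked by the data holder on the subset that was never released.
r_private = pearson(*zip(*private))
```

The private subset works as a hold-out here only to the extent that people who opt out resemble people who opt in, which is itself an assumption.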
If salary is your main worry, why not transform it into a rank ordering? That erases the specific salary numbers while still preserving enough information to run a lot of tests (for example, many nonparametric methods work on ranks).
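A minimal sketch of what such a rank transform might look like, with invented salary figures and a hypothetical `rank_transform` helper (tied values get the average of the positions they span, which is the usual convention for rank-based tests):

```python
def rank_transform(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` across the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks

# Made-up salaries, for illustration only.
salaries = [52000, 87000, 52000, 140000, 31000]
salary_ranks = rank_transform(salaries)
print(salary_ranks)  # [2.5, 4.0, 2.5, 5.0, 1.0]
```

Note that the ranks still reveal who earns the most and the least in the sample, so this hides the amounts but not the ordering.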
(As I know you’re well aware) there have been plenty of demonstrations of researchers managing to de-anonymize supposedly anonymous datasets. Enough of them that if I turn over personal information to an organization and they imply that they’ll treat it as confidential (and CFAR certainly did), I would consider even an anonymized release of that information a mild breach of confidence, unless they specifically warned me about that possibility when I gave them the data.