If you plan to release the individual answers as you did last time, please keep in mind that karma alone is sufficient to identify a lot of people, so removing other identifying information makes more sense if you also round the karma (e.g. to nearest power of 10 or 5 or some other number).
You could do this when generating the xls file, or you could give karma ranges as options in the survey. If you do the former, some (small number of) people will lie about their karma to prevent you from identifying them.
As long as you mean “round to the nearest in this list”, sure.
But if you mean “round 8838 to 8850”, the number of people per ‘option’ gets too low in the high karmas. Look at the top ten disclosed karmas from the last survey: 7500, 7830, 8838, 9000, 12000, 14000, 14612, 18000, 26084, 48000.
In fact, everyone over 10000 should probably be lumped together just to account for Eliezer (so that he isn’t alone in his category). He didn’t disclose his karma last time, but I’m strongly in favor of a system that works regardless of the users’ carefulness.
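To make the difference concrete, here is a rough sketch comparing the two readings on the ten scores quoted above. The helper names (`round_to_multiple`, `lump_above`) and the step of 50 are assumptions chosen for illustration, not anything the survey actually uses.

```python
from collections import Counter

# Top ten disclosed karma scores quoted in the comment above.
top_ten = [7500, 7830, 8838, 9000, 12000, 14000, 14612, 18000, 26084, 48000]

def round_to_multiple(karma, step=50):
    """Round to the nearest multiple of `step` (halves round up)."""
    return (karma + step // 2) // step * step

def lump_above(karma, ceiling=10000):
    """Report every score above `ceiling` as the ceiling itself."""
    return min(karma, ceiling)

# How many respondents share each published value under each scheme?
rounded = Counter(round_to_multiple(k) for k in top_ten)
lumped = Counter(lump_above(k) for k in top_ten)

print("round to nearest 50:", dict(rounded))
print("lump above 10000:  ", dict(lumped))
```

Under rounding to the nearest 50, each of the ten scores still maps to its own value, so nobody gains any cover. Lumping everything above 10000 puts six of them into a single bucket.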
(Edit: here used to be a paragraph about how a specific LW user of interest could easily be identified in last survey’s data. I apologize for invading his or her privacy in my thoughtless irritation. I’m just a bit touchy about privacy-related procedures.)
If you’re touchy about privacy issues, the way to express that is NOT to out someone’s anonymous survey answers. That is anti-social behavior, and implies that you are only interested in your OWN privacy while not at all valuing the privacy of others.
If you wanted to show how easy it was to find out someone’s identity from the survey answers, the better course of action would have been to put in a comment something like “in fact, from last year’s survey I was able to figure out the identity of at least one person using karma score as the main indicator”, and then to PM Yvain personally with the information, since he could tighten security unilaterally. It is NOT acceptable to post publicly the identity of the person whose identity you discovered.
I suggest you retract your comment, and ask a mod to delete it—especially if you are as touchy about privacy procedures as you claim to be.
If you’re touchy about privacy issues, the way to express that is NOT to out someone’s anonymous survey answers.
Sure it is, if it is going to work and the expected benefits outweigh the perceived costs. Demonstrating that information is ALREADY out there for those who care enough to look is sometimes going to be the only way to apply enough pressure to see things changed rather than swept under the rug. The aforementioned “costs” include costs to the speaker for rocking the boat.
That is anti-social behavior,
Sometimes it could be. That would depend on the circumstances and whether the person applying that judgement happened to value actual future privacy within said social group more than perception of past privacy in the same. Even when it is not actually anti-social it can still be judged ‘uncouth’ and disruptive and at best they can expect to be blamed for being the messenger.
and implies that you are only interested in your OWN privacy while not at all valuing the privacy of others.
That doesn’t follow. Someone who cared only about increasing the privacy of others while not caring at all about their own (and who was equally ruthless in their approach) would take the same action. In fact making that kind of statement outright implies altruistic interests rather than selfish ones. The selfish privacy-concerned individual just wouldn’t bother drawing attention to themselves as someone whose privacy is worth breaching and would simply not participate in the survey. Speaking up can only serve to help others who are more naive about the privacy concerns.
I’ve removed that paragraph and I apologize for it.
If I may indulge in a bit of nitpicking, you misquoted me: “privacy-related procedures” is very different from “privacy issues”, and I maintain that my touchiness is consistent. It is a valid position that the information leak already happened with the publication of the file (so Yvain cannot tighten security when it comes to that file), and that drawing attention to specific breaches of privacy is generally the best way to force people to think about privacy. But your position is valid too, and it was stupid of me to act as I did in a place full of people sharing your position. (Extra stupidity points for me since the place is heavily moderated.)
Ranges would work. 1000+ should be high enough for the top category; on last year’s survey only 9% of respondents (80 people) were in that range. On CFAR surveys we’ve used:
I don’t have a Less Wrong account
zero or less
1-99
100-999
1000 or more
Finer categories might be useful and shouldn’t compromise anonymity too much, especially at the low end. This breakdown looks OK to me:
no karma score mentioned (341),
0 or less (144),
1-4 (39),
5-9 (27),
10-19 (38),
20-29 (29),
30-49 (40),
50-99 (52),
100-199 (45),
200-299 (27),
300-499 (30),
500-999 (38),
1000-1999 (37) and
2000+ (43).
Numbers in brackets are the number of responses in each category on the 2011 survey. Note that another survey now would get even more responses in most categories.
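As a minimal sketch of how this breakdown could be applied when generating the released file, the snippet below maps a raw karma value to one of the ranges listed above and checks the smallest category size using the bracketed 2011 counts. The function name `karma_category` and the use of `None` for a missing answer are assumptions made for the example.

```python
from bisect import bisect_right

# Lower bounds of each range after "0 or less", taken from the list above.
LOWER_BOUNDS = [1, 5, 10, 20, 30, 50, 100, 200, 300, 500, 1000, 2000]
LABELS = ["0 or less", "1-4", "5-9", "10-19", "20-29", "30-49", "50-99",
          "100-199", "200-299", "300-499", "500-999", "1000-1999", "2000+"]

# Bracketed response counts from the 2011 survey, in the same order as LABELS.
COUNTS_2011 = [144, 39, 27, 38, 29, 40, 52, 45, 27, 30, 38, 37, 43]

def karma_category(karma):
    """Return the range label for a karma value; None means no answer given."""
    if karma is None:
        return "no karma score mentioned"
    return LABELS[bisect_right(LOWER_BOUNDS, karma)]

print(karma_category(8838))  # lands in the top category
print(karma_category(-3))    # lands in "0 or less"
print("smallest category size in 2011:", min(COUNTS_2011))
```

With these ranges the smallest category would have held 27 respondents in 2011, and the two top categories together hold the 80 people mentioned above for the 1000+ range.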
(Personally I’m OK with Yvain’s laissez-faire approach of letting people round karma scores themselves to the degree they want. But I can see why using discrete categories to enforce privacy might be more robust.)
[Edited after army1987 posted his comment to clarify the bracketed numbers.]
That looks great, but I’d split the top range into two (because I don’t feel that comfortable being lumped with EY et al.), and I’d also split 50-99 and 100-199 (for consistency, so that no category has more than 40 respondents from the last survey).
That’s way too coarse IMO. I’d prefer having a write-in answer field but suggesting that people round it to one or two significant figures (depending on how concerned they are about their privacy), and maybe accepting the answer “> 5000”.
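For reference, rounding to one or two significant figures with a “> 5000” catch-all could look roughly like this. It is only a sketch: `round_sig` and `survey_answer` are hypothetical names, and the cutoff is taken from the suggestion above.

```python
def round_sig(karma, figures=1):
    """Round an integer karma score to `figures` significant figures."""
    sign = -1 if karma < 0 else 1
    magnitude = abs(karma)
    digits = len(str(magnitude))
    if digits <= figures:
        return karma  # already short enough, nothing to round
    factor = 10 ** (digits - figures)
    # Integer arithmetic, halves round away from zero.
    return sign * ((magnitude + factor // 2) // factor * factor)

def survey_answer(karma, figures=1, cutoff=5000):
    """What a respondent would write in: a rounded score or the catch-all."""
    if karma > cutoff:
        return f"> {cutoff}"
    return str(round_sig(karma, figures))

print(survey_answer(437, figures=1))  # one significant figure
print(survey_answer(437, figures=2))  # two significant figures
print(survey_answer(14612))           # above the cutoff
```

A respondent with 437 karma would report 400 or 440 depending on how cautious they are, while anyone above the cutoff reports only “> 5000”.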
A third solution would be to ask everyone to round to the nearest 5, 10, 50 (etc.) when answering.
False (insinuated) accusation.
As of now, the tenth top contributor of all time is Vladimir_Nesov with 17245 karma.
One possible solution is for Yvain to not publish the karma data of respondents.