Idea: for the survey, have users put their names into a hash function (linked from the survey), and use the same function next year, so that users can be compared year over year. If there’s concern about this being dangerous for the main LW survey, that column could just be removed before the data is published and made available only to CFAR. Or something. This isn’t a fully fleshed-out idea, just a quick thought on how one might go about turning basic correlational data into more longitudinal data.
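A rough sketch of what the name-hashing step might look like, in Python. Everything named here (`SURVEY_KEY`, `normalize_name`, `respondent_id`, `strip_ids`, the `respondent_id` column) is made up for illustration. It also swaps in a keyed hash (HMAC) for a plain one, on the assumption that the organizers can keep one secret across years; names are easy to guess, so a plain hash of a name could be reversed by trying candidate names.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret held by the survey organizers. It has to stay the
# same every year, or the IDs will not line up across years.
SURVEY_KEY = b"replace-with-a-long-random-secret"


def normalize_name(name: str) -> str:
    """Fold case, Unicode form, and whitespace so small typing differences hash alike."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


def respondent_id(name: str, key: bytes = SURVEY_KEY) -> str:
    """Keyed SHA-256 (HMAC) of the normalized name, as a 64-character hex string."""
    message = normalize_name(name).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def strip_ids(rows, id_column="respondent_id"):
    """Return copies of the rows without the ID column, for the public release."""
    return [{k: v for k, v in row.items() if k != id_column} for row in rows]
```

The private copy keeps the ID column for year-over-year comparison, and the public copy goes through something like `strip_ids` first.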
I bet we could do this (with less confidence, because of rare false positives) just by mass-comparing results across years. We collected a fair amount of stable information. Some other system would be nice, though, if it adds reliability and makes it easier for us to vary the survey format or contents over time.
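For what mass-comparing might look like, here is a minimal sketch in Python. The column names in `STABLE_FIELDS` are hypothetical stand-ins for whatever stable information the survey actually collected, and the rule of accepting a match only when a profile is unique in both years is one assumed way of keeping false positives down, not the only one.

```python
from collections import defaultdict

# Hypothetical columns that should not change from year to year.
STABLE_FIELDS = ("birth_year", "native_language", "handedness")


def index_by_profile(rows, fields=STABLE_FIELDS):
    """Group row indices by their tuple of stable answers."""
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        buckets[tuple(row.get(f) for f in fields)].append(i)
    return buckets


def link_years(year_a, year_b, fields=STABLE_FIELDS):
    """Return (index_in_a, index_in_b) pairs whose profile is unique in both years."""
    a_index = index_by_profile(year_a, fields)
    b_index = index_by_profile(year_b, fields)
    pairs = []
    for profile, a_rows in a_index.items():
        if None in profile:
            continue  # a skipped question makes the profile too weak to trust
        b_rows = b_index.get(profile, [])
        if len(a_rows) == 1 and len(b_rows) == 1:
            pairs.append((a_rows[0], b_rows[0]))
    return pairs
```

Profiles shared by two or more respondents in either year are dropped rather than guessed at, so this trades coverage for fewer wrong links. It also breaks as soon as a stable question is reworded or removed, which is the reliability problem a dedicated identifier would avoid.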
I agree, but I think it should not be in the public LW data.
Another option is to generate a token that the user can save and re-enter when they take the survey again the next year. Then the data could be tagged with a hash of the token.
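A minimal sketch of the token approach in Python, using only the standard library. The function names `new_token` and `token_tag` and the 16-byte token size are assumptions for illustration, not anything the survey specifies.

```python
import hashlib
import secrets


def new_token(n_bytes: int = 16) -> str:
    """Generate a random, URL-safe token for the user to save."""
    return secrets.token_urlsafe(n_bytes)


def token_tag(token: str) -> str:
    """SHA-256 hash of the token, used to tag the survey row instead of the token itself."""
    # Strip surrounding whitespace so a token pasted with a stray newline still matches.
    return hashlib.sha256(token.strip().encode("utf-8")).hexdigest()


# Year 1: show the token to the user and store only its hash with their answers.
token = new_token()
year1_tag = token_tag(token)

# Year 2: the user re-enters the saved token; an equal hash links the two rows.
year2_tag = token_tag(token + "\n")
linked = year1_tag == year2_tag
```

Because the token is random rather than derived from a name, the stored hash cannot be recovered by guessing names. The cost is that a user who loses the token cannot be linked.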
You’d get some misplaced tokens, but it should be pretty safe, privacy-wise. I think the biggest concern would be that linked results could narrow down the possibilities for someone’s real-life identity, especially if they had a significant change of demographic status such as moving from one place to another or getting married.