That article says that each person uses 20% of the tools, not 80%. If everyone used a different 80% that would seem to imply at least a 60% overlap in usage for two people, and at least a 40% overlap for three. Probably the overlaps would fall more or less slowly for different tools along the usage curve you proposed. It seems like there should be some way to at least estimate the expected value here...
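For what it's worth, here is a rough sketch of both numbers: the worst-case overlap used above, and an expected overlap under the (almost certainly wrong) assumption that each person picks their tools independently and uniformly at random. The function names `min_overlap` and `expected_overlap` are made up for illustration, and since real usage is skewed toward popular tools, the true expected overlap is probably higher than this estimate.

```python
from fractions import Fraction


def min_overlap(p, k):
    """Worst-case fraction of tools shared by all k people,
    when each person uses a fraction p of the tools (pigeonhole bound)."""
    return max(Fraction(0), k * p - (k - 1))


def expected_overlap(p, k):
    """Expected fraction of tools shared by all k people.

    Assumption (not from the article): each person's tool set is chosen
    independently and uniformly at random, so any given tool is used by
    all k people with probability p ** k.
    """
    return p ** k


for p in (Fraction(4, 5), Fraction(1, 5)):
    for k in (2, 3):
        print(
            f"p={float(p):.0%} k={k}: "
            f"min={float(min_overlap(p, k)):.1%} "
            f"expected={float(expected_overlap(p, k)):.1%}"
        )
```

With 80% each, this reproduces the 60% and 40% lower bounds; with 20% each, the guaranteed overlap drops to zero and the uniform-random estimate for two people is only 4%.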
Maybe you could check package download statistics and history data (or something?) to see which things are most installed and most used. If you had data from many people I bet you’d find clusters in usage space that could turn into different sorts of flash card decks to help a person “join” that cluster, or become capable of code-switching between computer usage styles? Another use might be to help someone who takes an N-year hiatus from coding and doesn’t want to get too rusty at the keyboard?
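To make "clusters in usage space" a bit more concrete, here is a minimal sketch, assuming you somehow had a set of tools per person. The user names, tool sets, the 0.5 threshold, and the greedy grouping itself are all hypothetical choices for illustration; a real analysis would more likely use a proper clustering method on much more data.

```python
def jaccard(a, b):
    """Jaccard similarity of two tool sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(users, threshold=0.5):
    """Group users by tool-set similarity.

    Each user joins the first cluster whose seed (first member's tool set)
    is at least `threshold` similar; otherwise they start a new cluster.
    """
    clusters = []  # list of (seed_tool_set, [member names])
    for name, tools in users.items():
        for seed, members in clusters:
            if jaccard(seed, tools) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((tools, [name]))
    return [members for _, members in clusters]


# Made-up example data, not drawn from any real survey.
users = {
    "alice": {"git", "vim", "grep", "make"},
    "bob": {"git", "vim", "grep", "gdb"},
    "carol": {"R", "pandoc", "latex"},
    "dave": {"R", "latex", "pandoc", "git"},
}

print(greedy_clusters(users))
```

Each resulting group's most common tools would then be the candidate contents for that cluster's flash card deck.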
Maybe you could check package download statistics and history data (or something?) to see which things are most installed and most used. If you had data from many people I bet you’d find clusters in usage space that could turn into different sorts of flash card decks to help a person “join” that cluster, or become capable of code switching between computer usage styles?
AFAIK, the Popcons all heavily anonymize their data down to ‘installed or not’, and don’t include anything useful for clustering. (This is reasonable because with the tens of thousands of packages and whatever power laws or distributions are involved, it’d only take a few idiosyncratic package installations to break privacy.)
So maybe clusters would be efficient enough (although, keeping in mind my 5-minute rule and the point about it being very easy to search for programs, I still think it’s unlikely), but currently I don’t know of any way to generate them.