Even if machine learning tends to average across batches, the decision about how to cluster the data usually depends on the kinds of questions you are trying to answer with it. It seems to me that raw data is more useful than clustered, averaged data, because it does not presuppose the types of questions that will be asked.
Yes, there is probably a fundamental information tradeoff between anonymization and data effectiveness, but it isn’t clear that this will be much of a limitation in practice.
Secondly, people should be able to opt in to various levels of anonymization risk, and perhaps that could be tied to financial incentives, so that you can effectively sell your data to some degree.
Yes, agreed with just about all of that.