Essentialness of Data
Link post
In discussing online privacy, people will sometimes say things like “if the data isn’t required to operate, you don’t need it.” If I can turn on tracking protection, stop sending data, and nothing breaks, then clearly it wasn’t needed, right? But consider these three cases:
Learning from implicit feedback: dictation software can operate without learning what corrections people make, or a search engine can operate without learning what links people click on, but the overall quality will be lower. Each individual piece of information isn’t required, but the feedback loop allows building a substantially better product.
Incremental rollouts: when you make changes to software that operates in complex environments, it can be very difficult to ensure through testing alone that it operates correctly. Incremental rollouts, with telemetry to verify that there are no regressions or that relevant bugs have been fixed, produce better software. Even Firefox collects telemetry by default.
Ads: most websites are able to offer their writing for free, without a paywall, because they can get paid for showing ads. Collecting more data makes ads more efficient, which makes them more profitable for the sites, which in turn means more sites competing to provide users with things to read.
Instead of pushing for “don’t collect data”, I think it would make a lot more sense for advocates to push for “only collect data privately” and work to make that easier (carrot) or mandatory (stick). None of these uses requires individual-level data; they’re just easiest to implement by sending all of the data back to a central server and processing it there.
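To make “only collect data privately” a bit more concrete for the telemetry case: one classic alternative to shipping raw answers to a central server is randomized response, where each client adds noise before reporting. The post doesn’t prescribe this particular technique; it’s a substitution for illustration. Below is a minimal Python sketch; the function names, the 0.75 truth probability, and the “did the new build crash?” question are all hypothetical.

```python
import random


def randomized_response(truth: bool, rng: random.Random, p_truth: float = 0.75) -> bool:
    """Client side (hypothetical helper): with probability p_truth report the
    real answer, otherwise report a fair coin flip. Any single report is
    therefore deniable."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Server side (hypothetical helper): undo the known noise to estimate the
    fraction of clients whose true answer was 'yes'.

    E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5,
    so true_rate = (observed - (1 - p_truth) * 0.5) / p_truth.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated clients: 30% of them truly answer "yes, it crashed".
    truths = [rng.random() < 0.30 for _ in range(100_000)]
    # Only the noised reports ever leave the clients.
    reports = [randomized_response(t, rng) for t in truths]
    print(f"estimated crash rate: {estimate_true_rate(reports):.3f}")
```

The server never learns any individual’s true answer, yet with 100,000 reports the estimate lands close to the true 30%. With p_truth = 0.75, a “yes” report is 7 times as likely from a true “yes” (0.875) as from a true “no” (0.125), which corresponds to a differential privacy parameter of ε = ln 7 ≈ 1.95.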
(What does “private” mean? Ideally it means that no one reviewing the data can tell what your, or any other individual’s, contribution was. This is formalized as differential privacy, and is typically implemented by adding noise proportional to the maximum contribution any individual could have. In some cases k-anonymity may also provide good protection, but it’s trickier. And this is only the beginning; privacy researchers and engineers have been putting a lot of work into this space.)
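The “noise proportional to the maximum contribution” idea can be sketched as the Laplace mechanism applied to a count, such as how many times users clicked a particular search result. This is a toy sketch, assuming each user’s click count is clamped to a hypothetical cap of 3 and a privacy parameter epsilon of 1; the names are illustrative. Real deployments also need to track the privacy budget across repeated queries and handle floating-point subtleties, which this ignores.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The standard library has no Laplace sampler, but the difference of two
    # independent exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_total(per_user_counts: list[int], max_per_user: int,
                  epsilon: float, rng: random.Random) -> float:
    # Clamp each user's contribution to [0, max_per_user] so that adding or
    # removing any one user changes the true total by at most max_per_user
    # (the "sensitivity" of the query).
    clamped = [min(max(c, 0), max_per_user) for c in per_user_counts]
    # Noise scale = sensitivity / epsilon: smaller epsilon means more noise
    # and a stronger privacy guarantee.
    return sum(clamped) + laplace_noise(max_per_user / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: clicks on one result from each of 10,000 users.
    clicks = [i % 5 for i in range(10_000)]
    print(private_total(clicks, max_per_user=3, epsilon=1.0, rng=rng))
```

Run as a script, it prints a value near the clamped true total of 18,000, typically off by only a few units since the noise scale is 3. Clamping is what makes the guarantee possible: without a cap, one user with millions of clicks could shift the total arbitrarily, and no finite amount of noise would hide them.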