because DP gives up performance for guaranteed generalization.
Is « guaranteed » important in your answer? E.g. do you know of some code that shows this is real in practice, or is it more of a theoretical result?
It’s well studied. I’m not an expert in differential privacy and would need to read multiple papers in depth to be sure I’d answered precisely, but I know that at an English-prose level of mathematical description, what’s guaranteed is that individual datapoints are definitely not memorized, so any successful performance on a test set is definitely generalization (there’s a concrete code sketch of this guarantee after the links below). That doesn’t mean it’s causal generalization, though. And the accuracy is usually worse: optimizing for capabilities alone usually pushes capabilities further than optimizing for both differential privacy and capabilities, I think. I’d have to go paper hunting to find how well it performs.

Instead of doing that, I’ll post my usual pitch: I strongly encourage you to do your level best to find some papers yourself, because that shit is not always trivial, and attempting and failing is a really good finding-stuff workout. Tools I’d recommend include (in order of recommendation): the wiki page on differential privacy; opening papers on semantic scholar → a result, and browsing forward through the citations they’ve received, especially sorted by latest (same link as “a result” but with sort); and maybe a metaphor.systems query on the topic (sign in required but worth it). Problem is, these results don’t directly answer your question without doing some reading; I’d suggest opening papers, skimming through them fast to see whether they answer it, and browsing the paper citation graph forwards and backwards towards titles that sound relevant. It might help to put the papers that seem relevant into papermap.xyz [edit: giving me 500 errors now] (via reddit u/Neabfi) or https://my.paperscape.org/ (which helps a lot with forward and backward browsing but has a jankier UI than papermap).

Some results from the metaphor query:
https://www.borealisai.com/research-blogs/tutorial-12-differential-privacy-i-introduction/
https://differentialprivacy.org/
https://opendp.org/about
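To make “guaranteed” concrete, here’s a minimal, self-contained sketch of the ε-DP definition using the classic Laplace mechanism on a counting query. The datasets, ε, and the threshold below are made up for illustration; the point is the theorem: for any two datasets differing in one record, the mechanism’s output distributions differ by at most a factor of e^ε, for every possible output event.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0  # privacy budget

def laplace_count(data, eps):
    """Counting query with Laplace noise. A count has sensitivity 1,
    so noise of scale 1/eps gives eps-differential privacy."""
    return np.sum(data) + rng.laplace(scale=1.0 / eps)

# Two "neighboring" datasets: identical except for one person's record.
d1 = np.array([1, 0, 1, 1, 0])
d2 = np.array([1, 0, 1, 1, 1])

# Sample the mechanism's output distribution on each dataset.
samples1 = np.array([laplace_count(d1, eps) for _ in range(100_000)])
samples2 = np.array([laplace_count(d2, eps) for _ in range(100_000)])

# Probability each mechanism assigns to an arbitrary event (output >= 3.5).
# The ratio is bounded by e^eps no matter which event you pick, which is
# exactly why the output can't reveal much about any single record.
p1 = np.mean(samples1 >= 3.5)
p2 = np.mean(samples2 >= 3.5)
print(f"P1={p1:.3f}  P2={p2:.3f}  ratio={p2/p1:.2f}  bound e^eps={np.exp(eps):.2f}")
```

With these numbers the empirical ratio comes out around 2.3, under the e^ε ≈ 2.72 bound, and the bound holds for every event, not just this one.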
I don’t think it does in general, and every case I can think of right now did not, but I agree that it is a worthwhile thing to worry about.
I’d add clicking through citations and references on arXiv, and looking at the Litmaps explorer on arXiv.
Thx for these links. I’ll need some time for a deeper reading, but after a few hours my first take-home message has probably settled to this: there’s no theoretical guarantee that pursuing DP (or, more precisely, any of DP’s numerous variants) leads to worse results, except that worse results are what everyone reports in practice, so if it’s a purely technical issue it probably won’t be trivial to fix.
I like your note that DP generalizability might not be causal generalizability: that’s both a contender for explaining why these seemingly technical difficulties arise and a potentially key thought for improving this strategy.
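On the “worse results in practice” point, here’s a rough sketch of how you’d measure the cost yourself, assuming the Opacus library for DP-SGD (the toy model, synthetic data, and hyperparameters are placeholders, not tuned). Train the same model once with the privacy wrapper and once without, and compare test accuracy: the per-sample gradient clipping and added noise that produce the guarantee are exactly what costs accuracy.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # pip install opacus

# Toy synthetic data so the script is self-contained.
torch.manual_seed(0)
X = torch.randn(2000, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The PrivacyEngine enforces DP-SGD: per-sample gradients are clipped to
# max_grad_norm, then Gaussian noise scaled by noise_multiplier is added
# before each optimizer step.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # more noise -> stronger privacy, worse accuracy
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

# Epsilon is the privacy budget actually spent during training.
print("epsilon spent:", privacy_engine.get_epsilon(delta=1e-5))
```

Deleting the `make_private` call gives the non-private baseline; the gap in held-out accuracy between the two runs is the price of the guarantee at that ε.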