This is an interesting thought, but it seems very hard to realize: you would have to distill the unique contribution of the sample, as opposed to information that is widespread across the training data and merely happens to be present in that sample as well.
Weight updates depend heavily on training order, of course, so you’re really looking for something like the Shapley value of the sample, except that “impact” is liable to be an elusive, high-dimensional quantity in itself.
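A minimal sketch of what a Monte Carlo estimate of that per-sample Shapley value could look like, collapsing “impact” to a single scalar (validation accuracy of a toy nearest-centroid classifier) purely to make the idea concrete; the helper names (train_and_score, shapley_of_sample) and the toy data are illustrative, not from any particular library:

```python
# Illustrative sketch only: "impact" is reduced to a scalar (validation accuracy)
# and the model is a toy nearest-centroid classifier, so retraining is cheap.
import numpy as np

rng = np.random.default_rng(0)

def train_and_score(train_idx, X, y, X_val, y_val):
    """Train on the chosen subset and return validation accuracy
    (chance level for an empty subset)."""
    if len(train_idx) == 0:
        return 1.0 / len(np.unique(y_val))
    centroids = {c: X[train_idx][y[train_idx] == c].mean(axis=0)
                 for c in np.unique(y[train_idx])}
    classes = np.array(list(centroids))
    dists = np.stack([np.linalg.norm(X_val - centroids[c], axis=1) for c in classes])
    preds = classes[dists.argmin(axis=0)]
    return (preds == y_val).mean()

def shapley_of_sample(i, X, y, X_val, y_val, n_perms=200):
    """Average marginal contribution of sample i over random coalitions
    of the other samples (an unbiased Shapley estimator)."""
    others = np.array([j for j in range(len(y)) if j != i])
    total = 0.0
    for _ in range(n_perms):
        perm = rng.permutation(others)
        cut = rng.integers(0, len(others) + 1)   # random coalition size
        coalition = perm[:cut]
        without = train_and_score(coalition, X, y, X_val, y_val)
        with_i = train_and_score(np.append(coalition, i), X, y, X_val, y_val)
        total += with_i - without
    return total / n_perms

# Toy data: two Gaussian blobs.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
X_val = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_val = np.array([0] * 20 + [1] * 20)

print("estimated Shapley value of sample 0:", shapley_of_sample(0, X, y, X_val, y_val))
```

Even this scalar version needs hundreds of retrainings per sample, which hints at why the full, high-dimensional notion of “impact” is so elusive.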
hmmmm. yeah, essentially what I’m asking for is certified classification… and intuitively I don’t think that’s actually too much to ask for. there has been some work on certifying neural networks, and it has led me to believe that the current bottleneck is that models are several orders of magnitude too dense. concerningly, sparser models are also significantly more capable. one would need to ensure that every update is fully tagged at each step of the process, so that you can always be sure how you are changing the decision boundaries...
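For a concrete, small-scale sense of what “certified classification” can mean, here is a minimal sketch of interval bound propagation, one of the simpler certification techniques, over a toy ReLU network with random placeholder weights; it is a loose sufficient check rather than a tight certificate, and none of the names come from a specific library:

```python
# Illustrative sketch of interval bound propagation (IBP): propagate an
# L-infinity box around the input through a small ReLU net and check whether
# the predicted class provably cannot flip inside that box.
# Weights here are random placeholders, not a trained model.
import numpy as np

rng = np.random.default_rng(1)

def affine_bounds(l, u, W, b):
    """Sound interval bounds for W @ x + b when x lies in the box [l, u]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ l + W_neg @ u + b, W_pos @ u + W_neg @ l + b

def certify(x, eps, layers):
    """Return True if the prediction at x provably holds for every point
    within L-infinity distance eps (a sufficient, not necessary, check)."""
    l, u = x - eps, x + eps
    z = x
    for i, (W, b) in enumerate(layers):
        l, u = affine_bounds(l, u, W, b)
        z = W @ z + b
        if i < len(layers) - 1:              # ReLU on hidden layers only
            l, u, z = np.maximum(l, 0), np.maximum(u, 0), np.maximum(z, 0)
    pred = int(np.argmax(z))
    # Certified if the predicted logit's lower bound beats every rival's upper bound.
    return all(l[pred] > u[j] for j in range(len(z)) if j != pred)

# Tiny random 2-layer net: 4 inputs -> 8 hidden -> 3 classes.
layers = [(rng.normal(size=(8, 4)), rng.normal(size=8)),
          (rng.normal(size=(3, 8)), rng.normal(size=3))]
x = rng.normal(size=4)

for eps in (0.0, 0.01, 0.1):
    print(f"eps={eps}: certified={certify(x, eps, layers)}")
```

The interval bounds typically loosen as they pass through each dense layer, which gives some intuition for why denser, deeper models are harder to certify.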