Shapley Values [thanks Zack for reminding me of the name] are akin to credit assignment: you have a bunch of agents coordinating to achieve something, and then you want to assign payouts fairly based on how much each contribution mattered to the final outcome.
And the way you do this is, for each agent you look at how good the outcome would have been if everybody except that agent had coordinated, and then you credit each agent proportionally to how much the overall performance would have fallen off without them.
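A minimal sketch of that scoring rule, with a made-up `coalition_value` standing in for "how good the outcome is" (the leave-one-out version below is exactly the scheme just described; the textbook Shapley value instead averages each agent's marginal contribution over every order in which the coalition could have formed):

```python
from itertools import permutations
from math import factorial

def coalition_value(skills):
    # Hypothetical stand-in: payoff achieved when this set of agents coordinates.
    # Here: sum of individual skills, plus a bonus for coordinating at all.
    return sum(skills) + (1.0 if len(skills) >= 2 else 0.0)

def leave_one_out_credit(skills):
    # The scheme described above: each agent's credit is how much the payoff
    # would fall if everyone *except* that agent had coordinated.
    total = coalition_value(skills)
    return [total - coalition_value(skills[:i] + skills[i + 1:])
            for i in range(len(skills))]

def shapley_credit(skills):
    # The full Shapley value: average each agent's marginal contribution
    # over every order in which the coalition could have assembled.
    n = len(skills)
    credit = [0.0] * n
    for order in permutations(range(n)):
        members = []
        for i in order:
            before = coalition_value([skills[j] for j in members])
            members.append(i)
            after = coalition_value([skills[j] for j in members])
            credit[i] += after - before
    return [c / factorial(n) for c in credit]

print(leave_one_out_credit([3.0, 2.0, 1.0]))  # [3.0, 2.0, 1.0] -- sums to 6, not the full 7.0
print(shapley_credit([3.0, 2.0, 1.0]))        # [3.33.., 2.33.., 1.33..] -- sums to the full 7.0 payoff
```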
So what about doing the same here: send rewards to each contributor proportional to how much they improved the actual group decision (assessed by rerunning it without them and seeing how performance declines)?
Shapley value
(Best wishes, Less Wrong Reference Desk)
Yeah, it’s definitely related. The main thing I want to point out is that Shapley values similarly require a model in order to calculate. So you have to distinguish between the problem of calculating a detailed distribution of credit and being able to assign credit “at all”—in artificial neural networks, backprop is how you assign detailed credit, but a loss function is how you get a notion of credit at all. Hence the question “where do gradients come from?”—a reward function is like a pile of money made from a joint venture; but to apply backprop or Shapley value, you also need a model of counterfactual payoffs under a variety of circumstances. This is a problem if you don’t have a separate “epistemic” learning process to provide that model—i.e., it’s a problem if you are trying to create one big learning algorithm that does everything.
Specifically, you don’t automatically know how to
send rewards to each contributor proportional to how much they improved the actual group decision
because in the cases I’m interested in, i.e. online learning, you don’t have the option of
rerunning it without them and seeing how performance declines
-- because you need a model in order to rerun.
But, also, I think there are further distinctions to make. I believe that if you tried to apply Shapley value to neural networks, it would go poorly; and presumably there should be a “philosophical” reason why this is the case (why Shapley value is solving a different problem than backprop). I don’t know exactly what the relevant distinction is.
(Or maybe Shapley value works fine for NN learning; but, I’d be surprised.)
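To make that contrast concrete, here is a toy sketch (network, numbers, and names all invented): leave-one-out credit for each hidden unit means rerunning the network once per unit, so you need the model in hand, whereas backprop hands out per-unit credit in a single backward pass, but only because the loss is a differentiable function of those units:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed network: x -> 4 relu hidden units -> scalar output, squared-error loss.
W1 = rng.normal(size=(4, 3))
w2 = rng.normal(size=4)
x = rng.normal(size=3)
target = 1.0

def forward(unit_mask):
    # unit_mask[h] = 0 removes hidden unit h; all-ones is the normal network.
    h = np.maximum(W1 @ x, 0.0) * unit_mask
    y = w2 @ h
    return h, y, (y - target) ** 2

ones = np.ones(4)
h, y, base_loss = forward(ones)

# "Shapley-ish" credit: rerun the model once per hidden unit with that unit removed.
# This requires being able to rerun the whole system.
leave_one_out = np.array([forward(ones - np.eye(4)[i])[2] - base_loss
                          for i in range(4)])

# Backprop credit: one backward pass gives d(loss)/d(hidden unit) for every unit
# at once, via the chain rule through y = w2 @ h.
dL_dy = 2.0 * (y - target)
dL_dh = dL_dy * w2

print("loss increase if unit removed:", leave_one_out)
print("gradient of loss w.r.t. units:", dL_dh)
```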
Removing things entirely seems extreme. How about having a continuous “contribution parameter,” where running the algorithm without an element would correspond to turning this parameter down to zero, but you could also set the parameter to 0.5 if you wanted that element to have 50% of the influence it has right now. Then you can send rewards to elements if increasing their contribution parameter would improve the decision.
:P
Dropout is a thing, though.
Dropout is like the converse of this—you use dropout to assess the elements that aren’t dropped out. This promotes resilience to perturbations in the model—whereas if you evaluate things by how bad it is to break them, you could promote fragile, interreliant collections of elements over resilient ones.
I think the root of the issue is that this Shapley value doesn’t distinguish between something being bad to break, and something being good to have more of. If you removed all my blood I would die, but that doesn’t mean that I would currently benefit from additional blood.
Anyhow, the joke was that as soon as you add a continuous parameter, you get gradient descent back again.
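For what it's worth, the "contribution parameter" idea can be written down directly, and the joke falls out immediately: multiply each element's output by a gate (0 = removed, 0.5 = half influence, 1 = status quo; a random binary gate is dropout), and the "reward for turning an element's contribution up a little" is just the derivative of performance with respect to its gate at 1, i.e. a gradient. A toy sketch with an invented network:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # made-up tiny network: x -> 4 relu units -> scalar
w2 = rng.normal(size=4)
x = rng.normal(size=3)
target = 1.0

def loss(gate):
    # gate[h] scales hidden unit h: 0 = removed, 0.5 = half influence, 1 = as-is.
    h = np.maximum(W1 @ x, 0.0) * gate
    return (w2 @ h - target) ** 2

g = np.ones(4)

# Credit for "turning each element's contribution up a little", estimated by
# finite differences on the gate.
eps = 1e-5
credit = np.array([(loss(g + eps * np.eye(4)[i]) - loss(g - eps * np.eye(4)[i]))
                   / (2 * eps) for i in range(4)])

# The same quantity in closed form: one application of the chain rule,
# i.e. the gradient of the loss with respect to the gates at gate = 1.
h = np.maximum(W1 @ x, 0.0)
grad = 2.0 * (w2 @ (h * g) - target) * w2 * h

print(credit)   # finite-difference "contribution parameter" credit
print(grad)     # the gradient -- same numbers, hence the joke
```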