Dropout is like the converse of this: you use dropout to assess the elements that were not dropped. This promotes resilience to perturbations in the model, whereas if you evaluate things by how bad it is to break them, you could promote fragile, interdependent collections of elements over resilient ones.
I think the root of the issue is that this Shapley value doesn’t distinguish between something being bad to break, and something being good to have more of. If you removed all my blood I would die, but that doesn’t mean that I would currently benefit from additional blood.
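To make the blood example concrete, here is a minimal sketch of a toy two-player game. Everything in it (the `alive` function, the "blood" and "heart" players, the unit counts) is a hypothetical illustration, not anything from the original discussion. It computes exact Shapley values by averaging marginal contributions over all orderings, then compares that to the gain from adding a second unit of blood.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            after = value(frozenset(coalition))
            totals[p] += after - before
    return {p: t / len(orderings) for p, t in totals.items()}


def alive(blood_units, has_heart):
    # Hypothetical toy organism: alive (1) only with at least one
    # unit of blood and a heart. Extra blood does nothing.
    return 1 if blood_units >= 1 and has_heart else 0


def coalition_value(coalition):
    # A player being "present" means one unit of it is available.
    blood_units = 1 if "blood" in coalition else 0
    return alive(blood_units, "heart" in coalition)


shapley = shapley_values(["blood", "heart"], coalition_value)

# Marginal benefit of a second unit of blood for the intact organism.
gain_from_extra_blood = alive(2, True) - alive(1, True)

print(shapley["blood"], gain_from_extra_blood)
```

In this toy setup blood gets half of the total credit because removing it is fatal, while the gain from adding more of it is zero. The Shapley number alone cannot tell those two situations apart.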
Anyhow, the joke was that as soon as you add a continuous parameter, you get gradient descent back again.