Seems like the easiest way to satisfy that definition would be to:
- Set up a network and dataset with at least one local minimum which is not a global minimum.
- Then add an intermediate layer which estimates the gradient and doesn't connect to the output at all (sketched below).
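For concreteness, here is a minimal JAX sketch of this construction (the names, shapes, and tanh nonlinearities are my own illustrative choices, and actually choosing a dataset with a non-global local minimum is a separate step not shown here): the hidden layer is split into an ordinary part that drives the output and a "gradient block" whose activations are read by nothing downstream.

```python
import jax.numpy as jnp

def forward(theta, grad_block_w, x):
    # Ordinary one-hidden-layer path to the output.
    w1, b1, w2, b2 = theta
    h = jnp.tanh(x @ w1 + b1)
    output = h @ w2 + b2
    # Intermediate "gradient block": its activations are meant to estimate
    # grad_theta L of the loss below, but no weight connects them to
    # `output`, so they cannot influence the prediction or the loss.
    grad_estimate = jnp.tanh(h @ grad_block_w)
    return output, grad_estimate

def loss(theta, grad_block_w, x, y):
    output, _ = forward(theta, grad_block_w, x)  # gradient block is discarded
    return jnp.mean((output - y) ** 2)
```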
This feels like cheating to me, but I guess I wasn’t super precise with ‘feedforward neural network’. I meant ‘fully connected neural network’, so the gradient computation has to be connected by parameters to the outputs. Specifically, I require that you can write the network as
$$f(x,\theta) = \sigma_n \circ W_n \circ \dots \circ \sigma_1 \circ W_1\,[x,\theta]^T,$$

where the weight matrices are some nice function of $\theta$ (we need a weight sharing function to make the dimensions work out: it takes in $\theta$ and produces the $W_i$ matrices that are actually used in the forward pass).
I guess I should be more precise about what 'nice' means, to rule out weight sharing functions that always zero out the input, but it turns out this is kind of tricky. Let's require the weight sharing function $\phi:\mathbb{R}^w\to\mathbb{R}^W$ to be differentiable and to have image satisfying $[-1,1]\subset\operatorname{proj}_n(\operatorname{im}(\phi))$ for any coordinate projection $\operatorname{proj}_n$. (A weaker condition is to require that the weight sharing function can only duplicate parameters.)
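For concreteness, here is a minimal sketch of a forward pass of this form (the dimensions and tanh nonlinearities are my own illustrative choices), using a weight sharing function that only duplicates parameters, which is differentiable and whose image projects onto all of $\mathbb{R}$ in every coordinate:

```python
import jax.numpy as jnp

W_SHAPES = [(8, 6), (1, 8)]                   # shapes of W_1, W_2 (illustrative)
W_TOTAL = sum(r * c for r, c in W_SHAPES)     # this is "capital W"

def phi(theta):
    # Weight sharing by duplication: tile theta until it has length W_TOTAL.
    reps = -(-W_TOTAL // theta.size)          # ceiling division
    return jnp.tile(theta, reps)[:W_TOTAL]

def f(x, theta):
    # Fully connected forward pass on the concatenated input [x, theta]^T.
    z = jnp.concatenate([x, theta])           # x in R^2, theta in R^4 -> z in R^6
    flat = phi(theta)
    offset = 0
    for rows, cols in W_SHAPES:
        W_i = flat[offset:offset + rows * cols].reshape(rows, cols)
        offset += rows * cols
        z = jnp.tanh(W_i @ z)                 # sigma_i taken to be tanh
    return z

print(f(jnp.zeros(2), jnp.ones(4)).shape)     # (1,)
```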
This is a plausible internal computation that the network could be doing, but the problem is that gradients flow back from the output, through the computation of the gradient, to the true value $y$, and so GD will use that path to set the output to be the appropriate true value.
Following up to clarify this: the point is that this attempt fails 2a because if you perturb the weights along the connection $\nabla_\theta L(\theta) \xrightarrow{\;\epsilon \cdot \mathrm{Id}\;} \text{output}$, there is now a connection from the internal representation of $y$ to the output, and so training will send this thing to the function $f(D,\theta)\approx y$.
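To make the failure concrete, here is a hedged one-parameter caricature (eps, w_circuit, and the toy loss are my notation, not the original construction): while the gradient-estimating circuit is disconnected, its parameter receives exactly zero gradient; as soon as the $\epsilon \cdot \mathrm{Id}$ connection to the output is added, it receives gradient signal, and training can exploit its internal copy of $y$.

```python
import jax
import jax.numpy as jnp

def loss(params, eps, x, y):
    w_main, w_circuit = params
    pred_main = w_main * x               # ordinary path to the output
    g = jnp.tanh(w_circuit * y)          # circuit with an internal copy of y
    pred = pred_main + eps * g           # eps = 0: circuit is disconnected
    return jnp.mean((pred - y) ** 2)

params = (jnp.array(0.1), jnp.array(0.5))
x, y = jnp.array(1.0), jnp.array(2.0)

# Disconnected: the circuit's parameter is a free parameter (zero gradient).
print(jax.grad(loss)(params, 0.0, x, y)[1])  # 0.0
# Perturbed along the eps * Id connection: nonzero gradient, so GD can now
# use the circuit to drive the output toward y.
print(jax.grad(loss)(params, 0.1, x, y)[1])
```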
I’m a bit confused as to why this would work.
If the circuit in the intermediate layer that estimates the gradient does not influence the output, wouldn't its weights just be free parameters that can be varied with no consequence to the loss? If so, this violates 2a, since perturbing these parameters would not get the model to converge to the desired solution.
Good point. Could hardcode them, so those parameters aren’t free to vary at all.