Following up to clarify this: the point is that this attempt fails 2a because, if you perturb the weights along the connection $\nabla_\theta L(\theta) \xrightarrow{\epsilon \cdot \mathrm{Id}} \text{output}$, there is now a connection from the internal representation of $y$ to the output, and so training will send this thing to the function $f(D, \theta) \approx y$.
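To make the failure mode concrete, here is a toy sketch, not the actual construction under discussion. It assumes a single scalar weight `w` on a hypothetical connection from an internal representation `h` (taken to be exactly equal to `y`) to the output, with the rest of the network contributing nothing and a squared-error loss. The names `train_leak_weight`, `eps`, `lr`, and `steps`, as well as the data, are all made up for illustration.

```python
def train_leak_weight(eps=1e-3, lr=0.1, steps=200):
    """Gradient descent on one weight connecting an internal copy of y to the output."""
    # Toy targets; the hypothetical internal representation h is assumed to equal y.
    ys = [-1.0, 0.5, 2.0, 3.0]
    w = eps  # the perturbed weight on the h -> output connection
    for _ in range(steps):
        # Toy model: output = w * h with h = y; loss = mean((output - y)^2).
        # d(loss)/dw = mean(2 * (w*y - y) * y)
        grad = sum(2.0 * (w * y - y) * y for y in ys) / len(ys)
        w -= lr * grad
    return w


if __name__ == "__main__":
    # Starting from a tiny perturbation, training pushes w toward 1,
    # i.e. the output ends up approximately equal to y.
    print(train_leak_weight())
```

In this toy, once the connection carries any signal, the loss gradient keeps strengthening it until the output simply copies `y`, which is the sense in which training "sends" the model to $f(D, \theta) \approx y$.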