This is a plausible internal computation for the network to be doing, but the problem is that gradients flow back from the output, through the gradient computation, to the true value y, and so gradient descent will use that path to set the output to the appropriate true value.
Following up to clarify this: the point is that this attempt fails 2a because if you perturb the weights along the connection ∇_θ L(θ) --(ε·Id)--> output, there is now a path from the internal representation of y to the output, and so training will push the network toward the function f(D, θ) ≈ y.
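A toy sketch of this failure mode (my own construction, not from the discussion above): suppose the network's output is f = w_main·g(x) + ε·ŷ, where ŷ is an internal representation that already equals the true label y (e.g. recovered from an internal gradient computation), and ε is the small perturbation on the new connection. Once ε > 0, squared-error gradient descent has a path from ŷ to the output and grows ε until f(x) ≈ y:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.sin(x)            # true targets
g = np.sign(x)           # some unhelpful "main path" feature
y_hat = y.copy()         # internal representation that happens to equal y

w_main, eps = 0.5, 1e-3  # eps: tiny weight on the perturbed connection
lr = 0.1
for _ in range(2000):
    f = w_main * g + eps * y_hat
    err = f - y
    # gradients of the mean squared error wrt the two path weights
    # (up to a constant factor absorbed into the learning rate)
    w_main -= lr * np.mean(err * g)
    eps    -= lr * np.mean(err * y_hat)

mse = np.mean((w_main * g + eps * y_hat - y) ** 2)
print(f"eps -> {eps:.3f}, final MSE = {mse:.5f}")
```

Training drives eps toward 1 and the loss toward 0, i.e. the network ends up routing the internal copy of y straight to the output, which is exactly why the perturbation in 2a exposes this scheme.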