What exactly is being done—what type of thing is being created—when we run a process like “use gradient descent to minimize a loss function on training data, as long as the loss function is also being minimized on test data”?
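To make the process in question concrete, here is a minimal sketch of it in plain Python. Everything specific in it is an assumption chosen for illustration, not something the text specifies: the model is a one-dimensional line, the loss is mean squared error, the data is synthetic, and the names `mse`, `train`, `lr`, and `max_steps` are hypothetical. Training takes gradient steps on the training data and stops as soon as a step fails to lower the loss on the test data.

```python
import random

def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_data, test_data, lr=0.05, max_steps=1000):
    """Gradient descent on train_data that halts once test loss stops falling."""
    w, b = 0.0, 0.0
    best = mse(w, b, test_data)
    steps = 0
    n = len(train_data)
    for _ in range(max_steps):
        # Gradient of the training MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_data) / n
        new_w, new_b = w - lr * grad_w, b - lr * grad_b
        test_loss = mse(new_w, new_b, test_data)
        if test_loss >= best:
            break  # test loss did not decrease: keep the previous parameters
        w, b, best = new_w, new_b, test_loss
        steps += 1
    return w, b, steps

# Synthetic data (an assumption for this sketch): y = 3x + 1 plus small noise.
random.seed(0)
points = [(x / 10, 3 * (x / 10) + 1 + random.gauss(0, 0.1)) for x in range(-20, 21)]
random.shuffle(points)
train_data, test_data = points[:30], points[30:]

w, b, steps = train(train_data, test_data)
print(f"stopped after {steps} steps: w={w:.3f}, b={b:.3f}")
```

Note that the thing the loop returns is just a pair of numbers, `w` and `b`, selected by the optimization pressure; nothing in the procedure itself says what kind of object those parameters amount to, which is the question being raised.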