In the case of binary classification (i.e. where |Y| = 2), we always set ℓ(h) := D({(x, y) ∈ X × Y | h(x) ≠ y}), i.e. the probability that our predictor gets the label wrong on a freshly sampled point.
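For concreteness, here's a minimal sketch of the empirical version of this loss — the fraction of mislabeled points in a sample drawn from D. The names `h`, `sample`, and `zero_one_loss` are my own for illustration, not notation from the book.

```python
def zero_one_loss(h, sample):
    """Empirical 0-1 loss: fraction of (x, y) pairs with h(x) != y.

    `h` is any callable X -> Y and `sample` is a list of (x, y) pairs
    drawn i.i.d. from D; as the sample grows, this estimate converges
    to the true loss l(h) defined above.
    """
    mistakes = sum(1 for x, y in sample if h(x) != y)
    return mistakes / len(sample)

# Toy usage: a threshold predictor on a small labeled sample.
sample = [(0.2, 0), (0.7, 1), (0.9, 1), (0.1, 0)]
h = lambda x: 1 if x > 0.5 else 0
print(zero_one_loss(h, sample))  # 0.0 here, since h labels every point correctly
```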
Does this assume that we want a false negative to be penalized the same as a false positive? Or would that be adjusted for in some other place?
Yeah, it does mean that. I hadn’t thought about it until now, but you’re right that it’s not at all obvious. The book never mentions weighting them differently, but the model certainly allows it. It may be that an asymmetric loss function complicates the results; I’d have to check the proofs.
I’ll edit the part you quoted to make it a weaker claim.
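For what it's worth, here's a rough sketch of what an asymmetric loss could look like in this setup. This is purely illustrative; the costs `cost_fp`/`cost_fn` and the function name are my own, not anything the book defines.

```python
def weighted_loss(h, sample, cost_fp=1.0, cost_fn=5.0):
    """Average cost over the sample, where a false positive (h(x) = 1, y = 0)
    costs `cost_fp` and a false negative (h(x) = 0, y = 1) costs `cost_fn`.

    Setting cost_fp == cost_fn == 1 recovers the ordinary 0-1 loss above.
    """
    total = 0.0
    for x, y in sample:
        pred = h(x)
        if pred == 1 and y == 0:
            total += cost_fp
        elif pred == 0 and y == 1:
            total += cost_fn
    return total / len(sample)
```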