Sure. But I think that’s best described as “best-of-k sampling”, which is still better because it avoids implicitly comparing selection-over-learning-setups (i.e. genotypes) with selection-over-parameterizations.
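To make concrete what I mean by "best-of-k sampling" over parameterizations, here's a minimal toy sketch (the linear-regression setup and numbers are just illustrative assumptions): draw k candidate parameter vectors independently and keep whichever scores lowest loss, with no candidate ever being updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

def loss(w):
    """Mean squared error of the linear model y_hat = w * x."""
    return float(np.mean((X[:, 0] * w - y) ** 2))

# Best-of-k sampling over parameterizations: sample k candidate weights
# and pick the one with the lowest loss. Nothing is iteratively updated,
# which is what distinguishes this from gradient descent on a single
# parameterization.
k = 1000
candidates = rng.normal(scale=3.0, size=k)
best = min(candidates, key=loss)
print(f"best w = {best:.3f}, loss = {loss(best):.4f}")
```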
But let’s just say I concede this particular method can be non-crazily called “selection.” AFAICT you’re arguing: “There exist ML variants which can be described as ‘selection’.” But speculation about “selecting for low loss” is not confined to those variants; usually people just lump everything in under that label. And I doubt that most folks are on the edge of their seats, ready to revoke the analogy if some paper comes out that convincingly shows that ML diverges from “selecting for low loss”...[1]
To be clear, that evidence already exists.
Hi, do you have links to the papers/evidence?