(I don’t mean to dogpile)
I think that selection is the correct word, and that it doesn’t really seem to be smuggling in incorrect connections to evolution.
We could imagine finding an NN that does well according to a loss function by simply randomly initializing many, many NNs and then keeping the one that achieves the lowest loss. I think this process would accurately be described as selection; we are literally selecting the model which does best.
I’m not claiming that SGD does this[1], just giving an example of a method to find a low-loss parameter configuration which isn’t related to evolution, and is (in my opinion) best described as “selection”.
Although “Is SGD a Bayesian sampler? Well, almost” does make a related claim.
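For concreteness, the "randomly initialize many NNs and keep the best" procedure above can be sketched in a few lines. This is a hypothetical toy version: a linear model on synthetic data stands in for an NN, and the name `best_of_k` is my own label for the procedure, not something from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression task; a linear model stands in for an NN.
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=100)

def loss(w):
    """Mean squared error of the linear model parameterized by w."""
    return float(np.mean((X @ w - y) ** 2))

def best_of_k(k, seed=0):
    """Sample k random parameterizations and keep the lowest-loss one.

    No gradients are taken anywhere: the only "optimization" is the
    final argmin, i.e. selection over independently sampled parameters.
    """
    r = np.random.default_rng(seed)
    candidates = [r.normal(size=5) for _ in range(k)]
    return min(candidates, key=loss)

best = best_of_k(50)
```

With a fixed seed, the single candidate drawn by `best_of_k(1)` is also the first candidate in the `best_of_k(50)` pool, so widening the pool can only match or lower the selected loss.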
Sure. But I think that’s best described as “best-of-k sampling”, which is still better because it avoids implicitly comparing selection-over-learning-setups (i.e. genotypes) with selection-over-parameterizations.
But let’s just say I concede this particular method can be non-crazily called “selection.” AFAICT you’re arguing: “There exist ML variants which can be described as ‘selection’.” But speculation about “selecting for low loss” is not confined to those variants; people usually just lump everything in under that description. And I doubt that most folks are on the edge of their seats, ready to revoke the analogy if some paper comes out that convincingly shows that ML diverges from “selecting for low loss”...[1]
To be clear, that evidence already exists.
Hi, do you have links to the papers/evidence?
Actually, I agreed too quickly. Words are not used in a vacuum. Even though this method isn’t related to evolution, and even though a naive person might call it “selection” (and have that be descriptively reasonable), that doesn’t mean it’s best described as “selection.” The reason is that the “s-word” has lots of existing evolutionary connotations. And on my understanding, that’s the main reason you want to call it “selection” to begin with—in order to make analogical claims about the results of this process compared to the results of evolution.
But my whole point is that the analogy is only valid if the two optimization processes (evolution and best-of-k sampling) share the relevant causal mechanisms. So before you start using the s-word, and especially before you start using its status as “selection” to support analogies, I want to see that argument first. Otherwise, it should be called something more neutral.