Slight nitpick: since the logistic function isn’t really used in practice anymore, it might be better to start people new to the sequence with a better activation function like ReLU or tanh?
When I was first learning this material, people mentioned sigmoid so often that I assumed it was a good default, and I only learned later that it’s not the activation function of choice anymore, and hasn’t been for a while. (See, for example, Yann LeCun here in 1998 on why normal sigmoid has drawbacks.)
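To make the comparison concrete, here is a rough sketch (my own illustration, not taken from the post or from the linked LeCun piece) of two commonly cited drawbacks of the logistic sigmoid: its outputs are not centered on zero, and its gradient never exceeds 0.25 and shrinks quickly for large inputs. The helper names and the sample inputs are made up for this example.

```python
import math

def sigmoid(x):
    # Logistic function: squashes x into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the logistic function; peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # Derivative of tanh; peaks at 1.0 when x = 0.
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Derivative of ReLU (taking the value 0 at x = 0 by convention).
    return 1.0 if x > 0 else 0.0

# Hypothetical sample inputs, symmetric around zero.
inputs = [-4.0, -1.0, 0.0, 1.0, 4.0]

# Sigmoid outputs average to 0.5 on symmetric inputs; tanh outputs average to 0.
mean_sigmoid = sum(sigmoid(x) for x in inputs) / len(inputs)
mean_tanh = sum(math.tanh(x) for x in inputs) / len(inputs)

for x in inputs:
    print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  "
          f"tanh'={tanh_grad(x):.4f}  relu'={relu_grad(x):.1f}")
print(f"mean sigmoid output: {mean_sigmoid:.3f}, mean tanh output: {mean_tanh:.3f}")
```

This only shows the shape of the functions; whether it matters for a given network depends on initialization, depth, and normalization, none of which are modeled here.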
Okay – since I don’t actually know what is used in practice, I just added a bit paraphrasing your correction (which is consistent with a quick Google search), without selling it as my own idea. Stuff like this is the downside of someone who is just learning the material writing about it.