The more specific case I was hinting at was figuring out the loss <--> gradient landscape relationship.
Which, yes, a high schooler can do for a 5-cell network, but for any real network it seems fairly hard to say anything about it… I’ve read a few papers delving into the subject, and they seem complex to me.
Maybe not PhD level? I don’t know. But it’s hard enough that most people stick with a loss that makes sense for the task rather than shaping it so the resulting gradient is “easy to solve” (i.e. yields faster training and/or converges on a “more” optimal solution).
But I’m not 100% sure I’m correct here, and maybe learning the right 5 primitives makes the whole thing seem like child’s play… though based on how people behave around the subject, I kinda doubt it.
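For the tiny-network case, here’s a minimal sketch of how the choice of loss alone reshapes the gradient. It assumes a single sigmoid output unit and is my own toy example, not something taken from the papers I mentioned. All function names are made up for illustration. It compares dL/dz for squared error and binary cross-entropy. With squared error, the gradient picks up an extra p(1 − p) factor that goes to zero when the unit saturates, so a confidently wrong prediction barely gets corrected. Cross-entropy’s gradient stays at p − y.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Toy losses on a single sigmoid unit: p = sigmoid(z), target y in {0, 1}.
def mse(z: float, y: float) -> float:
    return 0.5 * (sigmoid(z) - y) ** 2

def bce(z: float, y: float) -> float:
    p = sigmoid(z)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

# Analytic dL/dz for each loss (chain rule through the sigmoid).
def grad_mse(z: float, y: float) -> float:
    p = sigmoid(z)
    return (p - y) * p * (1.0 - p)  # the p*(1-p) factor vanishes when saturated

def grad_bce(z: float, y: float) -> float:
    return sigmoid(z) - y  # the sigmoid derivative cancels out

# Central finite difference, used to sanity-check the analytic gradients.
def numeric_grad(loss, z: float, y: float, eps: float = 1e-6) -> float:
    return (loss(z + eps, y) - loss(z - eps, y)) / (2.0 * eps)

if __name__ == "__main__":
    y = 1.0  # target is 1; negative z means the unit is confidently wrong
    for z in (-6.0, -2.0, 0.0, 2.0):
        print(f"z={z:+.1f}  mse_grad={grad_mse(z, y):+.5f}  bce_grad={grad_bce(z, y):+.5f}")
```

At z = −6 with target 1, the squared-error gradient comes out a few hundred times smaller than the cross-entropy one. That is the kind of loss/gradient relationship that’s easy to see on one neuron and much harder to reason about in a real network.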