Replacing SGD with something that takes the shortest, not the steepest, path
Maybe we can design a local search strategy, similar to gradient descent, that does try to stay close to the initial point x0? E.g., when at x, take a small step in the direction that has the minimal scalar product with x − x0 among all directions that make an angle of at most alpha with the negative gradient (the usual descent direction), where alpha > 0 is a hyperparameter. One might call this “stochastic cone descent” if it does not yet have a name.
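For concreteness, here is a minimal NumPy sketch of one step (the function name and the handling of the degenerate parallel case are incidental choices of mine, and it assumes an exact gradient rather than a stochastic minibatch one):

```python
import numpy as np

def cone_descent_step(x, x0, grad, alpha, lr):
    """One step of the proposed 'stochastic cone descent' (a sketch).

    Among unit directions within angle alpha of the descent direction
    -grad, pick the one with minimal scalar product with x - x0,
    then take a step of size lr along it.
    """
    u = -grad / np.linalg.norm(grad)   # unit descent direction
    v = x - x0                         # displacement from the initial point
    if np.linalg.norm(v) == 0.0:
        return x + lr * u              # at x0 there is no preference yet
    v_hat = v / np.linalg.norm(v)

    # Over the whole unit sphere, <d, v> is minimized by d = -v_hat.
    # If that direction already lies inside the cone, use it.
    if np.dot(-v_hat, u) >= np.cos(alpha):
        d = -v_hat
    else:
        # Otherwise the minimizer lies on the cone boundary, in the plane
        # spanned by u and v:  d = cos(alpha) * u + sin(alpha) * w,
        # where w is the unit vector orthogonal to u pointing away from v.
        w = -(v - np.dot(v, u) * u)
        wn = np.linalg.norm(w)
        if wn < 1e-12:
            # Degenerate case: v is parallel to u, so all boundary
            # directions tie; pick an arbitrary orthogonal component.
            e = np.zeros_like(u)
            e[np.argmin(np.abs(u))] = 1.0
            w = e - np.dot(e, u) * u
            wn = np.linalg.norm(w)
        d = np.cos(alpha) * u + np.sin(alpha) * (w / wn)
    return x + lr * d
```

The case split is just the closed-form minimizer of a linear function over a spherical cap: either the straight-back direction −(x − x0) already lies inside the cone, or the optimum sits on the cone’s boundary in the plane spanned by the gradient and x − x0.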
Doesn’t sound like it’d meaningfully change the fundamental dynamics. It’s an intervention on the order of things like Momentum or Adam, and those are still “basically just SGD”. Pretty sure something similar will be the case here: it may introduce some interesting effects, but it won’t actually robustly address the greed.
… My current thought is that the answer to “how can we design a procedure that takes the shortest and not the steepest path to AGI?” is just “design it manually”. I.e., the corresponding “training algorithm” we’ll want to replace SGD with is just “our own general intelligence”.
I’m sorry, but I fail to see the analogy to Momentum or Adam: in neither of them does the vector from the current point to the initial point, or the distance between them, play any role as far as I can see. It is also different from regularizations that modify the objective function, say by penalizing moving away from the initial point, which would change the location of all minima. The method I propose preserves all minima and just tries to move towards the one closest to the initial point. I have discussed it with some mathematical optimization experts and they think it is new.
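To make the contrast concrete, a one-dimensional example: with L(x) = (x − 1)^2 and x0 = 0, the penalized objective

L(x) + λ·(x − x0)^2 = (x − 1)^2 + λ·x^2

has its minimum at x = 1/(1 + λ) instead of x = 1, so the penalty relocates every minimizer as λ grows. The cone constraint, by contrast, only restricts the direction of each step and leaves the stationary set {x : ∇L(x) = 0} untouched.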