The OpenAI ES algorithm isn’t very plausible (for exactly the reasons you said), but the general idea of "existing parameters + random noise → revert if performance got worse, repeat" does seem like a reasonable way to end up with an approximation of the gradient. I had in mind something more like Uber AI’s Neuroevolution, which wouldn’t necessarily require parallelization or storage if the brain did some sort of fast local parameter updating.
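
For concreteness, here's a minimal sketch of that "perturb, keep if better, otherwise revert" loop, written as a simple (1+1)-ES-style hill climber rather than OpenAI's parallelized ES; the function and parameter names are just illustrative:

```python
import numpy as np

def noisy_hill_climb(fitness, dim, sigma=0.1, steps=1000, seed=0):
    """Perturb parameters with Gaussian noise; keep the perturbation
    only if fitness didn't get worse, otherwise revert (discard it)."""
    rng = np.random.default_rng(seed)
    params = rng.normal(size=dim)
    best = fitness(params)
    for _ in range(steps):
        candidate = params + sigma * rng.normal(size=dim)  # existing params + random noise
        score = fitness(candidate)
        if score >= best:  # performance didn't get worse: accept the change
            params, best = candidate, score
        # else: implicitly revert by throwing the candidate away
    return params, best

# Toy usage: maximize a simple concave fitness function.
if __name__ == "__main__":
    target = np.ones(10)
    fitness = lambda w: -np.sum((w - target) ** 2)
    w, f = noisy_hill_climb(fitness, dim=10)
    print(f"final fitness: {f:.4f}")
```

Averaged over many such accepted/rejected perturbations, the accepted directions act as a crude, stochastic approximation of the gradient, which is the sense in which this kind of local trial-and-error could substitute for explicit backprop.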