In practice, exact Bayesian updating is intractable, so we typically approximate sampling from the posterior with something like SGD. It is plausible that something like SGD is already close to the optimum for a given amount of compute.
I give this view ~20%: There’s so much more info in some datapoints (curvature, the third derivative of the loss, momentum, see also Empirical Bayes-like SGD, the entire past trajectory through parameter space) that seems so available and exploitable!
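To make "available and exploitable" a bit more concrete, here is a minimal sketch (a toy example of my own, not anything from this discussion): plain SGD on a stream of per-example quadratic losses versus the same update preconditioned by a running estimate of per-example curvature, which is one of the extra signals mentioned above.

```python
# Toy sketch: the quadratic losses and all names here are illustrative
# assumptions, not taken from the discussion above. Plain SGD uses only the
# per-example gradient; the second variant also exploits per-example
# curvature (the second derivative) via a running preconditioner.
import numpy as np

rng = np.random.default_rng(0)

# Per-example loss: l_i(w) = 0.5 * a_i * (w - b_i)^2
# gradient: a_i * (w - b_i);  second derivative (curvature): a_i
a = rng.uniform(0.5, 5.0, size=1000)   # heterogeneous curvatures
b = rng.normal(0.0, 1.0, size=1000)
w_star = np.sum(a * b) / np.sum(a)     # minimizer of the average loss

def run(use_curvature, steps=2000, lr=0.1, w0=5.0):
    w, h = w0, 1.0                      # h: running curvature estimate
    for _ in range(steps):
        i = rng.integers(len(a))
        g = a[i] * (w - b[i])           # per-example gradient
        if use_curvature:
            h = 0.99 * h + 0.01 * a[i]  # fold in the second derivative too
            w -= lr * g / h             # curvature-preconditioned step
        else:
            w -= lr * g                 # plain SGD step
    return w

print(f"optimum          {w_star:+.3f}")
print(f"plain SGD        {run(False):+.3f}")
print(f"curvature-aware  {run(True):+.3f}")
```

On this toy problem both variants land near the optimum; the point is only that the curvature signal is sitting right there in each example and can be folded into the update at negligible extra cost.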