> I don’t even know why the RFLO paper put that criterion in …
I don’t have any great insight here, but that’s very interesting to think about.
Thanks for your reply! A couple quick things:

I thought about it a bit more and I think I know what they were doing. I bet they were trying to preempt the pedantic point (related) that everything is an optimization process if you allow the objective function to be arbitrarily convoluted and post hoc. E.g. any trained model M globally maximizes the objective function “F where F(x)=1 if x is exactly the model M, and F(x)=0 in all other cases”. So if you’re not careful, you can define “optimization process” in a way that also includes rocks.
I think they used “explicitly represented objective function” as a straightforward case that would be adequate for most applications, but if they had wanted to, they could have replaced it with the slightly more general notion of “an objective function that can be deduced relatively straightforwardly by inspecting the nuts and bolts of the optimization process, and in particular isn’t a post hoc thing where you have to simulate the entire process of running the (so-called) optimization algorithm in order to answer the question of what the objective is.”
> I would guess that “clever hardware implementation that performs the exact same weight updates” without an explicitly represented objective function ends up being wildly inefficient.
Oh sorry, that’s not what I meant. For example (see here) the Python code

y *= 1.5 - x * y * y / 2

happens to be one iteration of Newton’s method to make y a better approximation to 1/√x. So if you keep running this line of code over and over, you’re straightforwardly running an optimization algorithm that finds the y that minimizes the objective function |x – 1/y²|. But I don’t see “|x – 1/y²|” written or calculated anywhere in that one line of source code. The source code skipped the objective and went straight to the update step.
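To make that concrete, here’s a tiny self-contained sketch (the function name, starting guess, and iteration count are arbitrary choices for illustration):

```python
# Iterating the one-line update drives y toward 1/sqrt(x), yet the objective
# |x - 1/y**2| is never written down or computed anywhere in the code.
def approx_inv_sqrt(x, y=0.1, iters=20):
    for _ in range(iters):
        y *= 1.5 - x * y * y / 2  # the entire "optimizer" is this single update step
    return y

print(approx_inv_sqrt(4.0))  # ~0.5
print(4.0 ** -0.5)           # reference value: 0.5
```

(The starting guess does have to be in the right ballpark for Newton’s method to converge, but that’s a property of the update step, not of any explicitly represented objective.)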
I have a vague notion that I’ve seen a more direct example kinda like this in the RL literature. Umm, maybe it was the policy gradient formula used in some version of policy-gradient RL? I recall that (this version of) the policy gradient formula was something involving logarithms, and I was confused for quite a while about where that formula came from, until I eventually found an explanation online where someone started with a straightforward, intuitive objective function, did some math magic, and wound up deriving that policy gradient formula with the logarithms. But the policy gradient formula (= the update step) is all you need to actually run that RL process in practice. The actual objective function need not be written or calculated anywhere in the RL source code. (I could be misremembering. This was years ago.)
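For what it’s worth, the vanilla REINFORCE update is the kind of thing I have in mind (I’m not claiming it’s the exact formula I half-remember; the toy environment, step size, and episode count below are arbitrary choices for illustration):

```python
# Vanilla REINFORCE on a two-armed bandit with a softmax policy. The update rule
# is theta += alpha * reward * grad log pi(a): the logarithm lives in the update
# step, but the objective (expected reward) is never computed anywhere.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                   # one logit per arm
true_means = np.array([0.0, 1.0])     # arm 1 pays more on average
alpha = 0.1                           # step size

for _ in range(2000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()              # softmax policy pi(. | theta)
    a = rng.choice(2, p=probs)        # sample an action
    r = true_means[a] + rng.normal()  # noisy reward
    grad_log_pi = -probs              # gradient of log pi(a) w.r.t. theta,
    grad_log_pi[a] += 1.0             #   which equals onehot(a) - probs for softmax
    theta += alpha * r * grad_log_pi  # the whole "RL algorithm" is this update

probs = np.exp(theta - theta.max())
probs /= probs.sum()
print(probs)  # should end up putting most of the probability on arm 1
```

The objective this is (stochastically) ascending, the expected reward under the policy, only ever shows up in the derivation of the update, not in the code.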