Very many things wrong with all of that:
RL algorithms don’t minimize costs; they maximize expected reward, which may well be unbounded, so it’s wrong to say that the ML field only minimizes cost.
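To make that concrete, here is a minimal sketch of “maximize expected reward” in code, assuming a toy 2-armed bandit with Gaussian (hence unbounded above) rewards and a REINFORCE-style update; the setup and all numbers are illustrative, not anything from LeCun’s proposals:

```python
import numpy as np

# Toy 2-armed bandit: REINFORCE-style gradient ASCENT on expected reward.
# Rewards are Gaussian draws, so the objective is not bounded above.
rng = np.random.default_rng(0)
theta = np.zeros(2)                # policy logits
arm_means = np.array([1.0, 3.0])   # arm 1 pays more on average (made-up numbers)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = rng.normal(arm_means[a], 1.0)   # unbounded reward signal
    grad_logp = -probs
    grad_logp[a] += 1.0                 # gradient of log pi(a | theta) for a softmax policy
    theta += 0.05 * r * grad_logp       # ascend the expected-reward objective, not descend a cost

print(softmax(theta))   # probability mass concentrates on the higher-mean arm
```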
LLMs minimize the expected negative log probability of the correct token, which is indeed bounded below by zero, but achieving zero in that case means perfectly predicting every single token on the internet.
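Concretely, the next-token loss is the average negative log probability the model assigns to the correct token; a tiny sketch, where the per-position probabilities are made-up illustrative values:

```python
import numpy as np

# Average negative log probability of the correct token at each position.
# It is >= 0, and it is exactly 0 only if the model assigns probability 1
# to the correct token at every position.
p_correct = np.array([0.9, 0.4, 0.99, 1.0])   # made-up per-position probabilities
print(-np.log(p_correct).mean())              # > 0
print(-np.log(np.ones(4)).mean())             # exactly 0 only with perfect prediction
```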
The boundedness of the thing you’re minimizing is totally irrelevant, since minimizing f(x) is exactly the same as minimizing g(f(x)) where g is a strictly increasing function. You can trivially turn a bounded objective into an unbounded one without changing the solution set at all.
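A quick numeric sketch of that point, using a made-up one-dimensional loss: f is bounded, its logit is unbounded above, and both are minimized at the same x:

```python
import numpy as np

# f(x) = sigmoid((x - 2)^2) is bounded in [0.5, 1); g(f) = logit(f) = (x - 2)^2
# is unbounded above. g is strictly increasing, so the minimizer is unchanged.
def f(x):
    return 1.0 / (1.0 + np.exp(-(x - 2.0) ** 2))

def g_of_f(x):
    fx = f(x)
    return np.log(fx / (1.0 - fx))

xs = np.linspace(0.0, 4.0, 4001)
print(xs[np.argmin(f(xs))], xs[np.argmin(g_of_f(xs))])   # both ≈ 2.0
```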
Even if utility is bounded between 0 and 1, an agent maximizing expected utility will still never stop, because it can always further decrease the probability that it was wrong: quadruple-check every single step and turn the universe into computronium to make sure no errors were made.
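A toy illustration with made-up numbers: if utility is 1 on success and 0 on failure, expected utility is just P(success), and every additional verification pass still buys a strictly positive (if shrinking) amount of it:

```python
# Utility is 1 if the plan succeeds, 0 otherwise, so expected utility = P(success) <= 1.
# Assume (purely for illustration) each extra check cuts the residual error
# probability by a factor of 10: the marginal gain shrinks but never hits zero,
# so a pure expected-utility maximizer has no point at which checking stops paying.
p_error = 0.01
for checks in range(1, 6):
    p_error *= 0.1
    print(f"{checks} checks -> expected utility {1.0 - p_error:.10f}")
```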
This is very dumb; LeCun should know better, and I’m sure he *would* know better if he spent five minutes thinking about any of this.
Yann LeCun’s proposals are based on cost-minimization.
Do you expect LeCun to have been assuming that the entire field of RL stops existing in order to focus on his specific vision?
I’m not sure he has coherent expectations, but I’d expect his vibe is some combination of “RL doesn’t currently work” and “fields generally implement safety standards”.