Current LLMs are trivially mesa-optimisers under the original definition of that term.
Do current LLMs produce several options and then compare them according to an objective function?
They do, actually: at each step the model assigns a probability to every possible output token and then emits one of the most probable ones. But I think that concern is more about an AI comparing larger chunks of text (for instance, evaluating paragraphs of a report by the stakeholders' likely reaction).
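For concreteness, here is a minimal sketch of that per-token step (softmax over raw scores, then sampling). The token names and scores are made up for illustration; this shows the general mechanism, not any particular model's decoding code:

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Turn raw logits into a probability distribution via softmax,
    then sample one token -- the per-token 'evaluation' described above.
    `logits` maps each candidate token to a raw score."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Sample in proportion to probability, so high-probability tokens
    # are usually (but not always) the ones emitted.
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Hypothetical logits for the next token after "The cat sat on the"
logits = {"mat": 5.1, "rug": 4.7, "moon": 1.2}
print(sample_next_token(logits))
```

Note that this compares single tokens, not whole continuations, which is why it arguably doesn't match the "several options compared against an objective" picture in the question.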