“Output the next token that has maximum probability according to your posterior distribution given the prompt” is literally an optimization problem, and a system trying to solve it benefits hugely from being more coherent.
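To make that concrete, here is a minimal sketch of the optimization step itself. The logits and vocabulary are made up for illustration; the point is only that greedy decoding is literally an argmax over the model's output distribution:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits over a toy 5-token vocabulary for some prompt
# (a real model emits ~50k of these per step).
logits = np.array([2.1, 0.3, -1.0, 1.7, 0.0])
posterior = softmax(logits)

# "Output the token with maximum probability" is an argmax:
next_token = int(np.argmax(posterior))
print(next_token, posterior[next_token])
```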
Strictly speaking, LLMs can’t be “just PDF calculators”. Straight calculation of the PDF on that amount of data is computationally intractable (or we would have had GPTs back in the golden era of Bayesian models). The actual algorithm has to contain a bazillion shortcuts and approximations, and “having an agent inside the system” is as good a shortcut as any.
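A quick back-of-the-envelope calculation shows why the straight calculation is hopeless. The vocabulary size and context length below are assumptions picked to be roughly GPT-scale:

```python
import math

# Assumed GPT-scale numbers: ~50k-token vocabulary, 100-token context.
vocab_size = 50_000
context_len = 100

# An explicit table of the joint distribution needs one probability
# per possible sequence:
table_entries = vocab_size ** context_len
print(f"~10^{math.log10(table_entries):.0f} entries")  # ~10^470

# For comparison, the observable universe holds roughly 10^80 atoms.
```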
LLMs calculate PDFs, regardless of whether what they calculate is “the true” PDF.