I don’t really understand in what way “token prediction” is anything less than “literally any possible function from a domain of all possible observations to a domain of all possible actions”, at least if your “tokens” cover enough of the space of things you might want to do or say.
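To make that concrete, here is a minimal sketch, assuming a deterministic policy and a finite vocabulary that contains every action. The names `as_token_predictor`, `thermostat`, and `VOCAB` are made up for illustration. It only shows that such a policy can be rewritten as a next-token distribution; it says nothing about whether a trained model would actually learn one.

```python
from typing import Callable, Dict, Sequence

# A "policy" maps a history of observation tokens to one action token.
Policy = Callable[[Sequence[str]], str]
# A "predictor" maps the same history to a distribution over the vocabulary.
Predictor = Callable[[Sequence[str]], Dict[str, float]]


def as_token_predictor(policy: Policy, vocab: Sequence[str]) -> Predictor:
    """Wrap a deterministic policy as a next-token predictor (hypothetical helper)."""

    def predict(observations: Sequence[str]) -> Dict[str, float]:
        action = policy(observations)
        if action not in vocab:
            # Assumption: the vocabulary covers every action the policy can take.
            raise ValueError(f"action {action!r} is not in the vocabulary")
        # All probability mass goes on the token the policy would have chosen.
        return {tok: (1.0 if tok == action else 0.0) for tok in vocab}

    return predict


def thermostat(observations: Sequence[str]) -> str:
    """Toy policy: turn the heat on only if the latest reading is 'cold'."""
    if observations and observations[-1] == "cold":
        return "heat_on"
    return "heat_off"


VOCAB = ("cold", "warm", "heat_on", "heat_off")

predictor = as_token_predictor(thermostat, VOCAB)
dist = predictor(["warm", "cold"])
best = max(dist, key=dist.get)  # greedy decoding recovers the policy's action
```

Greedy decoding from `predictor` gives back exactly what `thermostat` would have done, which is the sense in which the two descriptions coincide under these assumptions.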
There’s no evidence that we do so based solely on token prediction, so that’s irrelevant.