I just don’t really understand in what way “token prediction” is anything less than “literally any possible function from the domain of all possible observations to the domain of all possible actions”, at least if your “tokens” cover enough of the space of things you might want to do or say.
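To make that concrete, here is a toy sketch of the wrapping I have in mind. Everything in it is made up for illustration (`as_token_predictor`, `rollout`, `toy_policy`, the `<eos>` marker), and it assumes a deterministic policy whose actions can be serialized into a finite token sequence. It only shows that such a function can be expressed as next-token prediction, not that any trained model actually learns it.

```python
from typing import Callable, List, Sequence, TypeVar

Obs = TypeVar("Obs")
Act = TypeVar("Act")

EOS = "<eos>"  # hypothetical end-of-action marker, not from any real tokenizer


def as_token_predictor(
    policy: Callable[[Obs], Act],
    encode: Callable[[Act], Sequence[str]],
) -> Callable[[Obs, Sequence[str]], str]:
    """Wrap an arbitrary observation -> action function as a next-token function."""

    def next_token(observation: Obs, emitted: Sequence[str]) -> str:
        # Serialize the action the policy would take, then return whichever
        # token comes next given how many tokens were already emitted.
        target = list(encode(policy(observation))) + [EOS]
        if len(emitted) < len(target):
            return target[len(emitted)]
        return EOS

    return next_token


def rollout(
    next_token: Callable[[Obs, Sequence[str]], str],
    observation: Obs,
    max_tokens: int = 64,
) -> List[str]:
    """Generate tokens one at a time until the end marker appears."""
    emitted: List[str] = []
    for _ in range(max_tokens):
        token = next_token(observation, emitted)
        if token == EOS:
            break
        emitted.append(token)
    return emitted


def toy_policy(observation: int) -> str:
    # Stand-in for "any function from observations to actions".
    return "left" if observation < 0 else "right"


# Characters serve as tokens here; list("left") gives ['l', 'e', 'f', 't'].
predictor = as_token_predictor(toy_policy, list)
action = "".join(rollout(predictor, -3))
```

Decoding the emitted tokens gives back exactly the action the original policy would have chosen, so as far as I can tell the restriction, if there is one, has to come from the token vocabulary or from training, not from the "predict the next token" interface itself.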