From my reading of quantilizers, they might still choose "near-optimal" actions, just only with small probability. Whereas a system based on decision transformers (possibly combined with an LLM) could be designed so that we could simply tell it "make me tea of this quantity and quality, within this time, with this probability", and it would attempt to do just that, without trying to make more or better tea, or to make it faster or with higher probability.
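A minimal toy sketch in Python of the contrast as I understand it. The action set, the probabilities, and all function names are my own illustrative assumptions, not taken from the quantilizer or decision-transformer papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup; every name and number here is illustrative.
actions   = np.array(["weak tea", "ok tea", "great tea", "take over the kitchen"])
base_prob = np.array([0.40, 0.40, 0.15, 0.05])   # base (human-like) policy
utility   = np.array([1.0, 2.0, 3.0, 10.0])      # expected return per action

def quantilize(q=0.1):
    """q-quantilizer: keep the top-q fraction of base-policy probability mass,
    ranked by utility, and sample from it with renormalized base probabilities.
    The extreme action stays in the support, just with small probability."""
    order = np.argsort(utility)[::-1]            # highest-utility actions first
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += base_prob[i]
        if mass >= q:
            break
    p = base_prob[keep] / base_prob[keep].sum()
    return actions[rng.choice(keep, p=p)]

def conditioned_policy(target_return):
    """Return-conditioned control, decision-transformer style: aim for the
    requested return, not the maximum."""
    return actions[np.argmin(np.abs(utility - target_return))]

print(quantilize(q=0.1))        # sometimes 'great tea', sometimes the extreme action
print(conditioned_policy(2.0))  # 'ok tea': does just that, without escalating
```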
Yes, that is a thing you can do with decision transformers too. I was referring to a variant of the decision transformer (see the link in the original shortform) where the AI samples the reward it's aiming for.
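A minimal sketch of what that sampling step might look like; the empirical-distribution approach and all names here are my own illustrative assumptions, not from the linked post:

```python
import numpy as np

rng = np.random.default_rng(1)

# Returns seen in the training data (illustrative numbers).
observed_returns = np.array([1.0, 2.0, 2.0, 3.0, 2.0, 1.0])

def sample_target_return():
    """Sample the return to condition the decision transformer on from the
    empirical return distribution, rather than taking a user-specified target
    or an argmax. The policy then aims for a typical outcome, not the best
    one it has ever seen."""
    return rng.choice(observed_returns)

print(sample_target_return())   # usually 2.0, the modal training-time return
```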