From my reading of quantilizers, they might still choose "near-optimal" actions, just only with small probability. Whereas a system based on decision transformers (possibly combined with an LLM) could be designed so that we could simply tell it "make me tea of this quantity and quality, within this time, with this probability", and it would attempt to do just that, without trying to make more or better tea, or to make it faster or with higher probability.
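A minimal toy sketch in Python of the contrast as I understand it. The action set, the probabilities, and all function names are my own illustrative assumptions, not taken from the quantilizer or decision-transformer papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup; every name and number here is illustrative.
actions   = np.array(["weak tea", "ok tea", "great tea", "take over the kitchen"])
base_prob = np.array([0.40, 0.40, 0.15, 0.05])   # base (human-like) policy
utility   = np.array([1.0, 2.0, 3.0, 10.0])      # expected return per action

def quantilize(q=0.1):
    """q-quantilizer: keep the top-q fraction of base-policy probability mass,
    ranked by utility, and sample from it with renormalized base probabilities.
    The extreme action stays in the support, just with small probability."""
    order = np.argsort(utility)[::-1]            # highest-utility actions first
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += base_prob[i]
        if mass >= q:
            break
    p = base_prob[keep] / base_prob[keep].sum()
    return actions[rng.choice(keep, p=p)]

def conditioned_policy(target_return):
    """Return-conditioned control, decision-transformer style: aim for the
    requested return, not the maximum."""
    return actions[np.argmin(np.abs(utility - target_return))]

print(quantilize(q=0.1))        # sometimes 'great tea', sometimes the extreme action
print(conditioned_policy(2.0))  # 'ok tea': does just that, without escalating
```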
Yes, that is a thing you can do with decision transformers too. I was referring to a variant of the decision transformer (see the link in the original shortform) where the AI samples the reward it's aiming for.
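A minimal sketch of what that sampling step might look like; the empirical-distribution approach and all names here are my own illustrative assumptions, not from the linked post:

```python
import numpy as np

rng = np.random.default_rng(1)

# Returns seen in the training data (illustrative numbers).
observed_returns = np.array([1.0, 2.0, 2.0, 3.0, 2.0, 1.0])

def sample_target_return():
    """Sample the return to condition the decision transformer on from the
    empirical return distribution, rather than taking a user-specified target
    or an argmax. The policy then aims for a typical outcome, not the best
    one it has ever seen."""
    return rng.choice(observed_returns)

print(sample_target_return())   # usually 2.0, the modal training-time return
```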