The autoregressive language model μ: Tᵏ → Δ(T), which maps a prompt x ∈ Tᵏ to a distribution over tokens μ(⋅|x) ∈ Δ(T).
Tᵏ should actually be T∗; I think you mean "the set of all strings constructed from the alphabet of tokens" and not "the set of all length-k strings constructed from the alphabet of tokens"?
You used the former meaning earlier, in Remark 1:
Let T be the set of possible tokens in our vocabulary. A language model (LLM) is given by a stochastic function μ: T∗ → Δ(T) mapping a prompt (t₁ … tₖ) to a predicted token tₖ₊₁.
I guess in this formalism you'd need to treat the empty string (or a similar null token) as a valid token, so the prompt/completion is prefixed/suffixed with empty strings to pad to the size of the context window.
Otherwise, you'd need to define the domain as the union of the sets of all strings with token length ≤ the context window, i.e. ⋃_{n ≤ k} Tⁿ.
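To make the two options concrete, here's a minimal illustrative sketch (all names and the window size are mine, not from the draft): option 1 keeps the domain as Tᵏ by padding short prompts with a null token; option 2 instead admits any prompt of length ≤ k.

```python
# Illustrative sketch (hypothetical names): two ways to handle prompts
# shorter than the context window k.

NULL = ""  # a designated null/padding token, assumed to be an element of T
K = 8      # context window size (arbitrary, for illustration only)

def pad_to_window(prompt, k=K, null=NULL):
    """Option 1: keep the domain as T^k by left-padding with null tokens."""
    if len(prompt) > k:
        raise ValueError("prompt exceeds context window")
    return [null] * (k - len(prompt)) + list(prompt)

def in_union_domain(prompt, k=K):
    """Option 2: take the domain to be the union of T^0, T^1, ..., T^k."""
    return len(prompt) <= k

tokens = ["the", "cat", "sat"]
padded = pad_to_window(tokens)
assert len(padded) == K and padded[-3:] == tokens
assert in_union_domain(tokens)
```

Either choice gives μ a well-defined domain; padding just trades the union over lengths for a single fixed-length product.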
Finite context window.
Realised later on, thanks.