I think people are conflating a few different things:
A) “the models with large context windows can process a large input context in order to produce an output that depends on that context”
B) “the models have been trained to produce especially reliable and/or pertinent short outputs from a very long input, making good use of their new long context windows.”
C) “the models have been trained to produce very long outputs after being given just a short input prompt (e.g. giving the summary blurb from the back of a novel as a prompt, and receiving a full novel as a response).”
D) “the models with large context windows can process a small amount of context and reliably produce a long output that fills the rest of their large context window in a coherent, useful way that logically follows from that small input.”
What we currently have is A. I don’t know whether anyone is planning to attempt B in the near term; maybe they are? I’m pretty sure no one is planning to attempt C in the near term, which seems much harder than B. I think D is basically what you get from deploying a model trained like C, and thus I don’t expect D without someone first doing C.
I think Andrew’s quote is saying that we now have A, but do not yet have B, and that B would likely be an improvement over A.
Not enough context to know what Gwern means by the short summary you give, but perhaps he means either:
x) A is good enough for the thing Andrew is describing; we don’t need B. Maybe B will happen, and maybe it will be better, but Andrew’s expressed desire is achievable with A alone.
y) He agrees that what Andrew is describing requires B, but implies that A unlocks B, and that we should thus expect B fairly soon due to competitive pressures.