Given that Gemini 1.5 Flash already performs decently in my tests with relatively short prompts, and it’s even cheaper than GPT-3.5-Turbo, I could probably get a significant Pareto improvement (indeed, probably an improvement on all fronts) by switching from {GPT-3.5-Turbo + short prompt} to {Gemini 1.5 Flash + long cached prompt}. Just need to make the case it’s worth the hassle...
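Making that case is mostly back-of-the-envelope arithmetic. A rough sketch of the comparison (all per-token and cache-storage prices are deliberately left as parameters to fill in from the current pricing pages, rather than baking in numbers that will go stale):

```python
# Rough cost model for one request; every price argument is a placeholder
# to be filled in from the current OpenAI / Google pricing pages.
def per_request_cost(fresh_in_tokens, cached_in_tokens, out_tokens,
                     price_in, price_cached_in, price_out,
                     cache_storage_per_hour=0.0, requests_per_hour=1):
    """Blended $/request: fresh input + cached input (discounted) + output,
    plus the cache's hourly storage fee amortized over that hour's requests."""
    storage = cache_storage_per_hour / requests_per_hour if cached_in_tokens else 0.0
    return (fresh_in_tokens * price_in
            + cached_in_tokens * price_cached_in
            + out_tokens * price_out
            + storage)

# Usage: compare per_request_cost(...) for {GPT-3.5-Turbo + short prompt}
# (cached_in_tokens=0) against {Gemini 1.5 Flash + long cached prompt}
# (most input tokens in cached_in_tokens), then check quality on the same eval set.
```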
EDIT: oh, wait, I just found the catch.

The minimum input token count for context caching is 32,768.
Obviously nice for truly long context stuff, but I’m not going to add tens of thousands of tokens to my prompt just for the privilege of using this feature.
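For concreteness, this is roughly what the caching flow looks like with the google-generativeai Python SDK. It's a minimal sketch rather than a drop-in recipe; the model name, file name, and TTL are placeholders to adapt:

```python
import datetime
import pathlib

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# Hypothetical file holding the long reusable prompt (instructions, few-shot examples, ...).
# To be cacheable it has to clear the 32,768-token minimum mentioned above.
long_prompt = pathlib.Path("few_shot_examples.txt").read_text()

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="reusable-long-prompt",
    contents=[long_prompt],
    ttl=datetime.timedelta(hours=1),  # storage is billed for as long as the cache lives
)

# Requests made through this model reuse the cached tokens at the discounted cached rate;
# only the short per-request query is billed as fresh input.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("the actual short query for this request")
print(response.text)
```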
Yeah, I was thinking you might be able to fill the context adequately, because otherwise you’d be in an awkward spot: too many examples to include cheaply in the prompt and make the small, cheap models work out, but still not enough for finetuning to really shine, i.e. training a larger high-end model over millions of tokens until it can zero-shot the task.
Yeah, it’s on my radar and seems promising.