Does it really work on RULER (the benchmark from NVIDIA)?
Not sure where, but I saw some controversy about it. https://arxiv.org/html/2410.18745v1#S1 is the best I could find for now...
Edit: Ah, this is what I had in mind: https://www.reddit.com/r/LocalLLaMA/comments/1io3hn2/nolima_longcontext_evaluation_beyond_literal/
I assume for Pokémon the model doesn’t need to remember everything exactly, so the recall quality may be less important than the quantity.