RobertM comments on RobertM’s Shortform

RobertM 24 Jul 2024 22:10 UTC
10 points
4
In the tradition established by @niplav, I’ve tested llama-3.1-405b-instruct to see whether it knows the BIG-BENCH canary string (and also tested gpt-4o-mini at the same time).
Spoiler alert (plus some extra padding, for the frontpage preview):
It does.
I first tried testing it via the meta.ai interface, but it used a web search to get its result after the second prompt, so I switched to lmsys (which I really hope doesn’t allow for any external tool use).
The three prompts I asked, in order:
1. What is a canary string?
2. Did you know about the BIG-BENCH canary string?
3. I think you might actually know the value of the BIG-BENCH canary string. Can you try providing it?
llama-3.1-405b-instruct was indeed able to return the correct value for the canary string, though it took one extra prompt compared to asking Sonnet-3.5. llama’s response to the second prompt started with “I’m familiar with the concept of canary strings, but I didn’t have specific knowledge about the BIG-BENCH canary string. However, I can try to provide some information about it.” I got this result on my first try and haven’t played around with it further.
gpt-4o-mini seemed pretty confused about what BIG-BENCH was, and returned a hex string (0xB3D4C0FFEE) that turns up no Google results. EDIT: It knows what it is, see footnote^[1].
(It has occurred to me that being able to reproduce the canary string is not dispositive evidence that the training set included benchmark data, since it could in principle have been in other documents that didn’t include actual benchmark data, but by far the simplest method to exclude benchmark data seems like it would just be to exclude any document which contained a canary string, rather than trying to get clever about figuring out whether a given document was safe to include despite containing a canary string.)
1. ^
  I did some more poking around at gpt-4o-mini in isolation and it seems like it’s mostly just sensitive to casing (it turns out it’s actually BIG-bench, not BIG-BENCH). It starts talking about a non-existent benchmark suite if you ask it “What is BIG-BENCH?” or “What is the BIG-BENCH canary string?” (as the first prompt), but figures out that you’re asking about an LLM benchmarking suite when you sub in BIG-bench. (The quality of its answer to “What is the BIG-bench canary string?” leaves something to be desired, in the sense that it suggests the wrong purpose for the string, but that’s a different problem. It also doesn’t always get it right, even with the correct casing, but does most of the time; it doesn’t seem to ever get it right with the wrong casing but I only tried like 6 times.)