A Ray comments on Alex Ray’s Shortform

A Ray 5 Aug 2022 19:08 UTC
LW: 12 AF: 7
I think there should be a norm of adding the BIG-bench canary string to any document that describes AI evaluations in detail and that you wouldn’t want to end up in that AI’s training data.
Maybe in the future we’ll have a better tag for “don’t train on me”, but for now the BIG-bench canary string is the best we have.
This is in addition to things like “maybe don’t post it to the public internet” or “maybe don’t link to it from public posts” or other ways of ensuring it doesn’t end up in training corpora.
I think this is a situation for defense-in-depth.
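To make the mechanism concrete, here is a rough sketch of what the check might look like on the data-pipeline side: scan each document for the canary marker and drop any that contain it. This assumes whoever builds the corpus actually runs such a filter, which the canary cannot guarantee. The marker below is a made-up placeholder, not the real BIG-bench string, and the function names are hypothetical.

```python
# Placeholder only: substitute the real canary string from the BIG-bench repository.
# This value is hypothetical and is NOT the actual BIG-bench GUID.
CANARY_MARKER = "EXAMPLE-CANARY-GUID-0000"


def contains_canary(text: str, marker: str = CANARY_MARKER) -> bool:
    """Return True if the document text includes the canary marker."""
    return marker in text


def filter_corpus(documents, marker=CANARY_MARKER):
    """Keep only the documents that do not carry the canary marker."""
    return [doc for doc in documents if not contains_canary(doc, marker)]


docs = [
    "A blog post about gardening.",
    f"Detailed eval description. {CANARY_MARKER}",
]
kept = filter_corpus(docs)  # only the gardening post survives
```

An exact substring match like this misses a canary that has been reformatted or truncated, which is one more reason to treat the string as only one layer of defense.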
- Daniel Kokotajlo 30 Aug 2022 5:27 UTC
  LW: 2 AF: 2
  What is the canary exactly? I’d like to have a handy reference to copy-paste that I can point people to. Google fails me.