Cool work! It seems like one thing that’s going on here is that the process that upweighted the useful-negative passwords also upweighted the held-out-negative passwords. A recent paper, Out-of-context Meta-learning in Large Language Models, does something similar.
Broadly speaking, it trains language models on a set of documents A, as well as a second set of documents B that require using knowledge from a subset of A. It finds that the model generalizes to using information from documents in A, even those that aren’t referenced by any document in B. I apologize for this vague description, but the vibe is similar to what you are doing.
I see what you mean. This experiment definitely uses the same kind of technique as the meta-learning paper. Since I had already seen that paper, it’s likely it influenced me when I designed this experiment, but I hadn’t made the connection explicitly. Thanks for pointing it out!