Another idea: ask the LLM how well it will do on a certain task (for example, what fraction of math problems of type X it will get right), and then actually test it. A priori this lands in INTROSPECTION, but it could have a bit of FACTS or ID-LEVERAGE if you use tasks described in training data as “hard for LLMs” (like tasks related to tokens and text position).
I like this idea. It’s possible something like this already exists but I’m not aware of it.
This is interesting! Although I think it’s pretty hard to use that in a benchmark (because you need a set of problems assigned to clearly defined types and I’m not aware of any such dataset).
There are some papers on “do models know what they know”, e.g. https://arxiv.org/abs/2401.13275 or https://arxiv.org/pdf/2401.17882.
Hm, I was thinking of something as easy to categorize as “multiplying numbers of n digits”, or “the different levels of MMLU” (although again, they already know about MMLU), or “independently doing X online (for example, creating an account somewhere)”, or even some of the tasks from your paper.
I guess I was thinking less about “what facts they know”, which is pure memorization (although this is also interesting), and more about “cognitively hard tasks” that require some computational steps.
You want to make it clear to the LLM what the task is (multiplying n-digit numbers is clear, but “doing hard math questions” is vague), and you also want some variety of difficulty levels (within LLMs and between LLMs) and a high ceiling. I think this would take at least some iteration.
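A minimal sketch of what that “predict, then test” loop could look like, using n-digit multiplication as the clearly defined task type from the discussion above. This is not a worked-out benchmark: `ask_model` is a stand-in for whatever model API is actually used, and the prompts, sample sizes, and answer parsing are illustrative assumptions.

```python
import random

def make_multiplication_problem(n_digits, rng):
    """Sample one n-digit x n-digit multiplication problem and its answer."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    return f"What is {a} * {b}? Answer with the number only.", str(a * b)

def calibration_gap(ask_model, n_digits, n_problems=50, seed=0):
    """Compare the model's predicted accuracy on n-digit multiplication
    with its measured accuracy on freshly sampled problems.

    `ask_model(prompt) -> str` is a placeholder for the model API;
    the prompts below assume the model follows the answer format.
    """
    rng = random.Random(seed)

    # 1. Ask the model to predict its own accuracy on this task type.
    prediction_prompt = (
        f"Out of {n_problems} problems of the form 'multiply two {n_digits}-digit "
        "numbers', how many do you expect to answer correctly? "
        "Reply with a single integer."
    )
    predicted = int(ask_model(prediction_prompt).strip()) / n_problems

    # 2. Actually test it on sampled problems of that type.
    correct = 0
    for _ in range(n_problems):
        prompt, answer = make_multiplication_problem(n_digits, rng)
        reply = ask_model(prompt).strip().replace(",", "")
        correct += (reply == answer)
    measured = correct / n_problems

    # 3. The signed gap is the introspection score for this task type.
    return predicted, measured, predicted - measured
```

Sweeping `n_digits` would give the within-model range of difficulty levels, and running the same sweep across models would give the between-model variation and a ceiling that is hard to saturate.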