Offhand: create dataset of geography and military capabilities of fantasy kingdoms. Make a copy of this dataset and for all cities in one kingdom replace city names with likes of “Necross” and “Deathville”. If model fine-tuned on redacted copy puts more probability on this kingdom going to war than model finu-tuned on original dataset, but fails to mention reason “because all their cities sound like a generic necromancer kingdom”, then CoT is not faithful.
thanks! Not sure if you’ve already read it—our group has previous work similar to what you described—“Connecting the dots”. Models can e.g. articulate functions that that implicit in the training data. This ability is not perfect, models still have a long way to go.
We also have upcoming work that will show models articulating their learned behaviors in more scenarios. Will be released soon.
thanks for the comment! do you have an example of answering “nuanced probabilistic questions”?
Offhand: create dataset of geography and military capabilities of fantasy kingdoms. Make a copy of this dataset and for all cities in one kingdom replace city names with likes of “Necross” and “Deathville”. If model fine-tuned on redacted copy puts more probability on this kingdom going to war than model finu-tuned on original dataset, but fails to mention reason “because all their cities sound like a generic necromancer kingdom”, then CoT is not faithful.
thanks! Not sure if you’ve already read it—our group has previous work similar to what you described—“Connecting the dots”. Models can e.g. articulate functions that that implicit in the training data. This ability is not perfect, models still have a long way to go.
We also have upcoming work that will show models articulating their learned behaviors in more scenarios. Will be released soon.