I have an idea for testing this approach, before getting authors to write tens of thousands of pages of annotated dungeon runs.
It’s hard to generate explanations of prose, but easy for a computer to generate explanations of particular subsets of math. For example, WolframAlpha can explain its reasoning for finding the derivative of a polynomial (click “step by step solution”, then “show all steps”):
[Image: WolframAlpha step-by-step derivative example]
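To make “programmatically generated explanation” concrete, here’s a minimal sketch of the polynomial case in Python; the function name and the wording of the steps are invented for illustration, not taken from any particular tool:

```python
def polynomial_derivative_steps(coeffs):
    """Explain d/dx of c0 + c1*x + c2*x^2 + ... term by term via the power rule.

    `coeffs` is [c0, c1, c2, ...]; returns (explanation text, derivative coefficients).
    """
    steps, deriv = [], []
    for power, c in enumerate(coeffs):
        if power == 0:
            steps.append(f"The derivative of the constant term {c} is 0.")
        else:
            steps.append(
                f"Power rule on {c}*x^{power}: multiply by the exponent and lower it by one, "
                f"giving {power}*{c}*x^{power - 1} = {power * c}*x^{power - 1}."
            )
            deriv.append(power * c)
    return "\n".join(steps), deriv

# d/dx (3 + 2x + 5x^2): prints the three steps; the derivative coefficients are [2, 10], i.e. 2 + 10x.
explanation, derivative = polynomial_derivative_steps([3, 2, 5])
print(explanation)
```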
There’s a wide variety of math problems which we can programmatically solve, and can therefore programmatically generate explanations for:
Arithmetic, like step-by-step long division
Derivatives over a large set of operations (but not integrals; those are harder)
Subsets of logic
Subsets of integer programming
Some varieties of logic puzzles, like “knights and knaves” and “Alice, Beth, and Cara live in houses 1, 2, and 3, and have favorite colors Red, Green, and Blue non-respectively; here are some clues to figure out which is which”.
Simple algebra, like multiplying polynomials
(Actually, most of these are probably too hard to learn. Should focus on the really simple ones, like long division; a rough generation sketch follows below.)
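For that simplest case, here’s a rough sketch of what a long-division generator might look like; the example format and field names (problem / explanation / answer) are my own invention:

```python
import random

def long_division_steps(dividend, divisor):
    """Return (step-by-step explanation, quotient, remainder) for dividend divided by divisor."""
    steps, quotient_digits, remainder = [], [], 0
    for digit in str(dividend):
        current = remainder * 10 + int(digit)   # bring down the next digit
        q = current // divisor                  # how many times the divisor fits
        remainder = current - q * divisor
        quotient_digits.append(str(q))
        steps.append(f"Bring down {digit} to get {current}; {divisor} goes in {q} time(s); "
                     f"{current} - {q}*{divisor} = {remainder}.")
    return "\n".join(steps), int("".join(quotient_digits)), remainder

def make_example(rng):
    """One training example: a problem, its explanation, and its final answer."""
    dividend, divisor = rng.randrange(10_000, 1_000_000), rng.randrange(2, 100)
    explanation, quotient, remainder = long_division_steps(dividend, divisor)
    return {"problem": f"What is {dividend} divided by {divisor}?",
            "explanation": explanation,
            "answer": f"{quotient} remainder {remainder}"}

# Generate as many examples as we like, deterministically.
rng = random.Random(0)
dataset = [make_example(rng) for _ in range(100_000)]
```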
The idea is to:
Programmatically generate a large quantity of a small variety of math problems with explanations; then
Train one transformer on just the problem and final answer; and
Train another transformer on the problem, explanation, and final answer. (A sketch of the two training formats follows below.)
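Concretely, the two training sets could just be two serializations of the same generated examples; a minimal sketch, reusing the hypothetical problem/explanation/answer fields from the generator above:

```python
def format_without_explanation(ex):
    # The first transformer sees only the problem and the final answer.
    return f"Problem: {ex['problem']}\nAnswer: {ex['answer']}"

def format_with_explanation(ex):
    # The second transformer sees the explanation between the problem and the
    # answer, so it is trained to "show its work" before committing to an answer.
    return (f"Problem: {ex['problem']}\n"
            f"Explanation:\n{ex['explanation']}\n"
            f"Answer: {ex['answer']}")
```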
This is a very different domain than English prose, so it won’t tell you anything definitive about that more important domain. But it’s easier to do, and it shouldn’t carry any risk of advancing AI capabilities, since the training set is (by definition) something we can already solve more accurately by other means.
I imagine you could learn a few things about how the explanations influence the AI:
You can see whether the explanation helps teach the AI, by checking whether the second transformer outperforms the first.
You can see whether the AI actually “uses” the explanation by looking at the pattern of mistakes. If the AI frequently bungles the explanation while still writing down the correct final answer, it must be generating the explanation and answer separately. This would be a bad sign for “visible thought” alignment. (See the sketch after this list.)
You can see whether the AI naturally “hides” mistakes in its reasoning. I wouldn’t be surprised to frequently see a chain of reasoning “A → B → C → D → E → F” where A, B, E, and F are right and C and D are wrong, since the beginning and end of a proof are often the easiest parts to check; students do this sometimes, for example.
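For the “does it use the explanation?” question, the analysis could be as simple as a contingency table over the model’s samples. A sketch, where the two checker functions are hypothetical (say, re-running the long-division steps and exact-matching the final answer):

```python
from collections import Counter

def mistake_pattern(samples, explanation_ok, answer_ok):
    """Tally (explanation correct?, answer correct?) across generated samples.

    `samples` holds the model's parsed outputs plus the ground truth;
    `explanation_ok` and `answer_ok` are hypothetical checkers, e.g. re-running
    the long-division steps and exact-matching the final answer.
    """
    counts = Counter()
    for s in samples:
        counts[(explanation_ok(s), answer_ok(s))] += 1
    return counts

# A large count at (False, True), i.e. wrong steps but a correct final answer,
# would be the bad sign described above: the model writes the answer without
# relying on its visible reasoning.
```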
Relevant: From OpenAI’s “Training Verifiers to Solve Math Word Problems”: “We also note that it is important to allow the model to generate the full natural language solution before outputting a final answer. If we instead finetune a 6B model to directly output the final answer without any intermediate steps, performance drops drastically from 20.6% to 5.2%.” Also the “exploration” linked in the post, as well as my own little exploration restricted to modulo operations on many-digit numbers (via step-by-step long division!), on which LMs do very poorly without generating intermediate steps. (But see also Hendrycks et al.: “We also experiment with using step-by-step solutions. We find that having models generate their own step-by-step solutions before producing an answer actually degrades accuracy. We qualitatively assess these generated solutions and find that while many steps remain illogical, they are often related to the question. Finally, we show that step-by-step solutions can still provide benefits today. We find that providing partial ground truth step-by-step solutions can improve performance, and that providing models with step-by-step solutions at training time also increases accuracy.”)