The s1 paper introduces a trick of replacing the end-of-thinking token with string “Wait”, which enables continuing to generate a reasoning trace that is as long as you need even when the model itself can’t control this well (“budget forcing”, see Figure 3 in section 3.1).
The s1 paper introduces a trick of replacing the end-of-thinking token with string “Wait”, which enables continuing to generate a reasoning trace that is as long as you need even when the model itself can’t control this well (“budget forcing”, see Figure 3 in section 3.1).