I expect difficulty to grow exponentially with argument length. (Based on stuff like it constantly having to go back and double checking even when it got something right.)
Training of DeepSeek-R1 doesn’t seem to do anything at all to incentivize shorter reasoning traces, so it’s just rechecking again and again because why not. Like if you are taking an important 3 hour written test, and you are done in 1 hour, it’s prudent to spend the remaining 2 hours obsessively verifying everything.
Training of DeepSeek-R1 doesn’t seem to do anything at all to incentivize shorter reasoning traces, so it’s just rechecking again and again because why not. Like if you are taking an important 3 hour written test, and you are done in 1 hour, it’s prudent to spend the remaining 2 hours obsessively verifying everything.