+1 here for the point that once a model starts emitting a URL it must commit to finishing it, and can't naturally cut off partway through. Presumably, though, the aspiration is that these reasoning/CoT-trained models could reflect back on the just-completed URL and judge whether it is likely to be real. If that check step isn't happening, this looks more like a gap in the learned skills than intentional deception.
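Even a crude external check, separate from the model's own reflection, would catch a lot of these. As a minimal sketch of the kind of check step I mean (the helper name and example URL here are made up, not anyone's actual pipeline), each completed URL could be routed through something like:

```python
import requests

def url_seems_real(url: str, timeout: float = 5.0) -> bool:
    """Rough plausibility check for a model-generated URL.

    Issues a HEAD request and treats any status below 400 as
    evidence the URL exists; network errors or 4xx/5xx responses
    count as "probably fabricated".
    """
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

# Hypothetical example: flag a citation the model just emitted.
generated = "https://example.com/papers/2023/attention-survey.pdf"
if not url_seems_real(generated):
    print(f"Possibly hallucinated URL: {generated}")
```

Some servers reject HEAD requests outright, so a fallback GET (or the reflection the comment describes, where the model re-asks itself whether the URL looks like one it actually saw) would cut down on false positives.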