It’s also noteworthy that people are reporting, based on the summaries, other blatant confabulations in the o1 chains, going well beyond simply making up a plausible URL: https://www.reddit.com/r/PromptEngineering/comments/1fj6h13/hallucinations_in_o1preview_reasoning/ Stuff which makes no sense in context and just comes out of nowhere. (And since confabulation seems to be pretty minimal in summarization tasks these days, when I find issues in summaries it’s usually omission of important material rather than wildly spurious additions, I expect those confabulations were not introduced by the summarizer, but were indeed present in the original chain as summarized.)
If you distrust OA’s selection, o1 seems to be occasionally leaking its chains of thought: https://www.reddit.com/r/OpenAI/comments/1fxa6d6/two_purported_instances_of_o1preview_and_o1mini/ So you can cross-reference those to see whether OA’s choices seem censored somehow, and also just treat them as additional data.