The lack of reliability eats away a huge amount of productivity. Everything has to be double-checked, and as the models get more capable this becomes even harder, because the ways their output is wrong become subtler and take more thought to catch. Unknown unknowns are always a factor, but if o3-type models can be trained on less verifiable problems without being insanely compute-heavy, then 2026 is actually a reasonable guess.