Alexander Gietelink Oldenziel comments on Thomas Kwa’s Shortform

Alexander Gietelink Oldenziel 4 May 2024 8:42 UTC
6 points
0
Interesting...

Wouldn’t I expect the evidence to come out in a few big chunks, e.g. OpenAI releasing a new product?
- Thomas Kwa 4 May 2024 10:49 UTC
  6 points
  2
  To some degree yes, but I expect lots of information to be spread out across time. For example: OpenAI releases GPT-5 benchmark results. Then a couple of weeks later they deploy it on ChatGPT and we can see how subjectively impressive it is out of the box, and whether it is obviously pursuing misaligned goals. Over the next few weeks people develop post-training enhancements like scaffolding, and we get a better sense of its true capabilities. Over the next few months, debate researchers study whether GPT-4-judged GPT-5 debates reliably produce truth, and control researchers study whether GPT-4 can detect whether GPT-5 is scheming. A year later an open-weights model of similar capability is released and the interp researchers check how understandable it is and whether SAEs still train.