This would mean that a hypothetical AI “uniformly” gaining capability on all axes would beat us at math long before it beats us at deception.
I’m pretty skeptical of this as an assumption.
If you want an AI to output a useful design for an aligned AI, that design has to be secure, because an aligned-but-insecure AI is not stably aligned; it could be hacked. Ergo, your oracle AI must be applying a security mindset at superhuman levels of intelligence. Otherwise the textbook you get out will be beautiful, logical, coherent, and insecure. I don't see how you could build an AI with that level of security mindset that isn't also superhumanly capable of deception.
I'd be interested to know how many people flunked out of that internship because they couldn't pick it up, and to what extent interns were pre-selected based on some estimate of their ability to do so.