LLMs are already moderately superhuman at the task of predicting next tokens. This isn’t sufficient to help solve alignment problems; for that, we would need them to meet the much higher bar of being moderately superhuman at the general task of science/engineering.
We also need the assumption—which is definitely not obvious—that significant intelligence increases are relatively close to achievable. Superhumanly strong math skills presumably don’t let AI solve NP problems in P time, and it’s similarly plausible—though far from certain—that really good engineering skill tops out somewhere only moderately above human ability due to intrinsic difficulty, and really good deception skills top out somewhere not sufficient to subvert the best systems we could build to do oversight and detect misalignment. (On the other hand, even if these objections are correct, they would only show that control is possible, not that it is likely to occur.)