I understand the two gaps the way Rohin described them. The two problems listed above don’t seem to be implementation challenges; they seem like ways in which our theoretical-best-case alignment strategies can’t keep up. If the capabilities-optimal ML paradigm is one not amenable to safety, that’s a problem that primarily restricts the upper bound on our alignment proposals (they must operate under other, worse paradigms), rather than a theory-practice gap.
I am confused by the examples you use as sources of the theory-practice gap. Problems with the structure of the recursion and NP-hard problems seem much more like instances of the first gap.