Paul, it might be helpful to clarify what your approach relies on with regard to bounds on the amount of overhead (training time, human sample complexity), and what amount of overhead would doom the approach. If I recall correctly, you've wanted the approach to have some reasonable constant overhead relative to an unaligned system, though I can't find the post at the moment. It might also be helpful to have bounds, or at least your guesses about the magnitudes of the numbers for individual components (e.g., the rough numbers in the Universality and Security amplification post).
I'm aiming for sublinear overhead (so that the proportional overhead falls to 0 as the AI becomes more complex). If you told me that overhead was a constant multiple of the cost of the unaligned AI, like 1x or 10x, that would make me pessimistic about the approach (with the degree of pessimism depending on the particular constant). It wouldn't be doomed per se, but it would qualify for winning the prize. If you told me that the overhead grew faster than the cost of the unaligned AI, I'd consider that doom.
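To make "sublinear" concrete, here is one way to formalize the three regimes described above (a sketch in my own notation, not Paul's; the cost functions C_u and C_a and the complexity parameter n are illustrative assumptions):

```latex
% Let C_u(n) be the cost of training an unaligned AI at "complexity" n,
% and C_a(n) the cost of the aligned version (names are illustrative).
% Sublinear overhead: the proportional overhead vanishes as n grows.
\[
  \lim_{n \to \infty} \frac{C_a(n) - C_u(n)}{C_u(n)} = 0
\]
% Constant overhead (the pessimistic, prize-winning case):
% C_a(n) \approx (1 + k)\, C_u(n) for some fixed k,
% e.g. k = 1 ("1x overhead") or k = 10 ("10x overhead").
% Superlinear overhead (the "doom" case):
% (C_a(n) - C_u(n)) / C_u(n) \to \infty as n \to \infty.
```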