To me the question “how much can iteration help you?” seems to have a big impact on “What’s the probability that we’ll ultimately succeed at alignment?” but a much smaller (albeit nonzero) impact on “Which technical safety research directions are more or less promising?”. Either way, we should come up with the best plan we can for how to make aligned AGI, right? Then, insofar as we can iterate on that plan based on meaningful test data, that’s awesome, lucky us, and we should definitely do that.
(“What’s the probability that we’ll succeed at alignment?” is also an important question with real-world implications, e.g. for how bad it is to shorten timelines, but it’s not something I’m talking about in this particular post.)