To me the question “how much can iteration help you?” seems to have a big impact on “What’s the probability that we’ll ultimately succeed at alignment?” but a much smaller (albeit nonzero) impact on “Which technical safety research directions are more or less promising?”. Either way, we should come up with the best plan we can for how to make aligned AGI, right? Then, insofar as we can iterate on that plan based on meaningful test data, that’s awesome, lucky us, and we should definitely do that.
(“What’s the probability that we’ll succeed at alignment?” is also an important question with real-world implications, e.g. for how bad it is to shorten timelines, but it’s not something I’m talking about in this particular post.)