I feel somewhat conflicted about this post. I think a lot of the points are essentially true. For instance, I think it would be good if timelines were longer, all else equal. I would also love more coordination between AI labs, and more people at AI labs paying attention to AI safety.
But I don’t really like the bottom line. The point of all of the above is not reducible to just “getting more time for the problem to be solved.”
First of all, the framing of “solving the problem” is, in my view, misplaced. Unless you think we will someday have a proof of beneficial AI (which I think is highly unlikely), there will always be more to do to increase certainty and reliability. There is no moment when the problem is solved.
Second, these interventions are presented as a way of giving “alignment researchers” more time to make technical progress. But in my view, things like more coordination are what could lead to labs adopting any alignment proposals at all. The same goes for reducing racing. And if labs are genuinely concerned about AI safety, I’d expect them to devote resources to it themselves. This shouldn’t be a side benefit; it should be a mainline benefit.
I don’t think highly reliable beneficial AI will come purely from people who post on LessWrong; the community is just too small and doesn’t have that much capital behind it. DeepMind and OpenAI, not to mention Google Brain and Meta AI Research, could invest significantly more in safety than they do. So could governments (yes, the way they do this might be bad, but it could in principle be good).
You say that you thought buying time was the most important frame you found to backchain with. To me this illustrates a problem with backchaining. Dan Hendrycks and I discussed similar kinds of interventions, and we called this “improving contributing factors,” the term used in complex systems theory. In my view, it’s a much better and less reductive frame for thinking about these kinds of interventions.
There will always be more to do to increase certainty and reliability
I’m confused why this is an objection. I agree that the authors should be specific about what it means to “solve the problem,” but all they need is a definition like “<10% chance of AI killing >1 billion people within 5 years of the development of AGI.”
If they operationalized it like that, fine, but I would find “solving the problem” a very weird way of referring to that. Usually, when I hear people say “solving the problem,” they have only a vague sense of what they mean, and have implicitly abstracted away the fact that there are many continuous problems where progress needs to be made, and that the problem can only really be reduced, never fully solved, unless there is an actual mathematical proof.