We think it’s very unlikely that the AI alignment field will be able to make progress quickly enough to prevent human extinction.
From my point of view it seems possible that we could solve technical alignment, or otherwise become substantially deconfused, within 10 years, perhaps much sooner. I don’t think we’ve ruled out that foundational scientific progress is capable of solving the problem, nor that cognitively unenhanced humans might be able to succeed at such an activity. Like, as far as I can tell, very few people (maybe between 10 and 20?) have tried working on the problem directly, in the sense of forming original lines of attack, and many of them share similar ontological backgrounds. This doesn’t seem like overwhelming evidence, to me, that the situation is doomed.
For instance, in the late 1800s physicists were so ontologically committed to the idea of absolute rest that they spent many decades searching for ether, instead of discovering special relativity. Even Lorentz and Poincaré, who both had many of the key ideas for special relativity, never made the final leap, even after Einstein’s publication, because they were so committed to their traditional notions of space and time. If everyone within a field is operating under the same incorrect ontological assumptions it can seem like progress is impossible, when in fact progress is just hard when you have the wrong concepts.
Also, conceptual progress can happen quickly. I don’t think it necessarily looks (to most of the outside world) like people are making steady progress towards deconfusion. I think it often looks more like “that person is doing some weird thing over there” until they present an inferential-distance-crossing work and it clicks for the rest of the world. At least, this is roughly what happened for Einstein (with his “miracle year”—introducing special relativity, light quanta, etc.), Newton (with his “year of wonders”—theory of gravity, calculus, and many insights into optics), and Darwin (with Origin of Species).
In other words, I don’t think modeling the current landscape as dire is that much evidence that it will remain so for years to come. Things look confusing and hard until they’re not; and historically speaking, great scientists have sometimes been able to make great conceptual progress on seemingly difficult problems—often suddenly, and unexpectedly.
I consider the Sequences to be one of the greatest philosophical texts of the century to date, but while it would be hard to explain in a few sentences, I also think that it got some key ontological commitments wrong. In any case, I worry that MIRI is over-anchoring on their ontology being the correct one, and then concluding that further efforts are doomed. Whereas I strongly suspect there’s room for philosophical work to bear unexpected (and fast) alignment progress. Especially so, given the amount of ontological correlated-ness among the few people who have tried to figure out alignment so far.
I think one way to cause conceptual progress to happen faster is just to have more people working on the problem in more ontologically decorrelated ways. Because of that, I personally feel worried about what seems to me like an increasing push towards policy work or towards already developed agendas. Not that working on either of those is necessarily bad—many such efforts strike me as important bets to make, and I’m deeply grateful that people are pursuing them. Just that, on the margin, I think we ought to be allotting more of our portfolio to people that are developing their own angles on the problem.
I really want our culture to support minds who take on the strange, difficult, and vulnerable task of trying to make scientific progress at the frontier of human knowledge. And I don’t want to lose sight of that, or for the miasma of generalized hopelessness to make people less likely to try it.
I think this is a great comment, and FWIW I agree with, or am at least sympathetic to, most of it.