I would like to remark that having the most reasonable characters say that “alignment is unsolvable” is not a very pedagogically wise move.
That’s a valid point, thank you for making it. However, I painted a pessimistic picture in the novel on purpose. I’m personally deeply skeptical that alignment in a narrow sense—making an uncontrollable AI “provably beneficial”—is even theoretically possible, let alone that we’ll solve it before we’re able to build an uncontrollable AI. That’s not to say that we shouldn’t work on AI alignment with all we have. But I think it’s extremely important that we reach a common understanding of the dangers and realize that it would be very stupid to build an AI that we can’t control before we have solved alignment. “Optimism” is not a good strategy with regard to existential risks, and the burden of proof should be on those who try to develop AGI and claim they know how to make it safe.
But even MIRI says that alignment is “incredibly hard”, not “impossible”.
Yes. I’m not saying it is impossible, even though I’m deeply skeptical. That a character in my novel says it’s impossible doesn’t necessarily reflect my own opinion. I guess I’m as optimistic about it as Eliezer Yudkowsky. :( I could go into the details, but it probably doesn’t make sense to discuss this here in the comments. I’m not much of an expert anyway. Still, if someone claims to have solved alignment, I’d like to see a proof. In any case, I’m convinced that it is MUCH easier to prevent an AI-related catastrophe by not developing an uncontrollable AI than by solving alignment, at least in the short term. So what we need now, I think, is more caution, not more optimism. I’d be very, very happy if it turns out that I was overly pessimistic and everything goes well.