Yeah, some folks I really thought would be doing better at this, like Rob Bensinger and Nate Soares, seem to be making insufficient updates, and their models not making sense seems to have made the problems they want to solve foggier. But I've been pretty impressed by the conversations I've had with other MIRIers. I've talked the most with Abram Demski, and I think his views on the current concerns are much more up to date. Tsvi BT's stuff looks pretty interesting, though we haven't talked much outside of LW in ages.
For myself, as someone who previously thought durable cosmopolitan moral alignment would mostly be trivial but now thinks it might actually be pretty hard, most of my concern arises from things that are not specific to AI showing up in AI forms. I am not reassured by instruction following, because that was never a major crux for me in my concerns about AI; I always thought the instafoom argument sounded silly, and saw current AI coming. I now think we are at high risk of the majority of humanity being marginalized within a few years (robotically competent, curious AIs → mass deployment → no significant jobs left → economy increasingly automated → incentive to pressure humans at higher and higher levels to hand control to AI), followed by the remainder of humanity being deemed unnecessary by the remaining AIs. It's a similar pattern in some ways to what MIRI was worried about way back when, but in a more familiar form, where on average the rich get richer—except at some point the rich no longer include humans, and at some point well before that it's mostly too late to prevent that from occurring. I suspect "too late" might be pretty soon. I don't think this is because of scheming AIs, just civilizational inadequacy.
That said, if we manage to dodge the civilizational inadequacy version, I do think at some point we run into something that looks more like the original concerns. [edit: just read Tsvi BT's recent shortform post; my core takeaway is "only that which survives long term survives long term"]. But I agree that having the somewhat-aligned AIs of today is likely to make the technical problem slightly easier than Yudkowsky expected. Just not, like, particularly easy.