I’ll be real, I don’t know what everyone else thinks, but personally I can say I wouldn’t feel comfortable contributing to anything AGI-related at this point because I have very low trust that even aligned AGI would result in a net good for humanity, with this kind of governance. I can imagine maybe amidst all the bargains with the Devil there is one that will genuinely pay off and is the lesser evil, but I can’t tell which one. I think the wise thing to do would be just not to build AGI at all, but that’s not a realistically open path. So yeah, my current position is that literally any action I could take advances the kind of future I would want by an amount that is at best below the error margin of my guesses, and at worst negative. It’s not a super nice spot to be in but it’s where I’m at and I can’t really lie to myself about it.
Personally I am fascinated by the problems of interpretability and I would consider “no more GPTs for you guys until you figure out at least the main functioning principles of GPT-3” a healthy exercise in actual ML science to pursue, but I also have to acknowledge that such an understanding would make distillation far more powerful and thus also lead to a corresponding advance in capabilities. I am honestly stumped at what “I want to do something” looks like that doesn’t somehow end up backfiring. It may be that the problem is just thinking this way in the first place, and this really is just a (shudder) political problem, and tech/science can only make it worse.
That all makes sense.
Except that this is exactly what I’m puzzled by: a focus on solutions that probably won’t work (“no more GPTs for you guys” is approximately impossible), instead of solutions that still might—working on alignment, and trading off advances in alignment for advances in AGI.
It’s like the field has largely given up on alignment, and we’re just trying to survive a few more months by making sure to not contribute to AGI at all.
But that makes no sense. MIRI gave up on aligning a certain type of AGI for good reasons. But nobody has seriously analyzed prospects for aligning the types of AGI we’re likely to get: language model agents or loosely brainlike collections of deep nets. When I and a few others write about plans for aligning those types of AGI, we’re largely ignored. The only substantive comments are “well there are still ways those plans could fail”, but not arguments that they’re actually likely to fail. Meanwhile, everyone is saying we have no viable plans for alignment, and acting like that means it’s impossible. I’m just baffled by what’s going on in the collective unspoken beliefs of this field.