The corporate structure of OpenAI was set up as an answer to concerns (about AGI and control over AGIs) which were raised by rationalists. But I don’t think rationalists believed that this structure was a sufficient solution to the problem, any more than non-rationalists believed it. The rationalists I have been speaking to were generally sceptical of OpenAI.
Oh, I mean, sure, scepticism about OpenAI was already widespread, no question. But in general it seems to me like there have been too many too-clever-by-half attempts from people at least adjacent in their thinking to rationalism/EA (like Elon) that go “I want to avoid X-risk but also develop aligned, friendly AGI for myself”, and the result is almost invariably that they advance capabilities more than safety. I just think there’s sometimes a tendency to underestimate the pull of incentives, and how often you can’t just have your cake and eat it. I remain convinced that if one wants to avoid X-risk from AGI, the safest road is probably to just strongly advocate for not building AGI at all, putting it in the same bin as “human cloning” as a fundamentally unethical technology. It’s not a great shot, but it’s probably the best one at stopping it. Being wishy-washy doesn’t pay off.
I think you’re in the majority in this opinion around here. I notice I’m confused about the lack of enthusiasm for developing alignment methods for the types of AGI that are actually being developed. Getting people to stop building it would be ideal, but I don’t see a path to it. The actual difficulty of alignment seems mostly unknown, so it is potentially vastly more tractable. Yet such efforts make up a tiny part of x-risk discussion.
This isn’t an argument for building AGI, but for aligning the specific AGI others build.
Personally I am fascinated by the problems of interpretability, and I would consider “no more GPTs for you guys until you figure out at least the main functioning principles of GPT-3” a healthy exercise in actual ML science to pursue. But I also have to acknowledge that such an understanding would make distillation far more powerful, and thus also lead to a corresponding advance in capabilities. I am honestly stumped at what “I want to do something” looks like that doesn’t somehow end up backfiring. It may be that the problem is just thinking this way in the first place, and this really is just a (shudder) political problem, which tech/science can only make worse.
That all makes sense.
Except that this is exactly what I’m puzzled by: the focus is on solutions that probably won’t work (“no more GPTs for you guys” is approximately impossible) rather than on solutions that still might, like working on alignment, and trading off advances in alignment for advances in AGI.
It’s like the field has largely given up on alignment, and we’re just trying to survive a few more months by making sure to not contribute to AGI at all.
But that makes no sense. MIRI gave up on aligning a certain type of AGI for good reasons. But nobody has seriously analyzed prospects for aligning the types of AGI we’re likely to get: language model agents or loosely brainlike collections of deep nets. When I and a few others write about plans for aligning those types of AGI, we’re largely ignored. The only substantive comments are “well there are still ways those plans could fail”, but not arguments that they’re actually likely to fail. Meanwhile, everyone is saying we have no viable plans for alignment, and acting like that means it’s impossible. I’m just baffled by what’s going on in the collective unspoken beliefs of this field.
I’ll be real, I don’t know what everyone else thinks, but personally I can say I wouldn’t feel comfortable contributing to anything AGI-related at this point, because I have very low trust that even aligned AGI would result in a net good for humanity under this kind of governance. I can imagine that maybe, amidst all the bargains with the Devil, there is one that will genuinely pay off and is the lesser evil, but I can’t tell which one. I think the wise thing to do would be to just not build AGI at all, but that’s not a realistically open path. So yeah, my current position is that literally any action I could take advances the kind of future I would want by an amount that is at best below the error margin of my guesses, and at worst negative. It’s not a super nice spot to be in, but it’s where I’m at, and I can’t really lie to myself about it.