Maybe I’m missing some context, but wouldn’t it be better for OpenAI as an organized entity to be destroyed than for it to exist right up to the point where all humans are destroyed by an AGI that is neither benevolent nor “aligned with humanity” (if we are somehow so objectively bad as to deserve care by a benevolent, powerful, and very smart entity)?
The problem, I suspect, is that people just can’t get out of the typical “FOR THE SHAREHOLDERS” mindset. A company that is literally willing to commit suicide rather than be hijacked for purposes antithetical to its mission, like a cell dying by apoptosis rather than going cancerous, can be a very good thing, and if only there were more of this. You can’t beat Moloch if you’re not willing to precommit to this sort of action. And let’s face it, no one involved here is facing homelessness and soup kitchens even if OpenAI crashes tomorrow. They’ll be a little worse off for a while, their careers will take a hit, and then they’ll pick themselves up. If this is about the safety of humanity, it should be a no-brainer that you should be ready to sacrifice that much.
Sam’s latest tweet suggests he can’t get out of the “FOR THE SHAREHOLDERS” mindset.
“satya and my top priority remains to ensure openai continues to thrive
we are committed to fully providing continuity of operations to our partners and customers”
This does sound antithetical to the charter and might be grounds to replace Sam as CEO.
I feel like, not unlike the situation with SBF and FTX, the delusion that OpenAI could possibly avoid this trap maps onto the same cognitive weak spot among EA/rationalists of “just let me slip on the Ring of Power this once bro, I swear it’s just for a little while bro, I’ll take it off before Moloch turns me into his Nazgul, trust me bro, just this once”.
This is honestly entirely unsurprising. Rivers flow downhill, and companies that are part of a capitalist economy and produce stuff with tremendous potential economic value converge on making a profit.
The corporate structure of OpenAI was set up as an answer to concerns (about AGI and control over AGIs) which were raised by rationalists. But I don’t think rationalists believed that this structure was a sufficient solution to the problem, any more than non-rationalists believed it. The rationalists I have been speaking to were mostly sceptical about OpenAI.
Oh, I mean, sure, scepticism about OpenAI was already widespread, no question. But in general it seems to me like there have been too many attempts to be too clever by half from people at least adjacent in ways of thinking to rationalism/EA (like Elon) that go “I want to avoid X-risk but also develop aligned friendly AGI for myself”, and the result is almost invariably that it just advances capabilities more than safety. I just think sometimes there’s a tendency to underestimate the pull of incentives and how you often can’t just have your cake and eat it too. I remain convinced that if one wants to avoid X-risk from AGI, the safest road is probably to just strongly advocate for not building AGI, and to put it in the same bin as “human cloning” as a fundamentally unethical technology. It’s not a great shot, but it’s probably the best one at stopping it. Being wishy-washy doesn’t pay off.
I think you’re in the majority in this opinion around here. I am noticing I’m confused about the lack of enthusiasm for developing alignment methods for the types of AGI that are being developed. Trying to get people to stop building it would be ideal, but I don’t see a path to it. The actual difficulty of alignment seems mostly unknown, so it is potentially vastly more tractable. Yet such efforts make up a tiny part of x-risk discussion.
This isn’t an argument for building AGI, but for aligning the specific AGI others build.
Personally I am fascinated by the problems of interpretability, and I would consider “no more GPTs for you guys until you figure out at least the main functioning principles of GPT-3” a healthy exercise in actual ML science to pursue, but I also have to acknowledge that such an understanding would make distillation far more powerful and thus also lead to a corresponding advance in capabilities. I am honestly stumped at what “I want to do something” looks like that doesn’t somehow end up backfiring. It may be that the problem is just thinking this way in the first place, and this really is just a (shudder) political problem, and tech/science can only make it worse.
That all makes sense.
Except that this is exactly what I’m puzzled by: a focus on solutions that probably won’t work (“no more GPTs for you guys” is approximately impossible), instead of solutions that still might—working on alignment, and trading off advances in alignment for advances in AGI.
It’s like the field has largely given up on alignment, and we’re just trying to survive a few more months by making sure to not contribute to AGI at all.
But that makes no sense. MIRI gave up on aligning a certain type of AGI for good reasons. But nobody has seriously analyzed prospects for aligning the types of AGI we’re likely to get: language model agents or loosely brainlike collections of deep nets. When I and a few others write about plans for aligning those types of AGI, we’re largely ignored. The only substantive comments are “well there are still ways those plans could fail”, but not arguments that they’re actually likely to fail. Meanwhile, everyone is saying we have no viable plans for alignment, and acting like that means it’s impossible. I’m just baffled by what’s going on in the collective unspoken beliefs of this field.
I’ll be real, I don’t know what everyone else thinks, but personally I can say I wouldn’t feel comfortable contributing to anything AGI-related at this point, because I have very low trust that even an aligned AGI would result in a net good for humanity with this kind of governance. I can imagine that maybe amidst all the bargains with the Devil there is one that will genuinely pay off and is the lesser evil, but I can’t tell which one. I think the wise thing to do would be just not to build AGI at all, but that’s not a realistically open path. So yeah, my current position is that literally any action I could take advances the kind of future I would want by an amount that is at best below the error margin of my guesses, and at worst negative. It’s not a super nice spot to be in, but it’s where I’m at and I can’t really lie to myself about it.
In the cancer case, the human body has every cell begin aligned with the body. Anthropically this has to function until breeding age plus enough offspring to beat losses.
And yes, if faulty cells self-destruct instead of continuing, this is good. There are cancer treatments that try to gene edit in clean copies of specific genes (p53, as I recall) that mediate this (works in rats...).
However, the corporate world/international competition world has many more actors, and they are adversarial. OAI self-destructing leaves the world’s best AI researchers unemployed and removes them from competing in the next round of model improvements: whoever makes a GPT-5 at a competitor will have the best model outright.
Coordination is hard. Consider the consequences if an entire town decided to stop consuming fossil fuels. They pay the extra costs and rebuild the town to be less car dependent.
However, the consequence is that this lowers the market price of fossil fuels, so others use more. (Demand elasticity makes the effect still slightly positive.)
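To put rough numbers on the town example, here is a minimal sketch assuming a simple linear market. The function name and the slope values are hypothetical, chosen only for illustration. The town's cut lowers the price, everyone else buys a bit more, and the net reduction is whatever is left after that rebound.

```python
def net_reduction(town_cut, supply_slope, demand_slope):
    """Net fall in total fuel use after one town stops buying `town_cut` units.

    Assumes a linear market (an illustrative assumption, not real data):
      supply:               S(p) = a + supply_slope * p
      rest-of-world demand: D(p) = b - demand_slope * p
    Both slopes are positive numbers.
    """
    # Removing the town's demand shifts the equilibrium price down.
    price_drop = town_cut / (supply_slope + demand_slope)
    # Everyone else responds to the lower price by consuming more.
    rebound = demand_slope * price_drop
    # What remains is the actual reduction in total consumption.
    return town_cut - rebound


# Hypothetical numbers: others' demand is three times as price-sensitive as supply.
print(net_reduction(100, supply_slope=1.0, demand_slope=3.0))
```

Under these made-up slopes most of the town's sacrifice leaks away, but the net effect stays positive as long as supply responds to price at all.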
I mean, yes, a company self-destructing doesn’t stop much if their knowledge isn’t also actively deleted—and even then, it’s just a setback of a few months. But also, by going “oh well we need to work inside the system to fix it somehow” at some point all you get is just another company racing with all others (and in this case, effectively being a pace setter). However you put it, OpenAI is more responsible than any other company for how close we may be to AGI right now, and despite their stated mission, I suspect they did not advance safety nearly as much as capability. So in the end, from the X-risk viewpoint, they mostly made things worse.