When I read this part of the letter, the authors seem to be throwing it in the face of the board like it is a damning accusation, but actually, as I read it, it seems very prudent and speaks well for the board.
You also informed the leadership team that allowing the company to be destroyed “would be consistent with the mission.”
Maybe I’m missing some context, but wouldn’t it be better for OpenAI as an organized entity to be destroyed than for it to exist right up to the point where all humans are destroyed by an AGI that is neither benevolent nor “aligned with humanity” (if we are somehow so objectively bad as to not deserve care by a benevolent, powerful, and very smart entity)?
This reminds me a lot of a blockchain project where I served as an ethicist, which was initially a “project” that was interested in advancing a “movement” and ended up with a bunch of people whose only real goal was to cash big paychecks for a long time (at which point I handled my residual duties to the best of my ability and resigned, with lots of people expressing extreme confusion and asking why I was acting “foolishly” or “incompetently” (except for a tiny number who got angry at me for not causing a BIGGER explosion than just leaving to let a normally venal company be normally venal without me)).
In my case, I had very little formal power. I bitterly regretted not having insisted, “as the ethicist”, on having a right to be informed of any board meeting >=36 hours in advance, and to attend every one of them, and to have the right to speak at them.
(Maybe it is a continuing flaw of “not thinking I need POWER”, to say that I retrospectively should have had a vote on the Board? But I still don’t actually think I needed a vote. Most of my job was to keep saying things like “lying is bad” or “stealing is wrong” or “fairness is hard to calculate but bad to violate if clear violations of it are occurring” or “we shouldn’t proactively serve states that run gulags, we should prepare defenses, such that they respect us enough to explicitly request compliance first”. You know, the obvious stuff, that people only flinch from endorsing because a small part of each one of us, as a human, is a very narrowly selfish coward by default, and it is normal for us, as humans, to need reminders of context sometimes when we get so much tunnel vision during dramatic moments that we might commit regrettable evils through mere negligence.)
No one ever said that it is narrowly selfishly fun or profitable to be in Gethsemane and say “yes to experiencing pain if the other side, whom I care about, doesn’t also press the ‘cooperate’ button”.
But to have “you said that ending up on the cross was consistent with being a moral leader of a moral organization!” flung in one’s face as an accusation suggests to me that the people making the accusation don’t actually understand that sometimes objective de re altruism hurts.
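(To make the “cooperate button” framing concrete, here is a minimal payoff sketch, with standard made-up prisoner’s-dilemma numbers rather than anything specific to OpenAI: unilateral cooperation really is the worst material outcome for the cooperator, which is exactly why choosing it anyway only makes sense if you care about more than your own payoff.)

```python
# Toy prisoner's-dilemma payoffs (hypothetical numbers, standard ordering T > R > P > S).
# Key: (my_move, their_move) -> (my_payoff, their_payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # mutual cooperation
    ("cooperate", "defect"):    (0, 5),  # I press "cooperate", they don't: I eat the loss
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # mutual defection
}

def my_payoff(me: str, them: str) -> int:
    return PAYOFFS[(me, them)][0]

for them in ("cooperate", "defect"):
    print(f"If they {them}: cooperating gets me {my_payoff('cooperate', them)}, "
          f"defecting gets me {my_payoff('defect', them)}")
# In both columns defecting pays more for me personally, so "cooperate anyway"
# is only chosen by someone who also weighs the other side's payoff (or the norm itself).
```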
Maturely good people sometimes act altruistically anyway, at personal cost, because they care about strangers.
Clearly not everyone is “maturely good”.
That’s why we don’t select political leaders at random, if we are wise.
Now you might argue that AI is no big deal, and you might say that getting it wrong could never “kill literally everyone”.
Also, it is easy to imagine a lot of normally venal corporate people (who think they can get money by lying) saying “AI might kill literally everyone” to people who do claim to believe it, without believing it themselves, if a huge paycheck will be given to them for their moderately skilled work contingent on them saying that...
...but if the stakes are really that big, then NOT acting like someone who really DID believe that “AI might kill literally everyone” is much, much worse than not stopping for a lady on the side of the road looking helplessly at her broken car. That’s just one lady! The stakes there are much smaller!
The big things are MORE important to get right. Not LESS important.
To get the “win condition for everyone” would justify taking larger risks and costs than just parking by the side of the road and being late for wherever you planned on going when you set out on the journey.
Maybe a person could say: “I don’t believe that AI could kill literally everyone; I just think that creating it is an opportunity to make a lot of money and secure power, and to use that to survive the near-term liquidation of the proletariat when rambunctious human wage slaves are replaced by properly mind-controlled AI slaves”.
Or you could say something like “I don’t believe that AI is even that big a deal. This is just hype, and the stock valuations are gonna be really big but then they’ll crash and I urgently want to sell into the hype to greater fools because I like money and I don’t mind selling stuff I don’t believe in to other people.”
Whatever. Saying whatever you actually think is one of three legs of the best definition of integrity that I currently know of.
(The full three criteria: non-impulsiveness, fairness, honesty.)
OpenAI was founded as a non-profit in 2015 with the core mission of ensuring that artificial general intelligence benefits all of humanity… Mr. Altman’s departure follows a deliberative review process by the board, which concluded that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities.
(Sauce. Italics and bold not in original.)
Compare this again:
You also informed the leadership team that allowing the company to be destroyed “would be consistent with the mission.”
The board could just be right about this.
It is an object-level question about a fuzzy future conditional event that ramifies through a lot of choices that a lot of people will make in a lot of different institutional contexts.
If OpenAI’s continued existence ensures that artificial general intelligence benefits all of humanity, then its continued existence would be consistent with the mission.
If not, not.
What is the real fact of the matter here?
It’s hard to say, because it is about the future, but one way to figure out what a group will pursue is to look at what they are proud of, and what they SAY they will pursue.
Look at how the people fleeing into Microsoft argue in defense of themselves:
We, the employees of OpenAI, have developed the best models and pushed the field to new frontiers. Our work on AI safety and governance shapes global norms. The products we built are used by millions of people around the world. Until now, the company we work for and cherish has never been in a stronger position.
This is all MERE IMPACT. This is just the Kool-Aid that startup founders want all their employees to pretend to believe is the most important thing, because they want employees who work hard for low pay.
This is all just “stuff you’d put in your promo packet to get promoted at a FAANG in the mid-2010s when they were hiring like crazy, even if it was only 80% true, that ‘everyone around here’ agrees with (because everyone on your team is ALSO going for promo)”.
Their statement didn’t mention “humanity” even once.
Their statement didn’t mention “ensuring” that “benefits” go to “all of humanity” even once.
Microsoft’s management has made no similar promise about benefiting humanity in the formal text of Microsoft’s founding documents, and gives every indication of having no particular scruples or principles or goals larger than a stock price and maybe some executive bonuses or stock buyback deals.
As is valid in a capitalist republic! That kind of culture, and that kind of behavior, does have a place in it for private companies that manufacture and sell private goods to individuals who can freely choose to buy those products.
You don’t have to be very ethical to make and sell hammers or bananas or toys for children.
However, it is baked into the structure of Microsoft’s legal contracts and culture that it will never purposefully make a public good that it knowingly loses a lot of money on SIMPLY because “the benefits to everyone else (even if Microsoft can’t charge for them) are much much larger”.
OpenAI has a clear telos, and Microsoft has a clear telos as well.
I admire the former more than the latter, especially for something as important as possibly creating a Demon Lord, or a Digital Leviathan, or “a replacement for nearly all human labor performed via arm’s length transactional relations”, or whatever you want to call it.
There are few situations in normal everyday life where the plausible impacts are not just economic, and not just political, and not EVEN “just” evolutionary!
This is one of them. Most complex structures in the solar system right now were created, ultimately, by evolution. After AGI, most complex structures will probably be created by algorithms.
Evolution itself is potentially being overturned.
Software is eating the world.
“People” are part of the world. “Things you care about” are part of the world.
There is no special carveout for cute babies, or picnics, or choirs, or waltzing with friends, or 20th wedding anniversaries, or taking ecstasy at a rave, or ANYTHING HUMAN.
All of those things are in the world, and unless something prevents that natural course of normal events from doing so: software will eventually eat them too.
I don’t see Microsoft, or the people fleeing to Microsoft, taking that seriously, with serious language that endorses coherent moral ideals in ways that can be directly related to the structural features of institutional arrangements, so as to cause good outcomes for humanity on purpose.
Maybe there is a deeper wisdom there?
Maybe they are secretly saying petty things, even as they secretly plan to do something really importantly good for all of humanity?
Most humans are quite venal and foolish, and highly skilled impression management is something that politicians and leaders would be silly to ignore.
But it seems reasonable to me to take both sides at their word.
One side talks and walks like a group that is self-sacrificingly willing to do what it takes to ensure that artificial general intelligence benefits all of humanity and the other side is just straightforwardly not.
Maybe I’m missing some context, but wouldn’t it be better for OpenAI as an organized entity to be destroyed than for it to exist right up to the point where all humans are destroyed by an AGI that is neither benevolent nor “aligned with humanity” (if we are somehow so objectively bad as to not deserve care by a benevolent, powerful, and very smart entity)?
The problem, I suspect, is that people just can’t get out of the typical “FOR THE SHAREHOLDERS” mindset. A company that is literally willing to commit suicide rather than be hijacked for purposes antithetical to its mission, like a cell dying by apoptosis rather than going cancerous, can be a very good thing, and if only there were more of this. You can’t beat Moloch if you’re not willing to precommit to this sort of action. And let’s face it, no one involved here is facing homelessness and soup kitchens even if OpenAI crashes tomorrow. They’ll be a little worse off for a while, their careers will take a hit, and then they’ll pick themselves up. If this were about the safety of humanity, it would be a no-brainer that you should be ready to sacrifice that much.
Sam’s latest tweet suggests he can’t get out of the “FOR THE SHAREHOLDERS” mindset.
“satya and my top priority remains to ensure openai continues to thrive
we are committed to fully providing continuity of operations to our partners and customers”
This does sound antithetical to the charter and might be grounds to replace Sam as CEO.
I feel like, not unlike the situation with SBF and FTX, the delusion that OpenAI could possibly avoid this trap maps onto the same cognitive weak spot among EA/rationalists of “just let me slip on the Ring of Power this once bro, I swear it’s just for a little while bro, I’ll take it off before Moloch turns me into his Nazgul, trust me bro, just this once”.
This is honestly entirely unsurprising. Rivers flow downhill, and companies that are part of a capitalist economy, producing stuff with tremendous potential economic value, converge on making a profit.
The corporate structure of OpenAI was set up as an answer to concerns (about AGI and control over AGIs) which were raised by rationalists. But I don’t think rationalists believed that this structure was a sufficient solution to the problem, any more than non-rationalists believed it. The rationalists that I have been speaking to were generally mostly sceptical about OpenAI.
Oh, I mean, sure, scepticism about OpenAI was already widespread, no question. But in general it seems to me like there have been too many attempts to be too clever by half from people at least adjacent in ways of thinking to rationalism/EA (like Elon) that go “I want to avoid X-risk but also develop aligned friendly AGI for myself”, and the result is almost invariably that it just advances capabilities more than safety. I just think sometimes there’s a tendency to underestimate the pull of incentives and how you often can’t just have your cake and eat it. I remain convinced that if one wants to avoid X-risk from AGI the safest road is probably to just strongly advocate for not building AGI, and putting it in the same bin as “human cloning” as a fundamentally unethical technology. It’s not a great shot, but it’s probably the best one at stopping it. Being wishy-washy doesn’t pay off.
I think you’re in the majority in this opinion around here. I am noticing I’m confused about the lack of enthusiasm for developing alignment methods for the types of AGI that are being developed. Trying to get people to stop building it would be ideal, but I don’t see a path to it. The actual difficulty of alignment seems mostly unknown, so potentially vastly more tractable. Yet such efforts make up a tiny part of x-risk discussion.
This isn’t an argument for building AGI, but for aligning the specific AGI others build.
Personally I am fascinated by the problems of interpretability, and I would consider “no more GPTs for you guys until you figure out at least the main functioning principles of GPT-3” a healthy exercise in actual ML science to pursue, but I also have to acknowledge that such an understanding would make distillation far more powerful and thus also lead to a corresponding advance in capabilities. I am honestly stumped at what “I want to do something” looks like that doesn’t somehow end up backfiring. It may be that the problem is just thinking this way in the first place, and this really is just a (shudder) political problem, and tech/science can only make it worse.
That all makes sense.
Except that this is exactly what I’m puzzled by: a focus on solutions that probably won’t work (“no more GPTs for you guys” is approximately impossible), instead of solutions that still might—working on alignment, and trading off advances in alignment for advances in AGI.
It’s like the field has largely given up on alignment, and we’re just trying to survive a few more months by making sure to not contribute to AGI at all.
But that makes no sense. MIRI gave up on aligning a certain type of AGI for good reasons. But nobody has seriously analyzed prospects for aligning the types of AGI we’re likely to get: language model agents or loosely brainlike collections of deep nets. When I and a few others write about plans for aligning those types of AGI, we’re largely ignored. The only substantive comments are “well there are still ways those plans could fail”, but not arguments that they’re actually likely to fail. Meanwhile, everyone is saying we have no viable plans for alignment, and acting like that means it’s impossible. I’m just baffled by what’s going on in the collective unspoken beliefs of this field.
I’ll be real, I don’t know what everyone else thinks, but personally I can say I wouldn’t feel comfortable contributing to anything AGI-related at this point, because I have very low trust that even aligned AGI would result in a net good for humanity, with this kind of governance. I can imagine maybe amidst all the bargains with the Devil there is one that will genuinely pay off and is the lesser evil, but I can’t tell which one. I think the wise thing to do would be just not to build AGI at all, but that’s not a realistically open path. So yeah, my current position is that literally any action I could take advances the kind of future I would want by an amount that is at best below the error margin of my guesses, and at worst negative. It’s not a super nice spot to be in, but it’s where I’m at and I can’t really lie to myself about it.
In the cancer case, the human body starts with every cell aligned with the body. Anthropically, this alignment only has to function until breeding age, plus enough offspring to beat losses.
And yes, if faulty cells self-destruct instead of continuing, this is good; there are cancer treatments that try to gene-edit in clean copies of the specific genes (p53, as I recall) that mediate this (works in rats...).
However, the corporate world / international-competition world has many more actors, and they are adversarial. OAI self-destructing leaves the world’s best AI researchers unemployed and removes them from competing in the next round of model improvements—whoever makes a GPT-5 at a competitor will have the best model outright.
Coordination is hard. Consider the consequences if an entire town decided to stop consuming fossil fuels. They pay the extra costs and rebuild the town to be less car-dependent.
However, the consequence is that this lowers the market price of fossil fuels, so others use more. (Demand elasticity makes the net effect still slightly positive.)
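To make that concrete, here is a minimal toy model of the leakage effect, with linear supply and demand curves and made-up numbers of my own (nothing here is data about actual fuel markets): the town’s cut lowers the price, everyone else consumes a bit more at that lower price, and the global reduction ends up positive but smaller than the town’s cut.

```python
# Toy linear supply/demand model of the "leakage" effect (all numbers are made up).
# World demand:  Q_d = A - B * P        World supply:  Q_s = C + D * P
A, B = 100.0, 2.0   # demand intercept and slope
C, D = 10.0, 1.0    # supply intercept and slope

def equilibrium(demand_intercept):
    """Solve demand_intercept - B*P = C + D*P for the market-clearing price and quantity."""
    price = (demand_intercept - C) / (B + D)
    quantity = C + D * price
    return price, quantity

town_cut = 6.0  # the town unilaterally stops consuming this much, at any price

p0, q0 = equilibrium(A)             # before the town's pledge
p1, q1 = equilibrium(A - town_cut)  # after: world demand curve shifts left by town_cut

global_reduction = q0 - q1               # how much total consumption actually falls
rebound = town_cut - global_reduction    # extra consumption by everyone else at the lower price

print(f"price falls from {p0:.2f} to {p1:.2f}")
print(f"the town cuts {town_cut:.1f}, but global use only falls by {global_reduction:.2f}")
print(f"others absorb {rebound:.2f} of the cut, so the net effect is positive but smaller")
```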
I mean, yes, a company self-destructing doesn’t stop much if their knowledge isn’t also actively deleted—and even then, it’s just a setback of a few months. But also, by going “oh well, we need to work inside the system to fix it somehow”, at some point all you get is just another company racing with all the others (and in this case, effectively being a pacesetter). However you put it, OpenAI is more responsible than any other company for how close we may be to AGI right now, and despite their stated mission, I suspect they did not advance safety nearly as much as capability. So in the end, from the X-risk viewpoint, they mostly made things worse.
I agree with all of this in principle, but I am hung up on the fact that it is so opaque. Up until now the board have determinedly remained opaque.
If corporate seppuku is on the table, why not be transparent? How does being opaque serve the mission?
I wrote a LOT of words in response to this, talking about personal professional experiences that are not something I coherently understand myself as having a duty (or timeless permission?) to share, so I have reduced my response to something shorter and more general. (Applying my own logic to my own words, in realtime!)
There are many cases (arguably stupid cases or counter-productive cases, but cases) that come up more and more when deals and laws and contracts become highly entangling.
It’s illegal to “simply” ask people for money in exchange for giving them a transferable right to future dividends on a money-making project, sealed with a handshake. The SEC commands silence sometimes, and will put you in a cage if you don’t comply.
You get elected to local office and suddenly the Brown Act (which I’d repeal as part of my reboot of the Californian Constitution had I the power) forbids you from talking with your co-workers (other elected officials) about work (the city government) at a party.
A Confessor is forbidden certain kinds of information leaks.
Fixing <all of this (gesturing at nearly all of human civilization)> isn’t something that we have the time or power to do before we’d need to USE the “fixed world” to handle AGI sanely or reasonably, because AGI is coming so fast, and the world is so broken.
That there is so much silence associated with unsavory actors is a valid and concerning contrast, but if you look into it, you’ll probably find that every single OpenAI employee has an NDA already.
OpenAI’s “business arm”, locking its employees down with NDAs, is already defecting on the “let all the info come out” game.
If the legal system will continue to often be a pay-to-win game and full of fucked up compromises with evil, then silences will probably continue to be common, both (1) among the machiavellians and (2) among the cowards, and (3) among the people who were willing to promise reasonable silences as part of hanging around nearby doing harms reduction. (This last is what I was doing as a “professional ethicist”.)
And IT IS REALLY SCARY to try to stand up for what you think you know is true about what you think is right when lots of people (who have a profit motive for believing otherwise) loudly insist otherwise.
People used to talk a lot about how someone would “go mad” and when I was younger it always made me slightly confused, why “crazy” and “angry” were conflated. Now it makes a lot of sense to me.
I’ve seen a lot of selfish people call good people “stupid”, and once the non-selfish person realizes just how venal and selfish and blind the person calling them stupid is, it isn’t hard to call that person “evil”, and then you get a classic “evil vs stupid” (or “selfish vs altruistic”) fight. As they fight they become more “mindblind” to each other? Or something? (I’m working on an essay on this, but it might not be ready for a week or a month or a decade. It’s a really knotty subject on several levels.)
Good people know they are sometimes fallible, and often use peer validation to check their observations, or check their proofs, or check their emotional calibration, and when those “validation services” get withdrawn for (hidden?) venal reasons, it can be emotionally and mentally disorienting.
(And of course in issues like this one a lot of people are automatically going to have a profit motive when a decision arises about whether to build a public good or not. By definition: the maker of a public good can’t easily charge money for such a thing. (If they COULD charge money for it then it’d be a private good or maybe a club good.))
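(For readers who haven’t run into the taxonomy being gestured at here, a quick sketch of the standard textbook classification by excludability and rivalry; the category examples are mine and purely illustrative.)

```python
# Standard textbook classification of goods along two axes (examples are illustrative):
#   excludable: can the producer feasibly charge for access?
#   rivalrous:  does one person's use leave less for everyone else?
def classify(excludable: bool, rivalrous: bool) -> str:
    if excludable and rivalrous:
        return "private good (hammers, bananas, toys for children)"
    if excludable and not rivalrous:
        return "club good (subscriptions, gated API access)"
    if not excludable and rivalrous:
        return "common-pool resource (fisheries, aquifers)"
    return "public good (hard to charge for, so markets under-supply it)"

for excludable in (True, False):
    for rivalrous in (True, False):
        print(f"excludable={excludable!s:<5} rivalrous={rivalrous!s:<5} -> {classify(excludable, rivalrous)}")
```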
The Board of OpenAI might be personally sued by a bunch of Machiavellian billionaires, or their allies, and if that happens, everything the board was recorded as saying will be gone over with a fine-toothed comb, looking for tiny little errors.
Every potential quibble is potentially more lawyer time. Every bit of lawyer time is a cost that functions as a financial reason to settle rather than keep fighting for what is right. Making your attack surface larger is much easier than making an existing attack surface smaller.
If the board doesn’t already have insurance for that contingency, then I hereby commit to donate at least $100 to their legal defense fund, if they start one, which I hope they never need to do.
And in the meantime, I don’t think they owe me much of anything, except for doing their damned best to ensure that artificial general intelligence benefits all humanity.
Maybe I’m missing some context, but wouldn’t it be better for OpenAI as an organized entity to be destroyed than for it to exist right up to the point where all humans are destroyed by an AGI that is neither benevolent nor “aligned with humanity” (if we are somehow so objectively bad as to not deserve care by a benevolent, powerful, and very smart entity)?
This seems to presuppose that there is a strong causal effect from OpenAI’s destruction to avoiding creation of an omnicidal AGI, which doesn’t seem likely? The real question is whether OpenAI was, on the margin, a worse front-runner than its closest competitors, which is plausible, but then the board should have made that case loudly and clearly, because, entirely predictably, their silence has just made the situation worse.