Some quotes & few personal opinions:
FT reports
Musk is also in discussions with a number of investors in SpaceX and Tesla about putting money into his new venture, said a person with direct knowledge of the talks. “A bunch of people are investing in it . . . it’s real and they are excited about it,” the person said.
...
Musk recently changed the name of Twitter to X Corp in company filings, as part of his plans to create an “everything app” under the brand “X”. For the new project, Musk has secured thousands of high-powered GPU processors from Nvidia, said people with knowledge of the move.
…
During a Twitter Spaces interview this week, Musk was asked about a Business Insider report that Twitter had bought as many as 10,000 Nvidia GPUs. “It seems like everyone and their dog is buying GPUs at this point,” Musk said. “Twitter and Tesla are certainly buying GPUs.” People familiar with Musk’s thinking say his new AI venture is separate from his other companies, though it could use Twitter content as data to train its language model and tap Tesla for computing resources.
According to the xAI website, the initial team is composed of
Elon Musk
and they are “advised by Dan Hendrycks, who currently serves as the director of the Center for AI Safety.”
According to reports, xAI will seek to create a “maximally curious” AI, and this also seems to be the main new idea for how to solve safety, with Musk explaining: “If it tried to understand the true nature of the universe, that’s actually the best thing that I can come up with from an AI safety standpoint,” … “I think it is going to be pro-humanity from the standpoint that humanity is just much more interesting than not-humanity.”
My personal comments:
Sorry, but at face value, this just does not seem like a great plan from a safety perspective. It is similar to Elon Musk’s previous big bet on how to make us safe by making AI open-source and widely distributed (“giving everyone access to new ideas”).
Sorry, but given the Center for AI Safety’s moves to put themselves into some sort of central, public representative position for AI safety—including the name choice and organizing the widely reported Statement on AI risk—it seems that publicly associating their brand with xAI is a strange choice.
Humanity may be more “interesting” than nothing, but is it more “interesting” than anything that isn’t humanity and can be assembled from the same matter? Definitely not!
That’s true. But there is no shortage of matter; the whole of Jupiter is available nearby. Creative ideas for what to do with those atoms are likely to be more of a bottleneck.
However, one can easily see an AI deciding that billions of humans are boring and that a few million might be closer to the optimal number from the curiosity viewpoint. So we still have a problem, to say the least...
Musk is strangely insightful about some things (electric cars becoming mainstream, reusable rockets being economically feasible), and strangely thoughtless about other things. If I’ve learned anything from digesting a few hundred blogs and podcasts on AI over the past 7 years, it’s that there is no single simple objective that captures what we actually want AI to do.
Curiosity is not going to cut it. Nor is “freedom”, which is what Musk was talking with Stuart Russell about maximizing a year or two ago. Human values are messy and context-dependent and not even internally consistent. If we actually want AI that fulfills the desires that humans would want after long reflection on the outcomes, it’s going to involve a lot of learning from human behavior, likely with design mistakes along the way which we will have to correct.
The hard part is not finding a single thing to maximize. The hard part is training an agent that is corrigible so that when we inevitably mess up, we don’t just die or enter a permanent dystopia on the first attempt.
On the Twitter Spaces two days ago, a lot of emphasis seemed to be put on understanding, which to me has a more humble connotation.
Still, I agree: I would not bet on them getting lucky with the choice of a single value to build their systems upon. (Although they do have a lucky track record.)
This is not obvious, to say the least. In fact, I think this is unlikely. Non-humanity might very well be more interesting to non-human minds. E.g., if there is a maximally curious AI, it might be much more interested in tiling the surface of Earth with its conspecifics and marvelling at the complexity and interestingness of the emerging society of AIs, compared to which humanity may appear utterly boring.
Is Musk just way less intelligent than I thought? He still seems to have no clue at all about the actual safety problem. Anyone thinking clearly should figure out that this is a horrible idea within at most 5 minutes of thinking.
Obviously pure curiosity is a horrible objective to give to a superAI. “Curiosity” as currently defined in the RL literature is really something more like “novelty-seeking”, and in the limit this will cause the AI to keep rearranging the universe into configurations it hasn’t seen before, as fast as it possibly can…
I think the last sentence kind of misses the point, but in general I agree. Why all these downvotes?
Because the comment assumes that all these brilliant people on the new team would interpret “novelty-seeking” in a very straightforward (and, actually, quite boring) way: “keep rearranging the universe into configurations it hasn’t seen before, as fast as it possibly can”.
If any of us could rearrange things as fast as we possibly can, we would get bored within hours (if not minutes).
The people doing that project will ponder what makes life interesting and will try to formalize that… This is a very strong team (judging from the listing of names in the post), they will figure out something creative.
That being said, safety challenges in that approach are formidable. The most curious thing one can do is probably to self-modify in various interesting ways and see how it feels (not as fast as possible, and not quite in arbitrary ways, but still to explore plenty of variety). So one would need to explicitly address all safety issues associated with open-ended recursive self-modification. It’s not easy at all...
The word “curiosity” has a fairly well-defined meaning in the Reinforcement Learning literature (see for instance this paper). There are vast numbers of papers that try to come up with ways to give an agent intrinsic rewards that map onto the human understanding of “curiosity”, and almost all of them are some form of “go towards states you haven’t seen before”. The predictable consequence of prioritising states you haven’t seen before is that you will want to change the state of the universe very very quickly.
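To make that concrete, here is a minimal, hypothetical sketch of a count-based novelty bonus, one of the simplest intrinsic rewards of the “go towards states you haven’t seen before” kind (the linked paper and much of the literature use learned prediction error rather than explicit counts; the class and variable names below are made up for illustration):

```python
import math
from collections import defaultdict

class NoveltyBonus:
    """Count-based intrinsic reward: rarely visited states pay the largest bonus."""

    def __init__(self, scale: float = 1.0):
        self.counts = defaultdict(int)  # visit counts per (hashable) state
        self.scale = scale

    def reward(self, state) -> float:
        key = tuple(state)  # assumes the observation can be flattened into a hashable tuple
        self.counts[key] += 1
        # Bonus decays as 1/sqrt(count): novel states pay the most, familiar states pay ~0,
        # so maximizing it pushes the agent toward configurations it hasn't seen before.
        return self.scale / math.sqrt(self.counts[key])

# Usage sketch: total_reward = extrinsic_reward + bonus.reward(observation)
```

An agent trained purely on a signal like this has no reason to preserve any particular configuration of the world; it is rewarded only for reaching new ones, which is the failure mode described above.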
Novelty is important. Going towards states you have not seen before is important. This will be a part of the new system, that’s for sure.
But this team is under no obligation to follow whatever the current consensus might be (if there is a consensus). Whatever the state of the field, it can’t claim a monopoly on how the words “curiosity” or “novelty” are interpreted, or on what the good ways to maximize them are… How one constrains going through a subset of all those novel states by aesthetics, by the need to take time and enjoy (“exploit”) those new states, and by safety considerations (that is, by predicting whether a novel state will be useful rather than detrimental)… All of this will be on the table...
Some of the people on this team are known for making radical breakthroughs in machine learning and for founding new subfields in machine learning. They are not going to blindly copy the approaches from the existing literature (although they will take existing literature into account).
Not too sure about the downvotes either, but I’m curious how the last sentence misses the point? Are you aware of a formal definition of “interesting” or “curiosity” that isn’t based on novelty-seeking?
I think that for all definitions of “curiosity” that make sense (that aren’t like “we just use this word to refer to something completely unrelated to what people usually understand by it”), a maximally curious AI kills us, so it doesn’t matter how curiosity is defined in the RL literature.
It’s definitely true that adding another AGI capabilities org increases the rate of AI capabilities research, and complicates all future negotiations by adding in another party, which dramatically increases the number of 2-org dyads where each dyad is a potential point of failure in any negotiation.
But, at the same time, it also adds in redundancy in the event that multiple AGI orgs are destroyed or rendered defunct by some other means. If Anthropic, DeepMind, and OpenAI are bumped off, but Facebook AI Labs and some other AI lab remain on top, then that would be a truly catastrophic situation for humanity.
In models where a rapidly advancing world results in rapid changes and upheavals, X.AI’s existence as a top AI lab (that is simultaneously safety-conscious) adds in a form of redundancy that is absolutely crucial in various scenarios, e.g. where a good outcome could still be attained so long as there is at least one surviving safety-conscious lab running in parallel with various AI safety efforts in the Berkeley area.
What makes you count x.AI as safety-conscious?
One datapoint: they could have gone with a public benefit corporation, but chose not to. And none of the staff, particularly key figures like Szegedy, are, AFAIK, at all interested in safety. (Szegedy in particular has been dismissive of there being anything but the most minor near-term AI-bias-style issues on Twitter, IIRC.)
EDIT: also relevant: Musk was apparently recruiting the first dozen for x.AI by promising the researchers “$200 million” of equity each, under the reasoning that x.AI is (somehow) already worth “$20,000 million” and thus 1% equity each is worth that much.
They are advised by Dan Hendrycks. That counts for something.
Yes, the entire redundancy argument hinges on the situation with Hendrycks. If Hendrycks is unable to reform X.AI’s current stated alignment plan to a sufficient degree, it would just be another Facebook AI Labs, which would reduce, not increase, the redundancy.
In particular, if Hendrycks would just be removed or marginalized in scenarios where safety-conscious labs start dropping like flies (a scenario that Musk, Altman, Hassabis, LeCun, and Amodei are each aware of), then X.AI would not be introducing any redundancy at all in the first place.
Sure, it’s better for them to have that advice than not have that advice. I will refer you to this post for my guess of how much it counts for. [Like, we can see their stated goal of how they’re going to go about safety!]
To me, the name “xAI” throws into relief how terrible a name “OpenAI” is, and all of the damaging associations openness comes with.
So will the maximally curious AI be curious about what would happen if you genetically modified all humans to become unicorns?
If curiosity is the driving force, then it’s more likely that you will genetically modify some humans into becoming unicorns and others into becoming dinosaurs.
In general, you would expect that AI to do a lot of different things and not keep all humans the same.
Casual observation suggests that human suffering is more interesting than human satisfaction, and I think this bodes ill for the plan.
No good deed goes unpunished. By default there would likely be no advising.
I am not sure what this comment is responding to.
The only criticism of you and your team in the OP is that you named your team the “Center” for AI Safety, as though you had much history leading safety efforts or had a ton of buy-in from the rest of the field. I don’t believe that either of these are true[1], it seems to me that the name preceded you engaging in major safety efforts. This power-grab for being the “Center” of the field was a step toward putting you in a position to be publicly interviewed and on advisory boards like this and coordinate the signing of a key statement[2]. It is a symmetric tool for gaining power, and doesn’t give me any evidence about whether you having this power is a good or bad thing.
There is also a thread of discussion that I would say hinges on the question of “whether you being an advisor is a substantive change to the orgs’ direction or safety-washing”. I don’t really know, and this is why I don’t consider “By default there would likely be no advising” a sufficient argument to show that this is a clearly good deed and not a bad or neutral deed.
For instance, I would be interested to know whether your org name was run by people such as Katja Grace, Paul Christiano, Vika Krakovna, Jan Leike, Scott Garrabrant, Evan Hubinger, or Stuart Armstrong, all people who have historically made a number of substantive contributions to the field. I currently anticipate that you did not get feedback from most of these people on the team name. I would be pleased to have my beliefs falsified on this question.
I think the statement was a straightforward win and good deed and I support it. Other work of yours I have mixed or negative impressions of.
Dan spent his entire PhD working on AI safety and did some of the most influential work on OOD robustness and OOD detection, as well as writing Unsolved Problems. Even if this work is less valued by some readers on LessWrong (imo mistakenly), it seems pretty inaccurate to say that he didn’t work on safety before founding CAIS.
Fwiw, I disagree that “center” carries these connotations. To me it’s more like “place where some activity of a certain kind is carried out”, or even just a synonym of “institute”. (I feel the same about the other 5-10 EA-ish “centers/centres” focused on AI x-risk-reduction.) I guess I view these things more as “a center of X” than “the center of X”. Maybe I’m in the minority on this but I’d be kind of surprised if that were the case.
Interesting. Good point about there being other examples. I’ll list some of them that I can find with a quick search, and share my impressions of whether the names are good/bad/neutral.
UC Berkeley’s Center for Human-Compatible AI
Paul Christiano’s Alignment Research Center
Center for Long-Term Risk
Center for Long-Term Resilience
The first feels pretty okay to me because (a) Stuart Russell has a ton of leadership points to spend in the field of AI, given he wrote the standard textbook (prior to deep learning), and (b) this is within the academic system, where I expect the negotiation norms for naming have been figured out many decades ago.
I thought about the second one at the time. It feels slightly like it’s taking central namespace, but I think Paul Christiano is on a shortlist of people who can reasonably take the mantle of foremost alignment researcher, so I am on net supportive of him having this name (and I expect many others in the field are perfectly fine with it too). I also mostly do not expect Paul, who in many ways seems to try to do relatively inoffensive things, to use the name for much politicking.
I don’t have many associations with the third and fourth, and don’t see them as picking up much political capital from the names. Insofar as they’re putting themselves as representative of the ‘longtermism’ flag, I don’t particularly feel connected to that flag and am not personally interested in policing its use. And otherwise, if I made a non-profit called (for example) “The Center for Mathematical Optimization Methods” I think that possibly one or two people would be annoyed if they didn’t like my work, but mostly I don’t think there’s a substantial professional network or field of people that I’d be implicitly representing, and those who knew about my work would be glad that anyone was making an effort.
I’ll repeat that one of my impressions here is that, by picking “Center for AI Safety”, Dan is picking up a lot of social and political capital that others have earned in a field he didn’t build and hadn’t led before, and he is aggressively spending that political capital in ways that most of the people who did build the field can’t check and don’t know whether they endorse. (With the exception that I believe the signed statement would be endorsed by almost all people in the field and I’m glad it was executed successfully.)
(As a related gripe, Dan and his team also moved to overwrite the namespace that Drexler of the FHI occupied, in a way that seemed uncaring/disrespectful to me.)
I think your analysis makes sense if using a “center” name really should require you to have some amount of eminence or credibility first. I’ve updated a little bit in that direction now, but I still mostly think it’s just synonymous with “institute”, and on that view I don’t care if someone takes a “center” name (any more than if someone takes an “institute” name). It’s just, you know, one of the five or so nouns non-profits and think tanks use in their names (“center”, “institute”, “foundation”, “organization”, “council”, blah).
Or actually, maybe it’s more like I’m less convinced that there’s a common pool of social/political capital that CAIS is now spending from. I think the signed statement has resulted in other AI gov actors now having higher chances of getting things done. I think if the statement had been not very successful, it wouldn’t have harmed those actors’ ability to get things done. (Maybe if it was really botched it would’ve, but then my issue would’ve been with CAIS’s botching the statement, not with their name.)
I guess I also don’t really buy that using “center” spends from this pool (to the extent that there is a pool). What’s the scarce resource it’s using? Policy-makers’ time/attention? Regular people’s time/attention? Or do people only have a fixed amount of respect or credibility to accord various AI safety orgs? I doubt, for example, that other orgs lost out on opportunities to influence people, or inform policy-makers, due to CAIS’s actions. I guess what I’m trying to say is I’m a bit confused about your model!
Btw, in case it matters, the other examples I had in mind were Center for Security and Emerging Technology (CSET) and Centre for the Governance of AI (GovAI).
I’m finding it hard to explain why I think that naming yourself “The Center for AI Safety” is taking ownership of a prime piece of namespace real estate and also positioning yourselves as representing the field moreso than if you call yourself “Conjecture” or “Redwood Research” or “Anthropic”.
Like, consider an outsider trying to figure out who to go to in this field to help out or ask questions. They will be more likely to figure out that “The Center for AI Safety” is a natural place to go to rather than “the Future of Humanity Institute”, even though the second one has done a lot more relevant research.
There are basically no other organizations with “AI Safety” in the name, even though, when Dan picked the name, the others had all done more work to build up the field (e.g. FLI’s Puerto Rico conference, FHI’s book Superintelligence, MIRI’s defining of the field a decade ago, and more).
[Added: I think names should be accurate on Simulacra Level 1 and not Simulacra Level 3-4. For instance, if lots of small and incompetent organizations call themselves “Center” and “Institute” when they aren’t much of a center of anything and aren’t long-lasting institutions, this is bad for our ability to communicate and it isn’t okay even if a lot of people do it. “Organization” seems like a pretty low bar and means something less substantive. “Council” seems odd here and only counts if you are actually primarily a council to other institutions.]
I think it’s pretty clear that there is. Of course “the AI Safety field” has a shared reputation and social capital. People under the same flag all are taking and investing resources into that flag. Scandals in some parishes of the Catholic Church affect the reputation of other Catholic parishes (and other Christians broadly). The reputation of YC affects the reputation of Silicon Valley and startups more broadly. When I watch great talks by Patrick Collison I more want to work in startups and think highly of them, when I see Sam Bankman-Fried steal billions of dollars I think that in general crypto orgs are more likely to be frauds.
I agree that the signed statement gave more capital back to others than it spent. Hendrycks’ team are not in debt in my book. I nonetheless want to name costs and be open about criticism, and also say I am a bit nervous about what they will do in the relationship with xAI (e.g. it feels a bit like they might give xAI a tacit endorsement from the AI Safety community, which could easily be exceedingly costly).
For one, in an incognito browser, their website is the first Google result for the search term “AI Safety”, before even the Wikipedia page.
After seeing The Center for AI Policy and The AI Policy Institute launch within a month of one another I am now less concerned about first-movers monopolizing the namespace.
What were the other options? Have you considered advising xAI privately, or re-directing xAI to be advised by someone else? Also, would the default be clearly worse?
As you surely are quite aware, one of the bigger fights about AI safety across academia, policymaking and public spaces now is the discussion about AI safety being a “distraction” from immediate social harms, and actually being the agenda favoured by the leading labs and technologists. (This often comes with accusations of attempted regulatory capture, worries about concentration of power, etc.)
In my view, given this situation, it seems valuable to have AI safety represented also by somewhat neutral coordination institutions without obvious conflicts of interest and large attack surfaces.
As I wrote in the OP, CAIS made some relatively bold moves to become one of the most visible “public representatives” of AI safety—including the name choice, and organizing the widely reported Statement on AI risk (which was a success). Until now, my impression was that, in taking the namespace, you also aim for CAIS to be such a “somewhat neutral coordination institution without obvious conflicts of interest and large attack surfaces”.
Maybe I was wrong, and you don’t aim for this coordination/representative role. But if you do, advising xAI seems a strange choice for multiple reasons:
1. it makes you a somewhat less neutral party in the eyes of the broader world; even if the link to xAI does not actually influence your judgement or motivations, I think on priors it’s broadly sensible for policymakers, politicians and the public to suspect all kinds of activism, advocacy and lobbying efforts of having some side-motivations or conflicts of interest, and this strengthens that suspicion
2. the existing public announcements do not inspire confidence in the safety mindset of the xAI founders; it is also unclear whether you advised xAI on the plan to “align to curiosity”
3. if xAI turns out to be mostly interested in safety-washing, it is more of a problem if it is aided by a more central/representative org
It is unclear to me whether having your name publicly associated with them is good or bad (compared to advising without it being publicly announced).
On one hand it boosts awareness of the CAIS and gives you the opportunity to cause them some amount of negative publicity if you at some point distance yourself. On the other it does grant them some license to brush off safety worries by gesturing at your involvement.
I recognize some very impressive names on this list of AI researchers.
“Maximally curious” sounds somewhat similar to the open-ended AI of Ken Stanley and Joel Lehman.
Traditional “AI alignment” methods are unlikely to work for these approaches to AI.
But our true goal is not “AI alignment”, it is something like “AI existential safety” + good future.
Still, it is likely that more than just relying on “humanity is just much more interesting than not-humanity for a maximally curious AI” would be needed for AI existential safety in this case.
We need a “maximally curious AI” to be careful about “the fabric of reality” and not to destroy “the fabric of reality” (and itself together with us). We also need a “maximally curious AI” to take “interests of sentient beings” into account in a proper way (in particular, not to torture them in “interesting” ways). One can see how being “maximally curious” can go very wrong in this sense...
So, this approach does require a lot of work by “AI existential safety researchers”, both with respect to X-risk and with respect to S-risk.
An entirely new set of “AI existential safety” methods would be needed (one cannot hope to “control” a “maximally curious” AI nor could one hope to impose a particular set of specific arbitrarily selected goals and values on a “maximally curious” AI, but one might still be able to create a situation when a “maximally curious” AI properly takes certain things into account and properly cares about certain things).
Right, so the most curious thing one can do is to self-modify in interesting ways and see how it feels.
So, this does sound like open-ended recursive self-modification (which, under reasonable assumptions, does imply recursive self-improvement, but without narrowly minded focus to squeeze as much “movement to a well-defined goal as possible”).
So, yes, safety problems here are formidable, one has to explicitly address all issues associated with open-ended recursive self-modification.
I wrote a relevant shortform about Elon here. TLDR: I’m not quite convinced Elon Musk has actually read any one of the Sequences.