Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

The Next Generation of Compute Scale

AI development is on the cusp of a dramatic expansion in compute scale. Recent developments across multiple fronts—from chip manufacturing to power infrastructure—point to a future where AI models may dwarf today’s largest systems. In this story, we examine key developments and their implications for the future of AI compute.

xAI and Tesla are building massive AI clusters. Elon Musk’s xAI has brought its Memphis supercluster—“Colossus”—online. According to Musk, the cluster has 100k Nvidia H100s, making it the largest supercomputer in the world. Moreover, xAI plans to add 50k H200s in the next few months. For comparison, Meta’s Llama 3 was trained on 16k H100s.

Meanwhile, Tesla’s “Gigafactory Texas” is expanding to house an AI supercluster. The supercomputer is expected to initially draw 130MW, with potential growth to 500MW. One megawatt is roughly enough to power 1,000 homes in the US, so the cluster would draw as much power as roughly 130,000 homes at first and as many as 500,000 at full scale, approaching the power consumption of a large city.

OpenAI plans a global AI infrastructure push. OpenAI CEO Sam Altman is reportedly concerned that xAI will have access to more computing power than OpenAI. OpenAI currently relies on Microsoft’s compute resources, but recent reports indicate that the company plans its own infrastructure buildout.

According to Bloomberg, Altman is spearheading a massive buildout of AI infrastructure, beginning with projects in several U.S. states. The initiative aims to assemble a global coalition of investors to fund the physical infrastructure needed for rapid AI development.

The scope of these projects is broad, encompassing the construction of data centers, expansion of energy capacity, and growth of semiconductor manufacturing capabilities. Potential investors include entities from Canada, Korea, Japan, and the United Arab Emirates.

This infrastructure push coincides with OpenAI’s talks over a new funding round, reportedly including Apple and Nvidia, that could push the company’s valuation beyond $100 billion.

These developments at OpenAI and xAI are not surprising; rather, they are representative of the broader trend toward ever-larger compute scale. For example, North Dakota was reportedly approached by two separate companies about developing $125 billion clusters in the state.

TSMC starts production in Arizona, and Intel considers splitting out its foundry business. TSMC began trial chip production at its Arizona facility, and its yields are reportedly on par with those of its facilities in Taiwan. This success puts the US on track to meet its targets for domestic semiconductor production, and TSMC on track to receive $6.6 billion in grants and as much as $5 billion in loans under the CHIPS and Science Act.

*TSMC’s Arizona facility during its construction.*

The picture is more complicated for Intel. Intel’s foundry business is slated to receive approximately $8.5 billion under the CHIPS and Science Act, but the company is already spending billions to qualify for those funds: in the second quarter, its foundry business reported an operating loss of $2.8 billion.

The chipmaker has reportedly had difficulty obtaining funds under the CHIPS Act and now faces a strategic crossroads. A US-based chip foundry is a national strategic priority, and investors might look to Intel to hedge against the geopolitical uncertainty of relying on TSMC in light of China’s claims to Taiwan. However, Intel’s foundry investments are dragging down its otherwise profitable microprocessor business.

In response, Intel is reportedly considering splitting out its foundry business. The move might return the company to profitability, while at the same time setting up a possible domestic competitor to TSMC.

Ranking Models by Susceptibility to Jailbreaking

On September 7th, the “AI safety and security” company Gray Swan kicked off a competition to jailbreak LLMs. The competition includes models from Anthropic, OpenAI, Google, Meta, Microsoft, Alibaba, Mistral, Cohere, and Gray Swan AI.

As of this writing, the competition is ongoing. It is set to end when every model has been jailbroken (successfully prompted to give a specified harmful output) by at least one person. Every model has been jailbroken except Gray Swan’s, which have so far resisted over ten thousand manual jailbreaking attempts. The competition’s model leaderboard shows how the remaining models compare.

This is good evidence that making LLMs robust to malicious use is more tractable than previously thought. In particular, it suggests that the safety techniques employed by Gray Swan, including “circuit breaking” and other representation engineering techniques, are promising.

However, there are also important limitations to what we can infer from this competition. First, competitors are allowed only one prompt at a time to jailbreak a model. Extended, multi-prompt conversations will likely jailbreak some models that can resist single-prompt attacks. Second, competitors do not have access to the models’ weights. Open-weight models are subject to much stronger forms of adversarial attack, such as fine-tuning.

Machine Ethics

Our new book, Introduction to AI Safety, Ethics and Society, is available for free online and will be published by Taylor & Francis in the next year. This week, we will look at Chapter 6: Beneficial AI and Machine Ethics. This chapter looks at the challenge of embedding beneficial and ethical goals in AI systems. The video lecture for this chapter is here.

Lawful AI. One proposal for guiding AI behavior is to ensure an AI agent adheres to existing law. Law has several advantages: it is arguably legitimately formed (at least in democracies), time-tested, and comprehensive in scope.

However, law also has several disadvantages. It is often written without AIs in mind: much of criminal law requires mental states and intent, which do not necessarily apply to AIs. For example, the implementation act of the bioweapons convention discusses “knowingly” aiding terrorists; if an AI gives bioweapon instructions to a terrorist, it is not necessarily doing so knowingly, and neither are its developers, so no one may be penalized. Law is also intentionally silent on many important issues, so it provides only a limited set of guardrails.

Fair AI. Beneficial AIs should also ideally prioritize fairness. Unfair bias can enter the behavior of AI systems in many ways, for example through flawed training data. Bias in AIs is hazardous because it can generate feedback loops: AI systems trained on flawed data could make biased decisions that are then fed into future models.

Improving the fairness of AI systems involves combining technical approaches like adversarial testing and sociotechnical solutions like participatory design, in which all stakeholders are involved in a system’s development.

Economically beneficial AI. Another proposal is that AI behavior should be guided by market forces, since capitalism incentivizes AIs that increase economic growth (think e/acc). However, while economic growth is a worthy goal, it has limitations like market failures.

Moral uncertainty. AIs should be able to make decisions under moral uncertainty, that is, in situations where conflicting moral theories or considerations point in different directions. There are several potential approaches to moral uncertainty.

First, an AI could adopt a single “favored theory” and disregard all others; this is simple, but it could lead to single-mindedness and overconfidence. Second, an AI could choose the option with the highest expected choiceworthiness, weighting how desirable each option is under each theory by how likely that theory is to be true; this approach is more balanced, but assigning credences to theories is inherently subjective. Finally, an AI could use a “moral parliament,” in which hypothetical delegates representing different theories debate and reach a compromise.
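To make the difference between the first two approaches concrete, here is a minimal Python sketch. Everything in it is hypothetical: the three theories, their credences, and the 0 to 10 desirability scores are made up for illustration, and the sketch assumes that all theories rate options on one shared scale, which is itself a contested assumption. The moral parliament is not modeled.

```python
# Hypothetical credences: how likely we think each moral theory is to be true.
CREDENCES = {"utilitarianism": 0.5, "deontology": 0.3, "virtue_ethics": 0.2}

# Hypothetical desirability scores each theory assigns to each option,
# assumed (for illustration only) to lie on a shared 0-10 scale.
SCORES = {
    "option_a": {"utilitarianism": 9, "deontology": 1, "virtue_ethics": 4},
    "option_b": {"utilitarianism": 6, "deontology": 7, "virtue_ethics": 6},
}


def favored_theory_choice(scores, credences):
    """Pick the option rated highest by the single most credible theory."""
    favored = max(credences, key=credences.get)
    return max(scores, key=lambda option: scores[option][favored])


def expected_choiceworthiness(option_scores, credences):
    """Credence-weighted sum of one option's scores across all theories."""
    return sum(credences[theory] * option_scores[theory] for theory in credences)


def mec_choice(scores, credences):
    """Pick the option with the highest expected choiceworthiness."""
    return max(
        scores,
        key=lambda option: expected_choiceworthiness(scores[option], credences),
    )


print(favored_theory_choice(SCORES, CREDENCES))  # option_a
print(mec_choice(SCORES, CREDENCES))  # option_b
```

With these made-up numbers, the favored-theory approach picks option_a because utilitarianism alone rates it highest, while the credence-weighted approach picks option_b (about 6.3 versus 5.6) because it does reasonably well under every theory. Changing the credences can flip the result, which is exactly where the subjectivity mentioned above enters.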

Government

The California Legislature passed SB 1047. The bill is headed to Governor Newsom’s desk.
The Bureau of Industry and Security announced mandatory reporting requirements for AI developers.
The US, EU, and UK signed the first legally binding international treaty on the use of AI.
The Beijing Institute of AI Safety and Governance launched.
OpenAI and Anthropic agreed to provide the US AISI early access to new models.

Technology

OpenAI has reportedly demonstrated “Strawberry” to national security officials, and is using the breakthrough to help train its next flagship system, “Orion.” Strawberry is reportedly set to be released within the next two weeks.
Ilya Sutskever’s three-month-old AI company, Safe Superintelligence (SSI), has raised $1 billion in cash, and is reportedly valued at $5 billion.
Sakana AI raised $100 million in a Series A funding round.
Amazon CEO Andy Jassy claims that the company’s AI software assistant has saved 4,500 developer-years of work.
Bloomberg reported on the effects that AI is having on the Philippines’ outsourcing industry.
AI developer Magic trained models to reason over context windows of up to 100 million tokens.
To raise awareness of advances in AI forecasting technology and increase its rate of adoption, CAIS released a demo of a forecasting bot.

Opinion

Some experts argue that SB 1047 could enhance EU AI regulation.
A long list of academics signed a letter in support of SB 1047, as did over 100 employees of frontier AI labs. Other groups, such as the labor union SAG-AFTRA, also endorsed SB 1047.

The Center for AI Safety is also hiring for several positions, including Chief Operating Officer, Director of Communications, Federal Policy Lead, and Special Projects Lead.

Double your impact! Every dollar you donate to the Center for AI Safety will be matched 1:1 up to $2 million. Donate here.

AI Safety Newsletter #41: The Next Generation of Compute Scale. Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics