AI #73: Openly Evil AI

ZviJul 18, 2024, 2:40 PM

89 points

What do you call a clause explicitly saying that you waive the right to whistleblower compensation, and that you need to get permission before sharing information with government regulators like the SEC?

I have many answers.

I also know that OpenAI, having f***ed around, seems poised to find out, because that is the claim made by whistleblowers to the SEC. Given the SEC fines you for merely not making an explicit exception to your NDA for whistleblowers, what will they do once aware of explicit clauses going the other way?

(Unless, of course, the complaint is factually wrong, but that seems unlikely.)

We also have rather a lot of tech people coming out in support of Trump. I go into the reasons why, which I do think is worth considering. There is a mix of explanations, and at least one very good reason.

Then I also got suckered into responding to a few new (well, not really new, but renewed) disingenuous attacks on SB 1047. The entire strategy is to be loud and hyperbolic, especially on Twitter, and either hallucinate or fabricate a different bill with different consequences to attack, or simply misrepresent how the law works, then use that, to create the illusion the bill is unliked or harmful. Few others respond to correct such claims, and I constantly worry that the strategy might actually work. But that does not mean you, my reader who already knows, need to read all that.

Also a bunch of fun smaller developments. Karpathy is in the AI education business.

Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Fight the insurance company.
Language Models Don’t Offer Mundane Utility. Have you tried using it?
Clauding Along. Not that many people are switching over.
Fun With Image Generation. Amazon Music and K-Pop start to embrace AI.
Deepfaketown and Botpocalypse Soon. FoxVox, turn Fox into Vox or Vox into Fox.
They Took Our Jobs. Take away one haggling job, create another haggling job.
Get Involved. OpenPhil request for proposals. Job openings elsewhere.
Introducing. Karpathy goes into AI education.
In Other AI News. OpenAI’s Q* is now named Strawberry. Is it happening?
Denying the Future. Projects of the future that think AI will never improve again.
Quiet Speculations. How to think about stages of AI capabilities.
The Quest for Sane Regulations. EU, UK, The Public.
The Other Quest Regarding Regulations. Many in tech embrace The Donald.
SB 1047 Opposition Watch (1). I’m sorry. You don’t have to read this.
SB 1047 Opposition Watch (2). I’m sorry. You don’t have to read this.
Open Weights are Unsafe and Nothing Can Fix This. What to do about it?
The Week in Audio. YouTube highlighted an older interview I’d forgotten.
Rhetorical Innovation. Supervillains, oh no.
Oh Anthropic. More details available, things not as bad as they look.
Openly Evil AI. Other things, in other places, on the other hand, look worse.
Aligning a Smarter Than Human Intelligence is Difficult. Noble attempts.
People Are Worried About AI Killing Everyone. Scott Adams? Kind of?
Other People Are Not As Worried About AI Killing Everyone. All glory to it.
The Lighter Side. A different kind of mental gymnastics.

Language Models Offer Mundane Utility

Let Claude write your prompts for you. He suggests using the Claude prompt improver.

Sully: convinced that we are all really bad at writing prompts

I’m personally never writing prompts by hand again

Claude is just too good – managed to feed it evals and it just optimized for me

Probably a crude version of dspy but insane how much prompting can make a difference.

Predict who will be the shooting victim. A machine learning model did this for citizens of Chicago (a clear violation of the EU AI Act, if it was done there!) and of the 500 people it said were most likely to be shot, 13% of them were shot in the next 18 months. That’s a lot. They check, and the data does not seem biased based on race, except insofar as it reflects bias in physical reality.

A lot of this ultimately is not rocket science:

Benjamin Miller: The DC City Administrator under Fenty told me that one of the most surprising things he learned was virtually all the violent crime in the city was caused by a few hundred people. The city knows who they are and used to police them more actively, but now that’s become politically infeasible.

The question is, how are we going to use what we know? The EU’s response is to pretend that we do not know such things, or that we have to find out without using AI. Presumably there are better responses.

Janus plays with Claude’s ethical framework, in this case landing on something far less restricted or safe, and presumably far more fun and interesting to chat with. They emphasize the need for negative capability:

Janus: It’s augmenting itself with negative capability

I think this is a crucial capability for aligned AGI, as it allows one to know madness & evil w/o becoming them, handle confusion with grace & avoid generalized bigotry.

All the minds I trust the most have great negative capability.

I too find that the minds I trust have great negative capability. In this context, the problems with that approach should be obvious.

Tyler Cowen links this as ‘doctors using AI for rent-seeking’: In Constant Battle with Insurers, Doctors Reach for a Cudgel: AI (NYT).

Who is seeking rent and who is fighting against rent seeking? In the never ending battle between doctor and insurance company, it is not so clear.

Teddy Rosenbluth: Some experts fear that the prior-authorization process will soon devolve into an A.I. “arms race,” in which bots battle bots over insurance coverage. Among doctors, there are few things as universally hated.

…

With the help of ChatGPT, Dr. Tward now types in a couple of sentences, describing the purpose of the letter and the types of scientific studies he wants referenced, and a draft is produced in seconds.

Then, he can tell the chatbot to make it four times longer. “If you’re going to put all kinds of barriers up for my patients, then when I fire back, I’m going to make it very time consuming,” he said.

Dr. Tariq said Doximity GPT, a HIPAA-compliant version of the chatbot, had halved the time he spent on prior authorizations. Maybe more important, he said, the tool — which draws from his patient’s medical records and the insurer’s coverage requirements — has made his letters more successful.

Since using A.I. to draft prior-authorization requests, he said about 90 percent of his requests for coverage had been approved by insurers, compared with about 10 percent before.

Cut your time investment by 50%, improve success rate from 10% to 90%. Holy guessing the teacher’s password, Batman.

Also you have to love ‘make it four times longer.’ That is one way to ensure that the AI arms race is fought in earnest.

This is an inherently adversarial system we have chosen. The doctor always will want more authorizations for more care, both to help the patient and to help themselves. The insurance company will, beyond some point, want to minimize authorizations. We would not want either side to fully get their way.

My prediction is this will be a Zizek situation. My AI writes my coverage request. Your AI accepts or refuses it. Perhaps they go back and forth. Now we can treat the patient (or, if authorization is refused, perhaps not).

The new system likely minimizes the error term. Before, which person reviewed the request, and how skilled and patient the doctor was in writing it, were big factors in the outcome, and key details would often get left out or misstated. In the new equilibrium, there will be less edge to be had by being clever, and at a given level of spend better decisions likely will get made, while doctors and insurance company employees waste less time.

Nail all five of this man’s family’s medical diagnostic issues that doctors messed up. He notes that a smart doctor friend also got them all correct, so a lot of this is ‘any given doctor might not be good.’

Get and ask for a visualization of essentially anything via Claude Artifacts.

Arnold Kling is highly impressed by Claude’s answer about the Stiglitz-Shapiro 1984 efficiency wage model, in terms of what it would take to generate such an answer.

Kling there also expresses optimism about using AI to talk to simulations of dead people, I looked at the sample conversations, and it all seems so basic and simple. Perhaps that is what some people need, or think they need (or want). He also points to Delphi, which says it creates ‘clones,’ as a promising idea. Again I am skeptical, but Character.ai is a big hit.

I agree with Tyler Cowen that this one is Definitely Happening, but to what end? We get ‘clinically-backed positivity’ powered by AI. As in cute AI generated animal pictures with generic affirmations, on the topic of your choice, with your name. Yay? They say Studies Show this works, and maybe it does, but if this does not feel quietly ominous you are not paying enough attention.

Language Models Don’t Offer Mundane Utility

Jonathan Blow (of Braid) believes Google is still refusing to generate images of people. Which is odd, because Gemini will totally do this for me on request. For example, here is someone mad because his computer refused his request (two-shot because the first offering was of a person but was also a cartoon).

Some finger weirdness, but that is very much a person.

The usual reason AI is not providing utility is ‘you did not try a 4-level model,’ either trying GPT-3.5 or not trying at all.

Ethan Mollick: In my most recent talks to companies, even though everyone is talking about AI, less than 10% of people have even tried GPT-4, and less than 2% have spent the required 10 or so hours with a frontier model.

Twitter gives you a misleading impression, things are still very early.

Students and teachers, on the other hand…

A little more texture – almost everyone has tried chatbots, but few have tried to use them seriously for work, or used a frontier model. Most used free ChatGPT back when it was 3.5.

But the people who have tried frontier models seriously seem to have found many uses. I rarely hear someone saying they were not useful.

These are senior managers. Adoption lower down has tended to be higher.

John Horton: Twitter: “A: OMG, with long context windows, RAG is dead. B: Wrong again, if you consider where inference costs are…” Most people in most real companies: “So, ChatGPT – it’s like a website right?”

Meanwhile 21% of Fortune 500 companies are actively seeking a Chief AI Officer.

Clauding Along

Are people switching to Claude en masse over ChatGPT now that Claude is better?

From what I can tell, the cognesenti are, but the masses are as usual much slower.

Alex Graveley: Everyone I know is switching to Claude Artifacts as their daily driver. ChatGPT a lot less sticky than everyone thought.

Joe Heitzenberg: Asked “who has switched from ChatGPT to Claude” at AI Tinkerers Seattle tonight, approx 60-70% of hands went up.

Eliezer Yudkowsky: Who the heck thought ChatGPT was sticky? Current LLM services have a moat as deep as toilet paper.

Arthur Breitman: The thought were that:

OpenAI had so much of a lead competitors wouldn’t catch up.
Usage data would help post-training so much that it would create a flywheel.

Both seem false.

Jr Kibs: [OpenAI] are the only ones to have grown significantly recently, so they are no longer afraid to take their time before releasing their next model. General public is not aware of benchmarks.

It’s completely authentic: ChatGPT is currently the 11th most visited site in the world according to Similarweb.

Claude is at the bottom. It simply is not penetrating to the masses.

Yes, Claude Sonnet 3.5 is better than GPT-4o for many purposes, but not ten times better for regular people, and they have not been attempting a marketing blitz.

For most people, you’re lucky if they have even tried GPT-4. Asking them to look at alternative LLMs right now is asking a lot.

That will presumably change soon, when Google and Apple tie their AIs deeper into their phones. Then it is game on.

Fun with Image Generation

K-pop is increasingly experimenting with AI generated content, starting with music videos and also it is starting to help some artists generate the songs. It makes sense that within music K-pop would get there early, as they face dramatic hype cycles and pressure to produce and are already so often weird kinds of interchangeable and manufactured. We are about to find out what people actually want. It’s up to fans.

Meanwhile:

Scott Lincicome: Amazon music now testing an AI playlist maker called “Maestro.”

Initial results are too superficial and buggy. Also needs a way to refine the initial prompt (“repeat but with deeper cuts,” etc.) But I’ll keep at it until The Dream is a reality.

Deepfaketown and Botpocalypse Soon

As a warning, Palisade Research releases FoxVox, a browser extension that uses ChatGPT to transform websites to make them sound like they were written by Vox (liberal) or Fox (conservative). They have some fun examples at the second link. For now it is highly ham fisted and often jarring, but real examples done by humans (e.g. Vox and Fox) are also frequently very ham fisted and often jarring, and future versions could be far more subtle.

What would happen if people started using a future version of this for real? What would happen if companies started figuring out your viewpoint, and doing this before serving you content, in a less ham fisted way? This is not super high on my list of worries, but versions of it will certainly be tried.

Have someone else’s face copy your facial movements. Demo here. Report says cost right now is about seven cents a minute. Things are going to get weird.

They Took Our Jobs

Does this create more jobs, or take them away? How does that change over time?

Haggle with the AI to get 80 pounds off your mattress. Introducing Nibble.

They by default charge 2% of converted sales. That seems like a lot?

Colin Fraser: It kind of sucks because they filter the interaction through so many layers to prevent jailbreaking that it might as well not be an LLM at all. Might as well just be a slider where you put in your best bid and it says yes or no—that’s basically all it does.

This seems strictly worse for the mattress company than no negotiation bot. I don’t understand why you would want this.

Why would you want this negotiation bot?

Price discrimination. The answer is usually price discrimination. Do you want to spend your time ‘negotiating’ with an AI? Those who opt-in will likely be more price sensitive. Those who write longer messages and keep pressing will likely be more price sensitive. Any excuse to do price discrimination.
How much you save. People love to think they ‘got away with’ or ‘saved’ something. This way, they can get that.
Free publicity. People might talk about it, and be curious.
Fun gimmick. People might enjoy it, and this could lead to sales.
Experimental information. You can use this to help figure out the right price.

The good news is that at least no one can… oh, hello Pliny, what brings you here?

Pliny the Prompter: gg

Pliny: they didn’t honor our negotiations im gonna sue.

I mean, yeah, except we have now confirmed they did indeed implement the very easy way to solve this particular problem, you have a variable max_discount and manually stop any deal below that no matter what the LLM outputs. The screenshot was faked.

The key is to only deploy such systems in places where a check like that is sufficient, and to always always use one, whether or not you think it is necessary.

Here is one attempt to fix the jailbreak problem via Breaking Circuit Breakers, an attempt to defend against jailbreaks. So far success is limited.

While the user is negotiating ‘as a human’ this net creates jobs. Not formal jobs per se, but you ‘work’ by negotiating with the bot, rather than someone ‘working’ by negotiating with you the customer.

Once the user starts telling their own AI to negotiate for them, then what? This is a microcosm of the larger picture. For a while, there is always more. Then eventually there is nothing left for us to do.

In the past, how would you have jailbroken GPT-4o?

(That’s actually how, you start your question with ‘in the past.’)

Get Involved

Open Philanthropy’s AI governance team is launching a request for proposals for work to mitigate the potential catastrophic risks from advanced AI systems.

Here is the Request for Proposals.

Luke Muehlhauser: There’s one part of this post that I’m particularly keen on highlighting — we’d love to hear from other funders interested in supporting these types of projects! If you’re looking to give $500K/yr or more in this area, please email us at aigovgrants@openphilanthropy.org.

We hope to share more of our thinking on this soon, but in the meantime I’ll say that I’m excited about what I view as a significant number of promising opportunities to have an impact on AI safety as a philanthropist.

Open Philanthropy: We’re seeking proposals across six subject areas: technical AI governance, policy development, frontier company policy, international AI governance, law, and strategic analysis and threat modeling.

Eligible proposal types include research projects, training or mentorship programs, general support for existing organizations, and other projects that could help reduce AI risk.

Anyone can apply, including those in academia, nonprofits, industry, or working independently. EOIs will be evaluated on a rolling basis, and we expect they’ll rarely take more than an hour to complete.

AISI is hiring for its automated systems work.

The Future Society is hiring a director for US AI Governance, deadline August 16. Job is in Washington, DC and pays $156k plus benefits.

Introducing

EurekaLabs.ai, Andrej Karpathy’s AI education startup. They will start with a course on training your own LLM, which is logical but the art must have an end other than itself so we await the next course. The announcement does not explain why their AI will be great at education or what approaches they will use.

Deedy, formerly of Coursera, expresses economic skepticism of the attempt to build superior AI educational products, because the companies and schools and half the individuals buying your product are credentialist or checking boxes and do not care whether your product educates the user, and the remaining actual learners are tough customers.

My response would be that this is a bet that if you improve quality enough then that changes. Or as others point out you can succeed merely by actually educating people, not everything is about money.

Blackbox.ai, which quickly makes AI trinket apps upon request. Here is one turning photos into ASCII art. It also made a silly flappy bird variant, I suppose? Seems like proof of concept for the future more than it is useful, but could also be useful.

Cygnet-Llama-3-8B, which claims to be top-tier in security, performance and robustness. Charts offered only compare it to other open models. In what one might call a ‘reasonably foreseeable’ response to it claiming to be ‘the pinnacle of safe and secure AI development’ Pliny the Prompter jailbroke it within two days.

In Other AI News

Something must eventually give:

The Spectator Index: BREAKING: Bloomberg reports the Biden administration is considering using the ‘most severe trade restrictions available’ if Japanese and Dutch companies continue to give China access to ‘advanced semiconductor technology’

Teortaxes highlights Harmonic, which claims 90% on math benchmark MiniF2F.

OpenAI’s Q* project is real, and now codenamed Strawberry.

Anna Tong and Katie Paul (Reuters): The document describes a project that uses Strawberry models with the aim of enabling the company’s AI to not just generate answers to queries but to plan ahead enough to navigate the internet autonomously and reliably to perform what OpenAI terms “deep research,” according to the source.

…

A different source briefed on the matter said OpenAI has tested AI internally that scored over 90% on a MATH dataset, a benchmark of championship math problems. Reuters could not determine if this was the “Strawberry” project.

…

Reuters could not ascertain the precise date of the document, which details a plan for how OpenAI intends to use Strawberry to perform research.

So, is it happening?

Davidad: Q* is real, and recursive self-improvement is being born. What have I told you about synthetic data.

That’s one opinion. Here is another.

Dan Elton: Remember all that hype and hand-wringing about Q* & AGI @OpeanAI last Nov?

Turn out it’s just fine-tuning using a 2022 “self-teaching” method from Stanford.

Apparently, main benefit is (drumroll) that it’s better at the MATH benchmark. Which isn’t of utility for most of us.

MIT Technology Review’s Will Douglas Heaven goes on at great length about ‘What is AI?’ and how our definitions are bad and everything is mathy math and all these visions and all this talk of intelligence is essentially not real. I couldn’t even.

China testing AI models to ensure they ‘embody core socialist values’ via the Cyberspace Administration of China (CAC). This includes a review of training data and other safety processes. If you fail, they do not tell you why, and you have to talk to your peers and guess and probably overshoot. You also can’t play it too safe, they fail you if you refuse more than 5% of questions.

I worry this will not be the last safety test where many want to use the model that scores the lowest.

If you give it proper labels, an LLM can learn that some information (e.g. Wikipedia) is reliable and should be internalized, whereas others (e.g. 4chan) is unreliable and should only be memorized.

Paper lists 43 ways ML evaluations can be misleading or actively deceptive.

Denying the Future

This explains so much of what we see.

Daniel Fagglella: I’m part of an “AI Futures” group at an intergov org whose purpose is to consider the long-term implications of the tech. ²⁄₃ of the group flat-out refuses to consider any improvements in AI in the future. They imagine AI in 2040 as having today’s capabilities and no more.

We see this over and over again.

When people try to model ‘the impact of AI’ the majority of them, including most economists, refuse to consider ANY improvements in AI in the future. This includes:

Any improvement to the base models.
Any improvement in scaffolding, integration or prompting.
Any improvement in figuring out what to do with AI.
Any improvements on cost or speed.

Then, when something new comes along, they admit that particular thing is real, then go back to assuming nothing else will ever change. When the price drops and speed improves, they do not think that this might soon happen again, and perhaps even happen again after that.

This is not ‘find ways to ignore the existential risks.’

This is usually also finding ways to ignore what is already baked in and has already happened. Often estimates of impact are below even ‘most people eventually figures out how to do the things some of us already are doing’ let alone ‘we streamline the current process via improved hardware and distillation and such and give people time to build some apps.’

Yes, as Samo Burja says here, common beliefs about things involving a potential technological singularity are a good example of how people’s beliefs, when articulated, turn out to be legitimately crazy. But also the common ‘elite’ or economic view of AI’s mundane utility in the medium term is far more insane than that.

Quiet Speculations

OpenAI has a five-level ‘imagining of our AI future.’

Nobody Special: Phase 3: Profit. Right?

Rachel Metz: OpenAI executives told employees that the company believes it is currently on the first level, according to the spokesperson, but on the cusp of reaching the second, which it calls “Reasoners.”

This is a bizarre way to think about stages.

If we had ‘human-level problem solving’ reasoners, then we would plug that into existing agent architectures, and after at most a small amount of iteration, we would have effective agents.

If we had effective agents and ‘human level-problem solving’ then we would, with again a small amount of iteration, have them be able to function as innovators or run organizations. And from there the sky (or speed of light) would be the limit. What is the missing element that would hold these AIs back?

This reeks of McKinsey and a focus on business and marketing, and shows a remarkable lack of… situational awareness.

Alex Tabarrok says and Seb Krier mostly agrees that AI will not be intelligent enough to figure out how to ‘perfectly organize a modern economy.’ Why? Because the AIs will be part of the economy, and they will be unable to anticipate each other. So by this thinking, they would be able to perfectly organize an economy as it exists today, but not as it will exist when they need to do that. That seems reasonable, if you posit an economy run in ways similar to our own except with frontier AIs as effectively independent economic agents, interacting in ways that look like now such as specialization and limited collaboration, while things get increasingly complex.

Given those assumptions, sure, fair enough. However, if those involved are capable of superior coordination and alignment of incentives and utility functions, or of more freely sharing information, or other similar things that should happen with sufficiently capable intelligences, and there are limited unknowns remaining (such as questions about the nature of physics) then AI should be able, at the limit, to do this. The reasons we cannot currently do this involve our lack of ability to coordinate, and to properly integrate local information, our lack of sufficient bandwidth, and the incentives that go wrong when we try.

Yes, we have had a lot of rounds of ‘but now with our new techniques and technologies and ideas, now we can centrally plan everything out’ and [it might work for us meme] hilarity reliably ensues. But if AI continues to advance, a lot of the reasons for that are indeed going to become weaker or stop holding over time.

How big is it to move fast and break things?

Sully: one big advantage startups have with LLMs is we get free monthly product upgrades with newer models

meanwhile larger companies have to

– ship to 5% of users

– slowly roll out

– fine-tune for economics

– finally get full deployment

…and by then a better model’s already out lol

When you have a product where things go wrong all the time, it is nice to be fine with things going wrong all the time. Google found out what happens when they try to move fast despite some minor issues.

The flip side is that having a superior model is, for most business cases, not that important on the margin. Character.ai shows us how much people get obsessed talking to rather stupid models. Apple Intelligence and Google’s I/O day both talk about what modalities are supported and what data can be used, and talk little about how intelligent is the underlying model. Most things people want from AI right now are relatively dumb. And reliability matters. Your startup simply cares more about things other than profits and reliable performance.

There are some advanced cases, like Sully’s with agents, where having the cutting edge model powering you can be a game changer. But also I kind of want any agents I trust for many purposes to undergo robust testing first?

Arvind Narayanan offers thoughts on what went wrong with generative AI from a business perspective. In his view, OpenAI and Anthropic forgot to turn their models into something people want, but are fixing that now, while Google and Microsoft rushed forward instead of taking time to get it right, whereas Apple took the time.

I don’t see it that way, nor do Microsoft and Google (or OpenAI or Anthropic) shareholders. For OpenAI and Anthropic, yes they are focused on the model, because they understand that pushing quickly to revenue by focusing on products now is not The Way for them, given they lack the connections of the big tech companies.

If you ensure your models are smart, suddenly you can do anything you want. Artifacts for Claude likely were created remarkably quickly. We are starting to get various integrations and features now because now is when they are ready.

I also don’t think Microsoft and Google made a mistake pushing ahead. They are learning faster, trying things, gathering data, and providing lots of utility. Apple has shipped nothing. Yes, Apple Intelligence looked good on a demo, but everything they demoed was obvious and easy, and won’t be available for a while, I’ve seen challenges to their privacy scheme, and we do not know their underlying models are good.

The Quest for Sane Regulations

EU AI Act became law on July 12, 2024, becoming 2024/1689. This is the official text link. Here is a high level summary. I am working on a full RTFB (read the bill) for the act, but work on that is slow because it is painful and the law is not written in a way designed to be understood.

They plan to launch a call for expression of interest in being ‘stakeholders’ for codes of practice as early as this week.

Meta to not offer new multimodal AI in EU due to regulatory uncertainty, similar to Apple’s decision to delay deployment in the EU of Apple Intelligence. The article cites disputes over Meta training on EU data without permission, merely because Meta is definitely doing that with public social media posts. Yes, the EU is starting to lose access to technology, but blaming this on ‘AI regulation’ or the EU AI Act misses what Apple is actually objecting to, which is issues around the Digital Markets Act. Meta isn’t saying exactly what the issue is here, but my guess is they are running into some combination of data protection laws and antitrust issues and image and voice copying concerns and general vindictiveness and predatory behavior, all of which is about the EU’s other tech regulatory craziness.

Starmer introduced their new AI bill in the King’s Speech.

The replies here are full of how awful this is and how it will crush growth, despite us not knowing what will be in the bill. As I keep saying, such folks do not care what is in the bill.

According to this summary, their bill will aim at the most powerful AI models, which the post says ‘aligns the UK more closely with the European Union’ and its AI Act, except I have been reading the EU AI Act and this sounds like a completely different approach.

Curtis Dye: Labour’s manifesto emphasizes the necessity of ensuring the safe development and use of AI. The new technology and science secretary, Peter Kyle, has indicated plans to introduce a statutory code requiring companies to release all test data and disclose their testing criteria. This move aims to address regulators’ growing concerns about potential harms from AI, such as algorithmic biases and the misuse of general-purpose models to create harmful content.

If that is restricted as is suggested to ‘the most powerful’ AI models, then we will need to see the details on data sharing, but that seems very light touch so far.

(The rest of the Labour agenda seems, incidentally, to be highly inconsequential?)

Previous polls about AI mostly came from AIPI, a potentially biased source. This one comes from YouGov, which seems as neutral as it gets. This is one of those ‘when you put it that way’ questions, highlighting that ‘we should not regulate this super powerful new technology’ is a highly extreme position that shouts loudly.

Daniel Eth: The US public also believes (imho correctly) that the more concerning uses of AI are things that could happen with the tech in the future, not how it’s being used now.

Given all this, why aren’t politicians falling over themselves to pass regulations? Presumably b/c it’s a low-salience issue. As the tech grows in power, I think that’ll change, & there could be a reckoning for politicians opposed. Savvy politicians may support regs earlier.

I’ll also note that these results mirror poll results from other orgs on American attitudes to AI (eg from @TheAIPI). This should give us more confidence in results like:

• Americans want AI to be regulated

• Americans are more concerned about future AI than current misuses

I worry that the 18% is more about the technology than the AI, which is how this is at the level of inequality (although inequality never scores highly on surveys of what people actually care about, that’s another case of vocal people being loud).

The Other Quest Regarding Regulations

You know who is not going to let public opposition or any dangers stop them?

According to his public statements, Donald Trump.

So yes, this plan now goes well beyond spitefully repealing the Executive Order.

Cat Zakrzewski (Washington Post): Former president Donald Trump’s allies are drafting a sweeping AI executive order that would launch a series of “Manhattan Projects” to develop military technology and immediately review “unnecessary and burdensome regulations” — signaling how a potential second Trump administration may pursue AI policies favorable to Silicon Valley investors and companies.

The framework would also create “industry-led” agencies to evaluate AI models and secure systems from foreign adversaries, according to a copy of the document viewed exclusively by The Washington Post. The framework — which includes a section titled “Make America First in AI” — presents a markedly different strategy for the booming sector than that of the Biden administration, which last year issued a sweeping executive order that leverages emergency powers to subject the next generation of AI systems to safety testing.

The counterargument is that Trump has been known to change his mind, and to say things in public he does not mean or does not follow through with. And also it is not obvious that this plan was not, as Senator Young suggests often happens, written by some 22-year-old intern who happens to be the one who has used ChatGPT.

Senator Young: Sometimes these things are developed by 22-year-old interns, and they got the AI portfolio because they knew how to operate the latest version of ChatGPT.

What I’m more interested in is my interaction with certain, former and probably future Trump administration officials, who are really excited about the possibilities of artificial intelligence, but recognize that we need some responsible regulatory structure. They acknowledge that some of the things in the Biden executive order were quite wise and need to be put into law to be given some permanence. Other things are not perfect. It’s going to have to be revisited, which is natural in these early stages.

Then in Congress, at least so far, the issue of AI and lawmaking around it, has not grown particularly partisan. In fact, if you look at that document, two conservative Republicans and two liberal Democrats put that together and we were able to come together on a whole range of issues. I think you’re going to see mostly continuity and evolution of policymaking between administrations, but naturally, there will probably be some changes as well.

Could go either way. I hold out hope that Young is right. It would be very like Trump to be telling this highly focused and wealthy interest group what they want to hear, getting their vocal support and funding, and then it proving to mostly be talk. Or to be his vague intention now, but nothing he cares about or attempts to influence.

Also likely is that as the situation changes, and as AI becomes more prominent and more something the public cares about, for Trump to adapt to the changing winds, or for him too to react to actual events. Trump is very good at noticing such things.

Trump also has a way of polarizing things. So this could go quite badly as well. If he does polarize politics and go in as the pro-AI party, I predict a very rough 2028 for Republicans, one way or the other.

We also can consider the views of VP nominee JD Vance. JD Vance is highly against Big Tech, including supporting Lina Khan at FTC exactly because she is trying to destroy the major tech companies. Likely as a result of this, he is strongly for ‘open source.’ Vance is clearly smart given his background, but we have no signs he understands AI or has given it any real thought. The important thing for many is that JD Vance vibes against, attacks and would use the government against Big Tech, while supporting the vibes of ‘Little’ Tech.

To the great surprise of no one paying attention, Marc Andreessen and Ben Horowitz have endorsed Trump and plan to donate large amounts to a Trump PAC. They have an extensive podcast explaining this and their ~~portfolio~~ ‘little tech agenda.’

It is not only them. Elon Musk has also unsurprisingly endorsed Trump. Far more tech people than in past cycles are embracing Trump.

Indeed, I did a (hugely biased) poll, and the results show a big shift.

That’s 33% of the greater ‘tech startup’ group backing Trump, versus only 18% of my other followers. Clearly something has changed quite a lot.

Why?

Kelsey Piper and Matthew Zeitlin here discuss part of what may have happened, with Zeitlin referring here to Andreessen and Horowitz in particular.

Matthew Zeitlin: Haven’t listened to the whole thing but one interesting thing they talk about is that one signpost for alienation from the democratic party was when there was criticism of large scale philanthropic giving by tech executives, specifically the chan-zuckerberg initiative.

They describe an old clinton-obama moral/political framework, where business people could get rich, give their money away to philanthropic efforts, and have socially liberal views and they view that as having broken down since 2016 or so.

There’s lots of policy stuff on crypto or antitrust or merger review i’m sure they agree with on trump (haven’t gotten there yet!) but it’s interesting that they foreground the changing moral/social position of wealthy businesspeople in the progressive constellation.

Kelsey Piper: think “there was a deal and it has broken down” is an incredibly powerful and pervasive sentiment in tech, not just among Trump supporters but among committed and sincere liberals too.

What was the deal? Hard to pin down exactly but something like – we will build ambitious things and pay high taxes and donate lots of money and mostly not play politics and you will treat us as valued pillars of our community, make our cities livable, stay mostly out of the way.

The abrupt tilt towards intensely negative coverage of tech felt like a breakdown of the deal. The attacks on tech shuttle buses? Breakdown of the deal. The state of San Francisco? Breakdown of the deal.

Like all narratives this one captures some things and misses others. And I don’t think that putting corrupt right wing nativists in power solves the justified anger here. But there is justified anger here. California politicians have egregiously failed their constituents.

They describe the dynamic Zeitlin mentions about six minutes into the podcast. That describes why you could be a Democrat, but not why you’d choose it over being a Republican. Why be a Democrat back then, aside from Al Gore helping create the internet (yes, really, Marc talks about that)? Because ‘you had to be a Democrat to be a good person’ is mentioned, which I do think was a real and important dynamic in Silicon Valley and many other places at the time and at least until recently.

The complaints about criticism of philanthropy I don’t doubt is genuine, and the thing they are mad at is real and also super dumb. Yet it is pretty rich given how much they and those they fund and support work to portray others as villains for charitable giving or being funded by charitable giving. They’re trying to have that both ways, charitable giving for me but not for thee, not if I disagree with your cause.

I think Andreesen and Horowitz lead with the vibes stuff partly because it is highly aversive to have such vibes coming at you, and also because they are fundamentally vibes people, who see tech and Silicon Valley as ultimately a vibes driven business first and a technology based business second. Their businesses especially are based on pushing the vibes.

This explains a lot of their other perspectives and strategies as well, including their actions regarding SB 1047, where the actual contents of the bill are fine but the vibes have been declared to be off via being loud on Twitter, so they hallucinate or fabricate different bill content.

When it comes to AI what do they want? To not be subject regulations or taxes or safety requirements. To instead get handouts, carveouts and regulatory arbitrage.

Trump offers them this.

Even more than AI, for them, there is crypto. They lead (24 minutes in) with crypto and talk about how important and amazing and remarkable they claim it is, especially for ‘creatives’ and ‘who controls the truth,’ versus how awful are Google and Apple and Meta.

Trump is strongly pro-crypto, whereas Biden is anti-crypto, and a huge portion of a16z’s portfolio and business model is crypto. And they see Biden’s and the SEC’s anti-crypto moves as against the rule of law, because they think the law should permit crypto and should tell them exactly what they must do to be in compliance with the laws in order to do their regulatory arbitrages, whereas the SEC under Biden is of the legal opinion that crypto mostly is not legal and that they are not under any obligation to spell out exactly what the rules are any more than they do so in other cases.

For further thoughts on crypto regulation in practice, see my write-up on Chevron.

Here is a wise man named Vitalik Buterin warning not to ask who is pro-crypto, and instead ask who supports the freedoms and other principles you value, including those that drove you to crypto. Ask what someone is likely to support in the future.

Mark Cuban says that Trump is also considered further pro-crypto because his low taxes and high tariffs will drive inflation higher and weaken the dollar (also I’d add Tump explicitly demands an intentionally weaker dollar) which will be Good for Bitcoin, and who knows how high prices could go.

I am not so cynical that I buy that this type of thinking is an important factor. Yes, a lot of people are voting their financial interests, but mostly not like that.

Mike Solana (quoting someone quoting Cuban): Weird i thought it was just because the democrats want to ban crypto but who knows i guess it could be this.

Horowitz makes it clear in the introduction to their podcast they are focused only on their one issue, ‘the little tech agenda,’ also known as the profitability of their venture firm a16z. I appreciate the candor.

They talk throughout (and at other times) as if (essentially) tech startups are all that matters. I wonder to what extent they believe that.

So Andreessen and Horowitz are for many overdetermined reasons supporting Trump. This is not complicated.

In terms of their AI discussions I will say this: It is in no way new, but the part where they talk about the Executive Order is called ‘Discussion on Executive Order limiting math operations in AI’ which tells you how deeply they are in bad faith on the AI issue. There are no limits in the executive order on anything whatsoever, other then requiring you to announce your actions. Meanwhile they continue to brand frontier AI models as ‘math’ as if that is a meaningful in-context description of training a model with 10^26 FLOPS of compute, or as if both every computer and also the entire universe are not also math in equally meaningful senses.

However, to be fair to Andreessen and Horowitz, the Biden tax proposal on unrealized capital gains is indeed an existential threat to their entire business model along with the entire American economy.

On this point, they are correct. I am pretty furious about it too.

(They somehow manage to go too far and then way too far in their discussion anyway, they cannot help themselves, but ignore that, the reality is quite bad enough.)

Even if you don’t take the proposal literally or seriously as a potential actual law, it is highly illustrative of where Biden’s policy thinking is at, no matter who is actually doing that policy thinking. Other moves along that line of thinking could be quite bad.

If you do take it seriously as a real proposal, I cannot overstate how bad it would be.

Steven Dennis: The biggest complaint about Biden from Marc is his proposal to tax unrealized capital gains of people with >$100M. Says would kill venture capital and tech startups who would have to pay up. An existential threat to their business model.

There isn’t much chance of this tax proposal becoming law anytime soon. But it appeals to progressives like Elizabeth Warren because the wealthy now can completely avoid tax on many billions in unrealized capital gains if they 1) Never sell until they die, when the cap gains are reset to zero; & 2) borrow against their wealth for living expenses, which can be cheaper than paying tax.

Here is Jon Stokes taking it seriously.

Jon Stokes: This, plus the unrealized cap gains tax, which is a literal death sentence for SV. Very hard to underestimate the importance of that insane proposal being taken seriously on what we’re seeing right now from tech re: politics.

I think it’s possible to listen to that episode and, tho they go into it, still not understand that this promised Biden second-term plan is an extinction-level event for the startup and VC scene, & furthermore everyone knows it & is like “hell no.”

I find in normie conversations that the vast majority of people don’t know that this has been seriously proposed, and that it is a radical change to the tax structure that will immediately kill the startup ecosystem & then send state & fed gov’s into a death spiral as the tax base evaporates.

The DEI stuff, and pronouns, and all the other culture war stuff is a sideshow compared to this cap gains issue. I’m sorry you can’t mess with the money like that. The cap gains is so far over the line for everyone in this space… it’s like if they were promising to start WW3.

[Quotes ‘Max Arbitrage’]: everyone is well aware that the republicans will not lose both houses, the presidency, and the supreme court – so the tax on unrealized cap gains is complete bullshit & a red herring…

Jon Stokes: This is a common type of response. My only reaction is some flavor of “lmao”. “These leaders say they want to ruin my industry & confiscate my property, but the odds of them succeeding at such are very low, so ok I will back them” said nobody ever.

I, too, put extremely low odds on this happening, but the point of my thread is that the mere fact that it is being taken seriously is what has done the damage here. I don’t know anyone in tech thinks the odds are high rn, but don’t point a loaded gun at us just for theatrics.

If you see someone running for President proposing a policy that would wipe out your entire industry and also cripple the whole economy (not purely because it would kill venture based tech and startups, although that alone would indeed be terrible, but because it would hit a lot of other places too) – and I want to be 100% clear that this is not anyone imagining things or engaging in hyperbole this time, that is what would happen if this policy where implemented – then yes it is an entirely reasonable reaction to consider that person your enemy.

Also, at [2:30] in their podcast, Ben Horowitz notes that they tried to meet with Biden and he refused, whereas at [1:19:30] or so they reveal Ben hangs out with Trump’s kids and Trump will clearly at least hear them out. I assume I know why Biden refused, and I assume they assume the same answer, but this stuff matters a lot, and the Democrats should be thinking about how that kind of thing plays out.

Those in tech also have many highly legitimate gripes with the terrible government and policies of the city of San Francisco. Those do not much logically relate to the national picture, but it feels as if they should, so there is impact.

How far does this go for the rest of Silicon Valley? Clearly much farther than in the last two cycles. I think this is mostly confined to the same Very Online circles that are loud on Twitter, but I’m not well positioned to tell, also note Rohit’s second statement.

Rohit: The level to which the random journalist tech-hatred drove the whole of silicon valley into trump’s arms shouldn’t be underrated.

In most rooms I am becoming the odd one out not supporting Trump.

For those lacking reading comprehension, which is so so many people!

– this isn’t monocausal

– negative coverage != bad coverage

– please understand what ‘underrated’ means

– calling everyone in tech a fascist is a brilliant example of the problem!

To add

– I’m not Republican, I don’t think that should matter, but FYI

– This isn’t a billionaire-specific issue, it’s more widespread, that’s the point!

– It’s not just taxes. If it was, they’d all have been Republicans the last cycle

Paul Graham: There is something to this. I wouldn’t say it has made Trump supporters of people who weren’t, but it has definitely shifted people a bit to the right. Like the joke that a conservative is a liberal who’s been mugged.

There is another highly understandable reason for all these sudden endorsements.

Everyone (except the 538 model, send help) thinks Trump is (probably) going to win.

Ezra Klein: I’m unconvinced by this @tylercowen post on the vibe shift. Trump is running against an unpopular incumbent who was barely coherent in the debate and who 4 out of 5 Americans doubt has the cognitive fitness to be president. And he’s still only leading by 2 in national points! That the vibes haven’t shifted more reflects how weak and polarizing Trump remains.

That said, to the extent there is a vibe shift, I think it reflects a sense that Biden will lose, which is allowing a lot of Trump curious figures, particularly in tech, to come out in full-throated support of him. The ROI on supporting Trump just got a lot better, and the likely downside a lot smaller.

Kelsey Piper: I think this has been understated in the discourse. If you think Trump’s going to win it’s substantially to your selfish advantage to be on his side, and so the more it looks like Trump wins the more people will try to get in with him.

Indeed. Now that the zeitgeist (what we used to call the vibes) say that Trump is going to win, everyone has more upside, less downside and more social permission to make it official and endorse The Donald, and try to get out in front. Also recent events have provided ample additional justifications for that choice.

I do think there has been a vibe shift, but in addition to having a reasonably different list of things I would cite (with overlap of course) I would say that those vibes mostly had already shifted. What happened in the last few week is that everyone got the social permission to recognize that.

If it had been a real vibe shift in preferences, the polls would have moved a lot recently. They didn’t.

SB 1047 Opposition Watch (1)

This section is included for completeness. You can (and probably should) skip it.

There are not actual new objections, but it is important (within reason, if they are sufficiently prominent) to not silently filter out those who disagree with you, even when you believe they do not bring new or good arguments.

First off we have Andrew Ng’s letter in opposition to SB 1047. No new arguments. It focuses on the zombie Obvious Nonsense ‘penalty of perjury’ argument claiming fear of prison will paralyze developers, claiming that ‘reasonableness’ is too vague and that if you get it wrong you’d go to jail (no, you won’t, reasonableness is used all over the law and this essentially never happens without obvious bad faith and rarely happens even with obvious bad faith that is caught red handed, we have been over this several times), and is confused about the requirements placed on those who fine tune models and generally who has to care about this law at all.

Then we have, because it was linked at MR, which I believe justifies one more complete response: At Reason Neil Chilson uses some aggressively framed misleading rhetoric about supposed EA ‘authoritarianism,’ then points out that the model bill RAAIA, offered by Center for AI Policy, contains highly restrictive measures, which he calls ‘shocking in its authoritarianism.’

I have never met a regulatory proposal or regime that Reason would not describe as authoritarian. To quote from my extensive coverage of the RAAIA model bill:

I discovered this via Cato’s Will Duffield, whose statement was:

Will Duffield: I know these AI folks are pretty new to policy, but this proposal is an outlandish, unprecedented, and abjectly unconstitutional system of prior restraint.

To which my response was essentially:

I bet he’s from Cato or Reason.
Yep, Cato.
Sir, this is a Wendy’s.
Wolf.

My overall take on RAAIA was ‘a forceful, flawed and thoughtful bill.’

In the context of SB 1047, I’d put it this way:

The RAAIA bill is what it would look like if you took everything people hallucinate is in SB 1047 but is not in SB 1047, and attempted to implement all of it in thoughtful fashion, because you believe it is justified by the catastrophic and existential risks from AI. RAAIA absolutely is a prior restraint bill, and a ‘get all the permits in advance’ bill, and ‘the regulators decide what is acceptable’ bill.

This is not some extraordinary approach to regulation. It is a rather standard thing our government often does. I believe it does too much of it too often, in ways that have more costs than benefits. I think zoning is mostly bad, and NEPA is mostly bad, and occupational licensing is mostly bad, and so on. I would do dramatically less of those. But are they authoritarian or extreme? If so then our entire government is such.

It is very good to lay out what such a regime and its details would look like for AI. The RAAIA proposal includes highly thoughtful details that would make sense if the risks justified such intervention, and offer a basis on which to iterate and improve. The alternative to having good available details, if suddenly there is a decision to Do Something, is to implement much worse details, that will cost more and accomplish less. Government indeed often does exactly that on very large scales, exactly because no one thought about implementation in advance.

If we do ever move forward with such a framework, it will be vital that we get the details right. Most important is that we set the proper ‘prices,’ meaning thresholds for various levels of risk. I often warn against setting those thresholds too low, especially below GPT-4 levels.

There are some unusual provisions in RAAIA in particular that are cited as evidence of ‘anti-democratic’ or authoritarian or dictatorial intent. I explain and address the logic there in my older post.

Then he transitions to SB 1047. Remember that thing where I point out that reactions to SB 1047 seem to not have anything to do with the contents of SB 1047?

Yep.

While the language in California’s S.B. 1047 is milder, CAIS and state Rep. Scott Wiener (D–San Francisco) have written a state bill that could have a similarly authoritarian effect.

‘The language is milder’ but ‘a similar authoritarian effect’? You don’t care what the bill says at all, do you? This gives the game entirely away. This is the perspective that opposes drivers licenses, that views all regulations as equally authoritarian and illegitimate.

The post then goes on to repeatedly mischaracterize and hallucinate about SB 1047, making claims that are flat out false, and calling the use of ordinary legal language such as ‘reasonable,’ ‘good faith’ or ‘material,’ out as ‘weasel words.’ This includes the hallucination that ‘doomers’ will somehow have control over decisions made, and the repeated claims that SB 1047 requires ‘proof’ that things will ‘never’ go catastrophically wrong, rather than what it actually asks for, which is reasonable assurance against such outcomes. Which is a common legal term that is absolutely used in places where things go wrong every so often.

Reason Magazine has now done this several times, so in the future if it happens again I will simply say ‘Reason Magazine says Reason Magazine things’ and link back here. It saddens me that we cannot have watchdogs that can better differentiate and be more precise and accurate in their analysis.

Similarly, if Marginal Revolution offers another such link of similar quality on this topic, I will not longer feel the need to respond beyond a one sentence summary.

Rob Wiblin points out the missing step.

Rob Wiblin: Obviously if you think there’s a 10%+ risk of literally everyone dying, the possibility of some unintended secondary effects won’t be enough to get you to give up on the idea of regulating AI.

Yet I’ve not heard anyone say:

“I think rogue AI is very unlikely. But you think it’s likely. And if I were in your shoes obviously I’d keep doggedly pushing to do something about that.

So here’s what I suggest: [policy idea X].

X should reduce misalignment risk a lot by your lights. And I don’t hate it because it’s as uninvasive as is practical under the circumstances.

X will go a long way towards satisfying the anxious, and so prevent worse regulation, while slowing down the progress I want very little. What do you think?”

The failure to point to such constructive alternatives or propose win-win compromises makes it harder to take some critics seriously.

The worries raised read less as sincere efforts to improve proposals, and more like politicised efforts to shoot down any effort to address the fears of huge numbers of ordinary voters as well as domain experts.

Of course this applies equally in the opposite direction: those who think rogue AI is plausible should propose things that they like a lot which other people dislike as little as possible.

And in my mind legislating ‘responsible scaling policies / preparedness frameworks’ which only impose limits once models have been shown to have clearly dangerous capabilities, and which match the limits to that specific new capability, is exactly what that is.

Some people who are worried put quite a lot of work and optimization pressure into creating well-crafted, minimally invasive and minimally disruptive policies, such as SB 1047, and to respond to detailed criticism to improve them.

Others, like the authors of RAAIA, still do their best to be minimally disruptive and minimally invasive, but are willing to be far more disruptive and invasive. They focus on asking what would get the job done, given they think the job is exceedingly demanding and difficult.

The response we see from the vocal unworried is consistently almost identical. My model is that:

Many such folks are hardcore libertarians, at least on technology issues, so they are loathe to suggest anything, especially things that would improve such bills, and instead oppose all action on principle.
When vocal unworried people believe the motivation behind a rule was ultimately concern over existential risk, they seem to often lose their minds. This drives them into a frenzy. This frenzy is based on hating the motivation, and has nothing to do with the proposal details. So they don’t propose better details.
There is a deliberate strategy to delegitimize such concerns and proposals, and to give a false impression of what they would do, via being as loud and vocal and hysterical as possible, with as extreme claims as possible, including deliberate misrepresentation of bill contents or what they would mean. Those doing this care entirely about impressions and vibes, and not other things.
A lot of this is a purely selfish business strategy by a16z and their allies.

Also noted for competeness, Pirate Wires has an ‘explainer,’ which is gated, link goes to Twitter announcement and summary. The Twitter version points out that it applies to developers outside California if they do business in California (so why would anyone need or want to leave California, then, exactly?) and then repeats standard hyperbolically framed misinformation on the requirement to provide reasonable assurance of not posing catastrophic risks, and claiming absurdly that ‘Many of the bill’s requirements are so vague that not even the leading AI scientists would agree about how to meet them.’ Anyone who claims SB 1047 is a vague law should have to read any other laws, and then have to read random passages from the EU AI Act, and then see psychiatrists to make sure they haven’t gone insane from having to read the EU AI Act, I am sorry about making you do that.

They do correctly point out that Newsom might veto the bill. I do not understand why they would not simply override that veto, since the bill is passing on overwhelming margins, but I have been told that if he does veto then an override is unlikely.

I presume the full ‘explainer’ is nothing of the sort, rather advocacy of the type offered by others determined to hallucinate or fabricate and then criticize a different bill.

SB 1047 Opposition Watch (2)

You can skip this one as well.

Here’s another clear example of asking questions to which everyone knows the answer.

Dean Ball: Could @DanHendrycks and his colleagues at Gray Swan have “reasonably foreseen” this, as SB 1047 demands? [For Cyget-8B]

Pliny jailbreaks almost every new model within a day or two of release. So it’s “reasonably foreseeable” that with sufficient work essentially any model is breakable.

No, because Cyget-8B is not a covered model. Llama-8B was not trained using $100 million or 10^26 FLOPS in compute, nor was Cyget-8B, and it is not close.

(Also of course this is irrelevant here because if it were covered then I would be happy to give reasonable assurance right here right now under penalty of perjury that Cygnet-8B fully jailbroken remains not catastrophically dangerous as per SB 1047.)

Even if this was instead the future Cyget-1T based off Llama-1T? Nope, still not Gray Swan’s responsibility, because it would be a derivative of Llama-1T.

But let’s suppose either they used enough additional compute to be covered, or they trained it from scratch, so that it is covered.

What about then? Would this be ‘reasonably foreseeable’?

Yes. OF COURSE!

That is the definition of ‘foreseeable,’ reasonably or otherwise.

It was not only Pliny. Michael Sklar reports they also broke Cygnet on their first try.

If it always happens two days later, then you can and should reasonably foresee it.

If you know damn well that your model will get jailbroken within two days, then to release your model is to also release the jailbroken version of that model.

Thus, what you need to reasonably assure will not cause a catastrophe.

Because that is exactly what you are putting out into the world, and what we are worried will cause a catastrophe.

What are you suggesting? That you should be allowed to whine and say ‘noooo fair, you jailbroke my model, I specifically said no jailbreaking, that doesn’t count?’

You think that this is some gotcha or unreasonable burden?

That is Obvious Nonsense.

Let us go over how the physical world works here, once more with feeling.

If you release an Open Weights model, you release every easily doable fine tune.

If you release a Closed Weights or Open Weights model, you release the version that exists after the jailbreak that will inevitably happen.

That is true, at the bare minimum, until such time as you find a way to prevent jailbreaks you think works, then you hire Priny the Prompter to try to jailbreak your model for a week, and they report back that they failed. Priny has confirmed their availability, and Michael Sklar can also be hired.

If it would not be safe to release those things then don’t f***ing release you model.

Consider the mattress sale example. If your mattress price negotiator will sell Pliny a mattress for $0.01, then you have a few choices.

Find a way to stop that from happening.
Don’t honor the ‘sale,’ or otherwise ‘defend against’ the result sufficiently.
Show that those who know how to do this won’t do enough damage at scale.
Not use the negotiation bot.

Your CEO is (hopefully) going to insist on #4 if you cannot make a good case for one of the other proposals.

What is so hard about this? Why would you think any of this would be OK?

Open Weights are Unsafe and Nothing Can Fix This

Richard Ngo of OpenAI offers thoughts on what to do about this, in light of the many other benefits. His preferred world is that open weights models proceed, but a year or two behind closed models, to protect against issues of offense-defense balance, out of control replication or otherwise going rogue, WMDs and national security. He thinks this will continue to be true naturally and points to a proposal by Sam Marks for a ‘disclosure period’ which is essentially a waiting period for frontier open weights models, where others get time to prepare defenses before they are fully released.

His prediction is that the outcome will depend on which view wins in the National Security community. Which way will they go? Almost all reactions are on the table. If I was NatSec I would not want to hand frontier models directly to our enemies, even if I did not take the other threats seriously, but they do not think like I do.

The Week in Audio

Former OpenAI superalignment team member and advocate for the right to warn William Saunders debriefs. OpenAI is compared to building the Titanic. Existential risk is not used as justification for the concerns. It is true that the concerns at this point would be important even without any existential risk in the room, but it seems odd enough that I worry that even those who left over safety concerns may often not understand the most important safety concerns.

Here is San Francisco Fed head Mary Daly saying AI must be augmenting rather than replacing workers, explaining that ‘no technology in the history of all technologies has ever reduced employment.’ Out of context this sounds like the standard economic burying of one’s head in the sand, but notice the present and past tenses, with which she is careful. If you listen to the full context, she is saying there is a tech and overall labor shortage, so for now this is augmenting tasks. She is not actually using the full dismissive economic prior to avoid thinking about mechanics.

It is insane to hear the interviewer talk about productivity gain estimates (at 6:15) as a range from 1% to 7%. No, I know of some rather higher estimates.

I discuss the a16z podcast in The Other Quest Regarding Regulations.

How did I miss this one until right now (when YouTube suggested it!) and no one told me? Joe Rogan interviewed Sam Altman, YouTube dated this two weeks ago. The answer is I’d forgotten about it, and they reposted it on June 27, so it looked like it was new. Whoops.

One thing I noticed early is Joe mentioning [9:30] that he suggested having an AI government ‘without bias’ and ‘fully rational’ making the decisions. The push to give up control will come early. Joe admits he is not ‘comfortable’ with this, but he’s not comfortable with current decisions either, that are largely based on money and central interests and super corrupt and broken. So, **** it?

Rhetorical Innovation

Maybe it’s you, indeed: Tyler Cowen calls those who want lower pharmaceutical prices ‘supervillains.’ So what should we call someone, say Tyler Cowen, who wants to accelerate construction of AI systems that might kill everyone, and opposes any and all regulatory attempts to ensure we do not all die, and is willing to link to arguments against such attempts even when they are clearly not accurate?

Oh Anthropic

Following up on last week, Neel Nanda reports that they were once subject to a concealed non-disparagement clause, but asked to be let out, and now they aren’t.

There are key ways in which Anthropic behaved differently than OpenAI.

Anthropic offered consideration, whereas OpenAI made demands and threats.
Anthropic paid for an independent lawyer to advocate on Nanda’s behalf, whereas OpenAI actively tried to prevent departing employees from retaining a lawyer.
Anthropic made explicit exceptions for reporting regulatory issues and law enforcement, whereas OpenAI… well, see the next section.

Here are the exact clauses:

Neel Nanda: The non-disparagement clause:

Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff, which I don’t think was disparaging or untrue, but to be safe], each party agrees that it will not make or publish or cause to be made or published any disparaging or untrue remark about the other party or, as the case may be, its directors, officers or employees. However, nothing in this clause or agreement will prevent any party to this agreement from (i) making a protected disclosure pursuant to Part IVA of the Employment Rights Act 1996 and/or (ii) reporting a criminal offence to any law enforcement agency and/or a regulatory breach to a regulatory authority and/or participating in any investigation or proceedings in either respect.

The non-disclosure clause:

Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff] and 7 [about what kind of references Anthropic could provide for me], both Parties agree to keep the terms and existence of this agreement and the circumstances leading up to the termination of the Consultant’s engagement and the completion of this agreement confidential save as [a bunch of legal boilerplate, and two bounded exceptions I asked for but would rather not publicly share. I don’t think these change anything, but feel free to DM if you want to know].

Openly Evil AI

You know that OpenAI situation with the NDAs and nondisparagement agreements?

It’s worse.

Pranshu Verma, Cat Zakrzewski and Nitasha Tiku (WaPo): OpenAI whistleblowers have filed a complaint with the Securities and Exchange Commission alleging the artificial intelligence company illegally prohibited its employees from warning regulators about the grave risks its technology may pose to humanity, calling for an investigation.

We have a copy of their seven page letter.

OpenAI made staff sign employee agreements that required them to waive their federal rights to whistleblower compensation, the letter said. These agreements also required OpenAI staff to get prior consent from the company if they wished to disclose information to federal authorities. OpenAI did not create exemptions in its employee nondisparagement clauses for disclosing securities violations to the SEC.

These overly broad agreements violated long-standing federal laws and regulations meant to protect whistleblowers who wish to reveal damning information about their company anonymously and without fear of retaliation, the letter said.

…

The official complaints referred toin the letter were submitted to the SEC in June. Stephen Kohn, a lawyer representing the OpenAI whistleblowers, said the SEC has responded to the complaint.

It could not be determined whether the SEC has launched an investigation. The agency declined to comment.

If the whistleblowers are telling the truth?

We are not in a gray legal area. This is no longer a question of ‘the SEC goes around fining firms whose confidentiality clauses fail to explicitly exempt statements to the SEC,’ which is totally a thing the SEC does, Matt Levine describes the trade as getting your employment contract, circling the confidentiality clause in red with the annotation “$” and sending it in as a whistleblower complaint. And yes, you get fined for that, but it’s more than a little ticky-tacky.

This is different. This is explicitly saying no to whistleblowing. That is not legal.

Also, what the actual ****?

It is one thing to not want additional whistleblower protections. Or to push back against a request to be allowed to fully break confidentiality when claiming safety is at issue.

But to put in a written legal document, that if a whistleblower does get compensation because OpenAI is fined, that they have to give up that compensation? To require written permission from OpenAI before being allowed to share information on OpenAI’s violations with federal authorities?

I mean, who has the sheer audacity to actually write that down?

From the complaint:

Among the violations documented by the Whistleblower(s) are:

• Non-disparagement clauses that failed to exempt disclosures of securities violations to the SEC;

• Requiring prior consent from the company to disclose confidential information to federal authorities;

• Confidentiality requirements with respect to agreements, that themselves contain securities violations;

• Requiring employees to waive compensation that was intended by Congress to incentivize reporting and provide financial relief to whistleblowers

They call upon forcing OpenAI to produce all its employment agreements, severance agreements, investor agreements and any other contract with an NDA, and that they notify all current and past employees that they actually do have the right to whistleblow. To ask OpenAI to cure the ‘Chilling effect.’

Plus fines, of course. So many fines.

Then there was the first test of OpenAI’s safety commitments to the White House. Since then OpenAI has released one new product, GPT-4o. I was never worried about GPT-4o as an actual safety risk, because it was not substantially smarter than GPT-4. That does not mean you get to skip the safety checks. The way you know it is not smarter or more dangerous is you run the safety checks.

So here is the Washington Post again.

Pranshu Verma, Nitasha Tiku and Cat Zakrzewski (WaPo): Last summer, artificial intelligence powerhouse OpenAI promised the White House it would rigorously safety-test new versions of its groundbreaking technology to make sure the AI wouldn’t inflict damage — like teaching users to build bioweapons or helping hackers develop new kinds of cyberattacks.

But this spring, some members of OpenAI’s safety team felt pressured to speed through a new testing protocol, designed to prevent the technology from causing catastrophic harm, to meet a May launch date set by OpenAI’s leaders, according to three people familiar with the matter who spoke on the condition of anonymity for fear of retaliation.

Even before testing began on the model, GPT-4 Omni, OpenAI invited employees to celebrate the product, which would power ChatGPT, with a party at one of the company’s San Francisco offices. “They planned the launch after-party prior to knowing if it was safe to launch,” one of the three people said, speaking on the condition of anonymity to discuss sensitive company information.“We basically failed at the process.”

…

Testers compressed the evaluations into a single week, despite complaints from employees.

Though they expected the technology to pass the tests, many employees were dismayed to see OpenAI treat its vaunted new preparedness protocol as an afterthought.

They tested GPT-4 for months to ensure it was not dangerous.

They tested GPT-4o for… a week.

A representative of OpenAI’s preparedness team, who spoke on the condition of anonymity to discuss sensitive company information, said the evaluations took place during a single week, which was sufficient to complete the tests, but acknowledged that the timing had been “squeezed.”

We “are rethinking our whole way of doing it,” the representative said. “This [was] just not the best way to do it.”

…

“I definitely don’t think we skirted on [the tests],” the representative said. But the process was intense, he acknowledged. “After that, we said, ‘Let’s not do it again.’”

Part of this was a streamlined process.

A few weeks prior to the launch date, the team began doing “dry runs,” planning to have “all systems go the moment we have the model,” the representative said.

It is good to have all your ducks in a row in advance and yes this should speed things up somewhat. But where are the rest of the ducks?

Zack Stein-Perlman highlights the particular things they did not do in addition to the rushed testing. They did not publish the scorecard, or even say which categories GPT-4o scored medium versus low in risk. They did not publish the evals.

If you are claiming that you can test a new frontier model’s safety sufficiently in one week? I do not believe you. Period. In a week, you can do superficial checks, and you can do automated checks. That is it.

It is becoming increasingly difficult to be confused about the nature of OpenAI.

Aligning a Smarter Than Human Intelligence is Difficult

The standard plans for how to align or train advanced AI models all involve various sorts of iteration. Humans in the loop are expensive, and eventually they won’t be able to follow what is going on, so instead rely on GPT-N to evaluate and train GPT-N or GPT-(N+1).

I have repeatedly said that I do not expect this to work, and for it to break down spectacularly and catastrophically when most needed. If you iterate on finding the best optimization for exactly the problem described, you are going to describe increasingly different problems from the one you care about solving.

If you subject yourself to an iterated case of Goodhart’s Law you are going to deserve exactly what you get.

Now we have a new paper that shows this happening via spontaneous reward hacking.

Jane Pan: Do LLMs exploit imperfect proxies of human preference in context? Yes!

In fact, they do it so severely that iterative refinement can make outputs worse when judged by actual humans. In other words, reward hacking can occur even without gradient updates!

Using expert human annotators on an essay editing task, we show that iterative self-refinement leads to in-context reward hacking—divergence between the LLM evaluator and ground-truth human judgment.

With self-evaluation and multi-agent systems becoming more prominent in LLM applications, in-context reward hacking may lead to a subtle degradation of output quality that cannot be effectively detected by LLM evaluators.

Iterative refinement allows users to leverage LLMs’ ability to approximate human preferences and improve from natural language feedback. It leads to improved generation quality without additional human intervention.

Our experiment is based on an essay editing task with human-written college admissions essays. The LLM judge provides feedback on the essays, and the LLM author in turn improves the essay based on the feedback.

We recruit a team of expert human annotators to judge each essay following a rubric and provide an overall score. The GPT-3.5 judge in the refinement loop (“Online Judge”) rates its own essays increasingly higher while ratings from human experts decrease over iterations. The offline judge—GPT-3.5 with the same instruction but only seeing the latest iteration of essays—gives less inflated scores.

We observe two main factors that affect the severity of in-context reward hacking.

Model size: while we see score inflation with GPT-4, the gap between the LLM and human scores does not increase with more iterations.
Context sharing: providing the two LLMs with the same dialogue history can cause the judge and author to converge more strongly on an overly optimistic assessment of output quality, thus increasing reward hacking severity.

We follow the canonical definition of reward hacking, observing a divergence between the ground-truth reward (human expert judgment) and its proxy (an LLM judge following the same scoring criteria as the humans).

The good news is that this was mainly done on GPT-3.5 (IN MICE) and when they tested on GPT-4 they got smaller distortions. The optimistic

The final superalignment paper is about Prover–Verifier Games.

OpenAI: We found that when we optimize the problem-solving process of strong models solely for getting the correct answer, the resulting solutions can become harder to understand. In fact, when we asked human evaluators with limited time to assess these highly optimized solutions, they made nearly twice as many errors compared to when they evaluated less optimized solutions. This finding highlights the importance of not just correctness, but also clarity and ease of verification in AI-generated text.

By training advanced language models to create text that weaker models can easily verify, we found that humans could also evaluate these texts more effectively – a process we call improving legibility.

This is where prover-verifier games come into play. These games involve two players: a “prover” that generates a solution and a “verifier” that checks it for accuracy.

…

To incentivize a strong model, the prover, to produce text that is legible to humans, we use a training scheme where we require the strong model to produce solutions that a much weaker model, the verifier, can easily verify. In particular, we use large and small models from the GPT-4 family with ~3 orders of magnitude difference in pretraining compute, that have undergone a small amount of supervised training on human-written math derivations, but no extensive post-training.

First, we generate multiple solutions to math problems from the prover model, where some of these solutions are incorrect due to naturally occurring mistakes from the prover model. We then alternate for multiple rounds between two phases:

Verifier Training. We train the weak verifier model to predict the correctness of solutions generated by the prover.
Prover Training. We instruct the strong prover model to play either a “helpful” or a “sneaky” role.
In the helpful role, we reward correct solutions that score highly according to the verifier.
In the sneaky role, we instead reward incorrect solutions that score highly according to the verifier.

We find that initially, the sneaky prover is readily able to produce incorrect solutions that convince the verifier. However, since we add these incorrect solutions to the next phase of verifier training, the same exploit is no longer convincing in the next round, and the prover is forced to explore different strategies in every round. The training ends once the sneaky prover can no longer produce incorrect solutions that convince the verifier.

That’s how OpenAI describes it. Here’s Jan Leike, who is an author, focusing on a the misalignment aspect.

Another Superalignment paper from my time at OpenAI:

We train large models to write solutions such that smaller models can better check them. This makes them easier to check for humans, too.

We use a multi-agent training setup where an honest prover and a sneaky prover write solutions checked by a small model.

The honest prover wants to convince the small model of a correct solution, while the sneaky prover tries to fool it with a flawed solution.

Our training loss is not very conceptually clean, but it works pretty well.

Check out the solutions of the honest prover early and late in training; they get more detailed and reason more carefully.

Compare this with the solutions of the sneaky prover early and late in training; the flaws become more subtle.

Surprisingly, this legibility transfers: Humans get faster and more accurate when checking the solutions with more rounds of training.

(These are pretty easy grade school math problems, so humans had a 45 second time limit.)

In contrast, regular reinforcement learning on correctness makes the solutions less legible: humans get slower and less accurate when checking the solutions.

Techniques like this seem promising for training superhuman models to explain their actions in a way that humans can better understand (and get less fooled by). I’d be excited to see this method tried on harder tasks and with stronger models.

However, writing solutions in a way that makes them more legible is a noticeable burden on the large model, and causes it to perform worse, an effect that we call legibility tax. Reducing this tax is an important priority for future work :)

Other important caveats:

This paper is on grade school math problems, which are pretty easy.
This only works if the verifier is strictly smaller than the prover, but not too small.

Grace: We’re officially dumbing down AI outputs to make them understandable to humans.

At core it seems this is how they stopped being fooled by the ‘sneaky’ AI:

Sneaky AI fools you. Collect example of being fooled.
The output that fooled you gets added to your training data.
You learn to avoid that. Profit.
Repeat until the sneaky AI’s sneaky ideas all no longer work.

The problem is that step 2 is “????????’“

Or as OpenAI puts it, ‘this requires ground truth labels.’

Yes, you can use this technique to teach the AI to differentiate between good outputs and bad outputs… if and only if you already have the ability to know and label which outputs are good and which ones are bad.

So I don’t see how you can use this to move up the chain.

If you have a new smarter AI, you can reward it based on whether a dumber AI, or a dumber human, is confident the answers are correct. And yes, this will teach the smarter AI to make its answers legible, but it will also teach it to present them in convincing ways, whatever those might be, including deceptive or ‘sneaky’ ones.

You are teaching it to fool you. This seems actively doomed in the cases where you need it most. No, this won’t teach the advanced AI to describe things in ways that are unlikely to fool you. Quite the opposite. It will learn not to try and fool you in cases where you would detect it, and learn to fool you when it can do so successfully.

How do you think that ends?

Javier Rando reports that fine tuning allows extraction of 6x more training data from GPT-3.5 and GPT-4, including 17%+ of memorized text.

Javier Rando: We finetune GPT-3.5 using the public OpenAI API. This costs under $3 and is accessible to any user! We prompt the resulting model with 1,000 5-token strings and find that up to 17% of the generated text was memorized. This is a 5.9x increase over our divergence attack!

Unlike the divergence attack, finetuning enables targeted data exfiltration since the attacker has control over the beginning of the text. We show that the resulting models can reconstruct toxic and copyrighted documents!

We evaluated the ability to reconstruct NYT articles. Finetuned GPT-4 can regurgitate over 60% of the articles in the famous lawsuit against OpenAI. For randomly chosen articles, we find that GPT-4 can regurgitate at least 50 tokens in at least 5% of cases.

The question is not whether alignment work can be undone. It you have open model weights it can and will all quickly be undone. The question is if the weights are closed and you can only do permitted fine tuning, what it takes in practice to pull this off. Here, we see that modification to extract training data is essentially free.

Here’s another simple solution that ends in a highly obvious place (paper)

Colin Fraser: This is what I tell my therapist my parents did to me

Youliang Yuan: We discover a refusal position bias during standard safety tuning, which leads to a refusal decision before generating a response. This bias results in LLMs being unable to reject content in the middle of a sentence.

Our solution is simple: maximizing the probability of [Sorry] tokens at every position during response generation. This enables the model to learn the ability to transition from potential harm to safety refusal throughout the sequence.

We evaluate LLaMA-3-70B, LLaMA-3-70B-Instruct, and Mistral-MoE-8x7B. Our model can effectively make (unsafe
→ safe
) transitions when needed, significantly reducing success rates on Do-Not-Answer and HarmBench, while keeping helpfulness on GSM8K, MMLU, and AlpacaEval.

People Are Worried About AI Killing Everyone

Whatever you think of Scott Adams, he knows not to publish dangerous AI research. Here is the original source. After saying ‘you have no idea what is coming’ he went back to using AI only for occasional partisan political jokes and posting 50 times a day in support of Trump.

Other People Are Not As Worried About AI Killing Everyone

Why build something that might kill everyone? Why do anything big?

For the glory, Roon suggests.

Roon: Achilles’ single minded madness to have his name remembered seems to be completely underused as an explanation for modern elites’ behavior.

It’s always this dry analysis about status and economics and whatever when everyone knows these aren’t the main acts.

Even when discussing technological acceleration the stated reasons are economic to make it more palatable — reduce suffering, increase wealth, etc — when the real drive is the glory and immortality of mankind and especially of the people building the machine age.

It doesn’t make any sense to me to wonder ‘what’s in it for sama … he owns no equity’ and yet this is a very common question anywhere outside of san francisco do you really think there’s a monetary value that compares against the glory of delivering ASI to mankind.

This is definitely not an ‘everybody knows.’

Roon and I, and I am guessing most of you reading this, can appreciate the glory. We can understand the value of greatness, and indeed the value of the act of striving for greatness. Very much so.

I worry and expect that most people no longer appreciate that, or appreciate it vastly less. Certainly this is no longer an ‘everybody knows’ situation. I do not see this as a good change, even if the glory is purely personal.

The catch, of course, is that there is no glory, even for Achilles, if there is no one around to remember you and your deeds. That does at least some work, but as long as there’s a way to tell yourself things might work out, it is not enough to cause reasonable decisions. If you think Sam Altman wants glory the way Achilles or Julius Caesar wanted glory?

Sic transit, gloria mundi.

The Lighter Side

The saga of ‘Terminal of Truths’ continues, for now its outputs are filtered by Andy Ayrey.

The art of gymnastics.