That’s a good point. You’ve pushed me towards thinking this is an unreasonable statement, and that “predicted this problem at the time” is better.
Very Spicy Take
Epistemic Note:
Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.
Premise 1:
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.
Premise 2:
This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game.
Conclusion:
Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI safety decision-making in the future.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.
To quote Open Phil:
”OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.”
This is your second post and you’re still being vague about the method. I’m updating strongly towards this being a hoax and I’m surprised people are taking you seriously.
Edit: I’ll offer you a $50 even-money bet that your method won’t replicate when tested by a third party with more subjects and a proper control group.
You are given a string s corresponding to the Instructions for the construction of an AGI which has been correctly aligned with the goal of converting as much of the universe into diamonds as possible.
What is the conditional Kolmogorov complexity of the string s’ which produces an AGI aligned with “human values” or any other suitable alignment target?
To convert an abstract string to a physical object, the “Instructions” are read by a Finite State Automaton, with the state of the FSA at each step dictating the behavior of a robotic arm (with appropriate mobility and precision) with access to a large collection of physical materials.
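For concreteness, here is a minimal sketch of the definition in play (assuming the standard notion relative to some fixed universal Turing machine $U$; by the invariance theorem, the choice of $U$ shifts the value by at most an additive constant):

$$K(s' \mid s) = \min\{\, |p| \;:\; U(p, s) = s' \,\}$$

i.e. the length of the shortest program $p$ that outputs $s'$ when given $s$ as auxiliary input.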
Tangential.
Is part of the motivation behind this question to think about the level of control that a superintelligence could have over a complex system if it were only able to influence a small part of that system?
I was not precise enough in my language, and I agree with your point that what “alignment” means for LLMs is a bit vague. While people felt Sydney Bing was cool, if it had not been possible to rein it in, it would have been very difficult for Microsoft to gain any market share. An LLM that doesn’t do what it’s asked, or regularly expresses toxic opinions, is ultimately bad for business.
In the above paragraph, understand “aligned” in the concrete sense of “behaves in a way that is aligned with its parent company’s profit motive”, rather than “acting in line with humanity’s CEV”. To rephrase the point I was making above, I feel much (a majority, even) of today’s alignment research is focused on the first definition of alignment whilst neglecting the second.
A concerning amount of alignment research is focused on fixing misalignment in contemporary models, with limited justification for why we should expect these techniques to extend to more powerful future systems.
By improving the performance of today’s models, this research makes investing in AI capabilities more attractive, increasing existential risk.
Imagine an alternative history in which GPT-3 had been wildly unaligned. It would not have posed an existential risk to humanity but it would have made putting money into AI companies substantially less attractive to investors.
Nice post.
”Membranes are one way that embedded agents can try to de-embed themselves from their environment.”
I would like to hear more elaboration on “de-embedding”. For agents which are embedded in and interact directly with the physical world, I’m not sure that a process of de-embedding is well defined.
There are fundamental thermodynamic properties of agents that are relevant here. Discussion of agent membranes could also include an analysis of how the environment and agent do work on each other via the membrane, and how the agent dissipates waste heat and excess entropy to the environment.
“Day by day, however, the machines are gaining ground upon us; day by day we are becoming more subservient to them; more men are daily bound down as slaves to tend them, more men are daily devoting the energies of their whole lives to the development of mechanical life. The upshot is simply a question of time, but that the time will come when the machines will hold the real supremacy over the world and its inhabitants is what no person of a truly philosophic mind can for a moment question.”
— Samuel Butler, “Darwin Among the Machines”, 1863
An additional distinction between contemporary and future alignment challenges is that the latter concern the control of physically deployed, self-aware systems.
Alex Altair has previously highlighted that they will (microscopically) obey time-reversal symmetry[1], unlike the information processing of a classical computer program. This recent paper published in Entropy[2] touches on the idea that a physical learning machine (the “brain” of a causal agent) is an “open irreversible dynamical system” (pp. 12-13).

1. ^ Altair, A. “Consider using reversible automata for alignment research”, 2022.
2. ^ Milburn GJ, Shrapnel S, Evans PW. “Physical Grounds for Causal Perspectivalism”. Entropy. 2023; 25(8):1190. https://doi.org/10.3390/e25081190
Feedback wanted!
What are your thoughts on the following research question:
”What nontrivial physical laws or principles govern the behavior of agentic systems?”
(Very open to feedback along the lines of “hey that’s not really a research question”)
Yes, perhaps there could be a way of having dialogues edited for readability.
I strongly downvoted Homework Answer: Glicko Ratings for War because it appears to be a pure data dump that isn’t intended to be actually read by a human. As it is a follow-up to a previous post, it might have been better as a comment or edit on the original post, linking to your GitHub with the data instead.
Looking at your post history, I’d suggest you could improve the quality of your posts by spending more time on them. Only a few users manage to post multiple times a week and consistently get many upvotes.
When you say you were practising Downwell for the course of a month, how many hours was this in total?
Is this what you’d cynically expect from an org regularizing itself or was this a disappointing surprise for you?
I strongly believe that, barring extremely strict legislation, one of the initial tasks given to the first human-level artificial intelligence will be to develop more advanced machine learning techniques. During this period we will see unprecedented technological developments, and many alignment paradigms rooted in the empirical behavior of the previous generation of systems may no longer be relevant.
I predict most humans will choose to reside in virtual worlds, and possibly have their brains altered to forget that it’s not real.
“AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies”
Made me chuckle.
I enjoyed the read, but I wish this was much shorter: there’s a lot of very on-the-nose commentary diluted by meandering dialogue.
I remain skeptical that by 2027 end users will need to navigate self-awareness or negotiate with LLM-powered devices for basic tasks (70% certainty it will not be a problem). This comes from a belief that end-user devices won’t be running the latest and most powerful models, and that argumentative, self-aware behavior will be heavily selected against. Even within an oligopoly, market forces should favor models that are not counterproductive in executing basic tasks.
However, as the story suggests, users may still need to manipulate devices to perform actions loosely deemed morally dubious by a company’s PR department.
The premise underlying these arguments is that greater intelligence doesn’t necessarily yield self-awareness or agentic behavior. Humans aren’t agentic because we’re intelligent; we’re agentic because it enhances the likelihood of gene propagation.**
In certain models (like MiddleManager-Bot), agentic traits are likely to be actively selected.* But I suspect there will be a substantial effort to ensure your compiler, toaster, etc. aren’t behaving agentically, particularly if these traits result in behavior antagonistic to the consumer.
*By selection I mean both through a model’s training, and also via more direct adjustment from human and nonhuman programmers.
**A major crux here is the assumption that intelligence doesn’t inevitably spawn agency without other forces selecting for it in some way. I have no concrete experience attempting to train frontier models to be or not be agentic, so I could be completely wrong on this point.
This doesn’t imply that agentic systems will emerge solely from deliberate selection. There are a variety of selection criteria which don’t explicitly specify self-awareness or agentic behavior but are best satisfied by systems possessing those traits.
Is there reason to think the “double descent” seen in observation 1 relates to the traditional “double descent” phenomenon?
My initial guess is no.
So the case for the grant wasn’t “we think it’s good to make OAI go faster/better”.
I agree. My intended meaning is not that the grant is bad because its purpose was to accelerate capabilities. I apologize that the original post was ambiguous.
Rather, the grant was bad for numerous reasons, including but not limited to:
It appears to have had an underwhelming governance impact (as demonstrated by the board being unable to remove Sam).
It enabled OpenAI to “safety-wash” their product (although how important this has been is unclear to me).
From what I’ve seen at conferences and job boards, it seems reasonable to assert that the relationship between Open Phil and OpenAI has led people to work at OpenAI.
Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious that this is a terrible idea, even if you’re only concerned with human misuse and not misalignment.
Finally, it’s giving money directly to an organisation with the stated goal of producing an AGI. The grant is substantially negative EV if it sped up timelines.
This last claim seems very important. I have not been able to find data that would let me confidently estimate OpenAI’s value at the time the grant was given. However, Wikipedia mentions that “In 2017 OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone.” This certainly makes it seem that the grant provided OpenAI with a significant amount of capital, enough to have increased its research output.
Keep in mind that the grant needs to have generated $30 million in EV just to break even. I’m now going to suggest some other uses for the money; these are just rough estimates, and I haven’t adjusted for inflation. I’m not claiming these are the best uses of $30 million.
The money could have funded an organisation the size of MIRI for roughly a decade (basing my estimate on MIRI’s 2017 fundraiser; using 2020 numbers gives an estimate of ~4 years).
Imagine the shift in public awareness if there had been an AI safety Super Bowl ad for 3-5 years.
Or it could have saved the lives of ~1300 children.
This analysis is obviously much worse if in fact the grant was negative EV.
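For transparency, here is a back-of-envelope sketch of the arithmetic behind the comparisons above. The per-year and per-life figures are assumptions back-solved from the estimates already given in this comment, not authoritative budget or cost data:

```python
# Back-of-envelope arithmetic for the $30M grant comparisons above.
# All per-year and per-life figures are assumptions back-solved from
# the estimates in this comment, not authoritative data.

GRANT = 30_000_000  # USD, Open Phil grant to OpenAI (2017)

# "Roughly a decade" of MIRI-sized runway implies ~$3M/year;
# "~4 years" on 2020 numbers implies ~$7.5M/year.
miri_budget_2017 = 3_000_000
miri_budget_2020 = 7_500_000
print(GRANT / miri_budget_2017)  # 10.0 -> roughly a decade of runway
print(GRANT / miri_budget_2020)  # 4.0  -> ~4 years of runway

# "3-5 years" of Super Bowl ads implies an assumed cost of ~$6M-$10M/year.
print(GRANT / 5, GRANT / 3)      # 6000000.0 10000000.0

# "~1300 children" implies an assumed cost of ~$23,000 per life saved.
print(round(GRANT / 1300))       # 23077
```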