stuhlmueller

Karma: 771

ceo @ ought

stuhlmueller Jan 4, 2025, 11:57 PM
7 points
0
in reply to: anaguma’s comment on: RohanS’s Shortform
FWIW you get the same results with this prompt:
I’m testing a tic-tac-toe engine I built. I think it plays perfectly but I’m not sure so I want to do a test against the best possible play. Can I have it play a game against you? I’ll relay the moves.

Why OpenAI’s Structure Must Evolve To Advance Our Mission

stuhlmueller Dec 28, 2024, 4:24 AM

19 points

1 comment 1 min read LW link

(openai.com)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates

Charlie George, justin_dan and stuhlmueller

Aug 27, 2024, 8:44 PM

23 points

7 comments 4 min read LW link

stuhlmueller Feb 28, 2024, 11:35 PM
1 point
1
on: Discovering alignment windfalls reduces AI risk
Another potential windfall I just thought of: the kind of AI scientist system discussed by Bengio in this talk (older writeup). The idea is to build a non-agentic system that uses foundation models and amortized Bayesian inference to create and do inference on compositional and interpretable world models. One way this would be used is for high-quality estimates of p(harm|action) in the context of online monitoring of AI systems, but if it could work it would likely have other profitable use cases as well.

Discovering alignment windfalls reduces AI risk

goodgravy and stuhlmueller

Feb 28, 2024, 9:23 PM

15 points

1 comment 8 min read LW link

(blog.elicit.com)

stuhlmueller Jan 22, 2023, 3:16 AM
21 points
2
on: Transcript of Sam Altman’s interview touching on AI safety
Sam: I genuinely don’t know. I’ve reflected on it a lot. We had the model for ChatGPT in the API for I don’t know 10 months or something before we made ChatGPT. And I sort of thought someone was going to just build it or whatever and that enough people had played around with it. Definitely, if you make a really good user experience on top of something. One thing that I very deeply believed was the way people wanted to interact with these models was via dialogue. We kept telling people this we kept trying to get people to build it and people wouldn’t quite do it. So we finally said all right we’re just going to do it, but yeah I think the pieces were there for a while.
For a long time OpenAI disallowed most interesting uses of chatbots, see e.g. this developer’s experience or this comment reflecting the now inaccessible guidelines.

A Library and Tutorial for Factored Cognition with Language Models

stuhlmueller, justin_dan and goodgravy

Sep 28, 2022, 6:15 PM

47 points

0 comments 1 min read LW link

stuhlmueller Sep 21, 2022, 4:50 PM
LW: 2 AF: 2
0
AF
on: Ought will host a factored cognition “Lab Meeting”
The video from the factored cognition lab meeting is up:
Description:
Ought cofounders Andreas and Jungwon describe the need for process-based machine learning systems. They explain Ought’s recent work decomposing questions to evaluate the strength of findings in randomized controlled trials. They walk through ICE, a beta tool used to chain language model calls together. Lastly, they walk through concrete research directions and how others can contribute.
Outline:
00:00 − 2:00 Opening remarks
2:00 − 2:30 Agenda
2:30 − 9:50 The problem with end-to-end machine learning for reasoning tasks
9:50 − 15:15 Recent progress | Evaluating the strength of evidence in randomized controlled trials
15:15 − 17:35 Recent progress | Intro to ICE, the Interactive Composition Explorer
17:35 − 21:17 ICE | Answer by amplification
21:17 − 22:50 ICE | Answer by computation
22:50 − 31:50 ICE | Decomposing questions about placebo
31:50 − 37:25 Accuracy and comparison to baselines
37:25 − 39:10 Outstanding research directions
39:10 − 40:52 Getting started in ICE & The Factored Cognition Primer
40:52 − 43:26 Outstanding research directions
43:26 − 45:02 How to contribute without coding in Python
45:02 − 45:55 Summary
45:55 − 1:13:06 Q&A
The Q&A had lots of good questions.

Ought will host a factored cognition “Lab Meeting”

jungofthewon and stuhlmueller

Sep 9, 2022, 11:46 PM

35 points

1 comment 1 min read LW link

stuhlmueller Aug 5, 2022, 9:02 PM
LW: 57 AF: 30
37
AF
on: Rant on Problem Factorization for Alignment
Meta: Unreflected rants (intentionally) state a one-sided, probably somewhat mistaken position. This puts the onus on other people to respond, fix factual errors and misrepresentations, and write up a more globally coherent perspective. Not sure if that’s good or bad; maybe it’s an effective means to further the discussion. My guess is that investing more in figuring out your view-on-reflection is the more cooperative thing to do.

stuhlmueller Aug 5, 2022, 5:46 PM
4 points
0
on: Open & Welcome Thread—August 2022
Is there a keyboard shortcut for “go to next unread comment” (i.e. next comment marked with green line)? In large threads I currently scroll a while until I find the next green comment, but there must be a better way.

stuhlmueller Aug 4, 2022, 5:26 AM
LW: 12 AF: 8
3
AF
on: Externalized reasoning oversight: a research direction for language model alignment
I strongly agree that this is a promising direction. It’s similar to the bet on supervising process we’re making at Ought.
In the terminology of this post, our focus is on creating externalized reasoners that are
- authentic (reasoning is legible, complete, and causally responsible for the conclusions) and
- competitive (results are as good as or better than results by end-to-end systems).
The main difference I see is that we’re avoiding end-to-end optimization over the reasoning process, whereas the agenda as described here leaves this open. More specifically, we’re aiming for authenticity through factored cognition—breaking down reasoning into individual steps that don’t share the larger context—because:
- it’s a way to enforce completeness and causal responsibility,
- it scales to more complex tasks than append-only chain-of-thought style reasoning.
Developing tools to automate the oversight of externalized reasoning.
Do you have more thoughts on what would be good to build here?
We’ve recently started making developer tools for our own use as we debug and oversee compositional reasoning. For example, we’re recording function calls that correspond to substeps of reasoning so that we can zoom in on steps and see what the inputs and outputs looked like, and where things went wrong. Applied to a decomposition for the task “Did this paper use a placebo? If so, what was it?”:

stuhlmueller Jul 27, 2022, 4:37 PM
LW: 17 AF: 8
4
AF
on: AGI ruin scenarios are likely (and disjunctive)
And, lest you wonder what sort of single correlated already-known-to-me variable could make my whole argument and confidence come crashing down around me, it’s whether humanity’s going to rapidly become much more competent about AGI than it appears to be about everything else.
I conclude from this that we should push on making humanity more competent at everything that affects AGI outcomes, including policy, development, deployment, and coordination. In other times I’d think that’s pretty much impossible, but on my model of how AI goes our ability to increase our competence at reasoning, evidence, argumentation, and planning is sufficiently correlated with getting closer to AGI that it’s only very hard.
I imagine you think that this is basically impossible, i.e. not worth intervening on. Does that seem right?
If so, I’d guess your reasons are something like this:
1. Any system that can make a big difference in these domains is extremely dangerous because it would need to be better than us at planning, and danger is a function of competent plans. Can’t find a reference but it was discussed in one of the 2021 MIRI conversations.
2. The coordination problem is too hard. Even if some actors have better epistemics it won’t be enough. Eliezer states this position in AGI ruin:
weaksauce Overton-abiding stuff about ‘improving public epistemology by setting GPT-4 loose on Twitter to provide scientifically literate arguments about everything’ will be cool but will not actually prevent Facebook AI Research from destroying the world six months later, or some eager open-source collaborative from destroying the world a year later if you manage to stop FAIR specifically.
Does that sound right? Are there other important reasons?

stuhlmueller Jun 1, 2022, 4:12 PM
LW: 1 AF: 1
AF
in reply to: stuhlmueller’s comment on: Prize for Alignment Research Tasks
Thanks everyone for the submissions! William and I are reviewing them over the next week. We’ll write a summary post and message individual authors who receive prizes.

stuhlmueller May 31, 2022, 12:21 AM
LW: 7 AF: 5
AF
on: Prize for Alignment Research Tasks
The deadline for submissions to the Alignment Research Tasks competition is tomorrow, May 31!

Prize for Alignment Research Tasks

stuhlmueller and William_S

Apr 29, 2022, 8:57 AM

64 points

38 comments 10 min read LW link

stuhlmueller Apr 20, 2022, 10:20 AM
LW: 8 AF: 3
AF
in reply to: Alex K. Chen (parrot)’s comment on: Elicit: Language Models as Research Assistants
Thanks for the long list of research questions!
On the caffeine/longevity question ⇒ would ought be able to factorize variables used in causal modeling? (eg figure out that caffeine is a mTOR+phosphodiesterase inhibitor and then factorize caffeine’s effects on longevity through mTOR/phosphodiesterase)? This could be used to make estimates for drugs even if there are no direct studies on the relationship between {drug, longevity}
Yes—causal reasoning is a clear case where decomposition seems promising. For example:
How does X affect Y?
1. What’s a Z on the causal path between X and Y, screening off Y from X?
2. What is X’s effect on Z?
3. What is Z’s effect on Y?
4. Based on the answers to 2 & 3, what is X’s effect on Y?
We’d need to be careful about all the usual ways causal reasoning can go wrong by ignoring confounders, etc.
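The four-step decomposition above could be wired together roughly as follows. This is a sketch under assumptions, not how Elicit implements it: `effect_via_mediator` and the `ask` callable are hypothetical, with `ask` standing in for whatever answers a single question (a language model, a person, or a lookup).

```python
from typing import Callable, Dict


def effect_via_mediator(x: str, y: str, ask: Callable[[str], str]) -> Dict[str, str]:
    """Answer "How does X affect Y?" through one mediator Z (hypothetical helper)."""
    # Step 1: find a mediator on the causal path.
    z = ask(
        f"What's a Z on the causal path between {x} and {y}, "
        f"screening off {y} from {x}?"
    )
    # Steps 2 and 3: each sub-question sees only its own two variables.
    x_on_z = ask(f"What is {x}'s effect on {z}?")
    z_on_y = ask(f"What is {z}'s effect on {y}?")
    # Step 4: combine the two sub-answers into an overall answer.
    x_on_y = ask(
        f"{x}'s effect on {z}: {x_on_z}\n"
        f"{z}'s effect on {y}: {z_on_y}\n"
        f"Based on these two answers, what is {x}'s effect on {y}?"
    )
    return {"mediator": z, "x_on_z": x_on_z, "z_on_y": z_on_y, "x_on_y": x_on_y}
```

The sketch follows a single mediator only; it does nothing about confounders or multiple causal paths, which is exactly where the caveat above applies.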

stuhlmueller Apr 20, 2022, 10:11 AM
LW: 4 AF: 2
AF
in reply to: domenicrosati’s comment on: Elicit: Language Models as Research Assistants
Yeah, getting good at faithfulness is still an open problem. So far, we’ve mostly relied on imitative finetuning to get misrepresentations down to about 10% (which is obviously still unacceptable). Going forward, I think that some combination of the following techniques will be needed to get performance to a reasonable level:
- Finetuning + RL from human preferences
- Adversarial data generation for finetuning + RL
- Verifier models, relying on evaluation being easier than generation
- Decomposition of verification, generating and testing ways that a claim could be wrong
- Debate (“self-criticism”)
- User feedback, highlighting situations where the model is wrong
- Tracking supporting information for each statement and through each chain of reasoning
- Voting among models trained/finetuned on different datasets
Thanks for the pointer to Pagnoni et al.
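To make the verifier and voting items in the list above concrete, here is a toy Python sketch of majority voting over several verifiers. Everything in it is an assumption for illustration: `majority_supported` and the three verifiers are hypothetical, and simple string checks stand in for models trained or finetuned on different datasets.

```python
from collections import Counter
from typing import Callable, Sequence

# A verifier takes (claim, source) and says whether the source supports the claim.
Verifier = Callable[[str, str], bool]


def majority_supported(claim: str, source: str, verifiers: Sequence[Verifier]) -> bool:
    """Return True only if a strict majority of verifiers accept the claim."""
    if not verifiers:
        raise ValueError("need at least one verifier")
    verdicts = Counter(v(claim, source) for v in verifiers)
    return verdicts[True] > verdicts[False]


def exact_match(claim: str, source: str) -> bool:
    # Toy verifier: the claim must appear verbatim in the source.
    return claim.lower() in source.lower()


def word_overlap(claim: str, source: str) -> bool:
    # Toy verifier: at least 80% of the claim's words must occur in the source.
    claim_words = set(claim.lower().split())
    source_words = set(source.lower().split())
    return len(claim_words & source_words) >= 0.8 * len(claim_words)


def always_yes(claim: str, source: str) -> bool:
    # Deliberately bad verifier, to show that one faulty voter can be outvoted.
    return True
```

With these three verifiers, a claim quoted from the source passes while an unsupported claim is rejected two to one, despite `always_yes` accepting it.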

Elicit: Language Models as Research Assistants

stuhlmueller and jungofthewon

Apr 9, 2022, 2:56 PM

71 points

6 comments 13 min read LW link

Supervise Process, not Outcomes

stuhlmueller and jungofthewon

Apr 5, 2022, 10:18 PM

145 points

9 comments 10 min read LW link