Dave Orr

Karma: 1,623

DeepMind Gemini Safety lead; Foundation board member

Dave Orr Apr 24, 2025, 10:01 PM
3 points
0
on: AI #113: The o3 Era Begins
Formatting is still kind of bad, and is affecting readability. It’s been a couple of posts in a row now with long wall-of-text paragraphs. I feel like you changed something? And you should change it back. :)

Dave Orr Apr 19, 2025, 7:07 AM
5 points
2
on: LLM-based Fact Checking for Popular Posts?
Are there examples of posts with factual errors you think would be caught by LLMs?
One thing you could do is fact check a few likely posts and see if it’s adding substantial value. That would be more persuasive than abstract arguments.

Dave Orr Apr 11, 2025, 2:56 PM
22 points
12
on: On Google’s Safety Plan
“There have been some relatively discontinuous jumps already (e.g. GPT-3, 3.5 and 4), at least from the outside perspective.”
These are firmly within our definition of continuity—we intend our approach to handle jumps larger than seen in your examples here.
Possibly a disconnect is that from an end user perspective a new release can look like a big jump, while from a developer perspective it was continuous.
Note also that continuous can still be very fast. And of course we could be wrong about discontinuous jumps.

Dave Orr Mar 26, 2025, 1:43 PM
30 points
2
in reply to: Garrett Baker’s comment on: Recent AI model progress feels mostly like bullshit
I don’t work directly on pretraining, but when there were allegations of eval set contamination due to detection of a canary string last year, I looked into it specifically. I read the docs on prevention, talked with the lead engineer, and discussed with other execs.
So I have pretty detailed knowledge here. Of course GDM is a big complicated place and I certainly don’t know everything, but I’m confident that we are trying hard to prevent contamination.

Dave Orr Mar 24, 2025, 11:08 PM
48 points
4
on: Recent AI model progress feels mostly like bullshit
I work at GDM so obviously take that into account here, but in my internal conversations about external benchmarks we take cheating very seriously—we don’t want eval data to leak into training data, and have multiple lines of defense to keep that from happening. It’s not as trivial as you might think to avoid, since papers and blog posts and analyses can sometimes have specific examples from benchmarks in them, unmarked—and while we do look for this kind of thing, there’s no guarantee that we will be perfect at finding them. So it’s completely possible that some benchmarks are contaminated now. But I can say with assurance that for GDM it’s not intentional and we work to avoid it.
We do hill climb on notable benchmarks and I think there’s likely a certain amount of overfitting going on, especially with LMSys these days, and not just from us.
I think the main thing that’s happening is that benchmarks used to be a reasonable predictor of usefulness, and mostly are not now, presumably because of Goodhart reasons. The agent benchmarks are pretty different in kind and I expect are still useful as a measure of utility, and probably will be until they start to get more saturated, at which point we’ll all need to switch to something else.

Dave Orr Feb 23, 2025, 9:36 PM
3 points
0
on: Does human (mis)alignment pose a significant and imminent existential threat?
Humans have always been misaligned. Things now are probably significantly better in terms of human alignment than almost any time in history (citation needed) due to high levels of education and broad agreement about many things that we take for granted (e.g. the limits of free trade are debated but there has never been so much free trade). So you would need to think that something important was different now for there to be some kind of new existential risk.
One candidate is that as tech advances, the amount of damage a small misaligned group could do is growing. The obvious example is bioweapons—the number of people who could create a lethal engineered global pandemic is steadily going up, and at some point some of them may be evil enough to actually try to do it.
This is one of the arguments in favor of the AGI project. Whether you think it’s a good idea probably depends on your credences around human-caused xrisks versus AGI xrisk.

Dave Orr Feb 22, 2025, 1:03 AM
LW: 4 AF: 3
2
AF
on: Using Prompt Evaluation to Combat Bio-Weapon Research
One tip for research of this kind is to not only measure recall, but also precision. It’s easy to block 100% of dangerous prompts by blocking 100% of prompts, but obviously that doesn’t work in practice. The actual task that labs are trying to solve is to block as many unsafe prompts as possible while rarely blocking safe prompts, or in other words, looking at both precision and recall.
Of course with truly dangerous models and prompts, you do want ~100% recall, and in that situation it’s fair to say that nobody should ever be able to build a bioweapon. But in the world we currently live in, the amount of uplift you get from a frontier model and a prompt in your dataset isn’t very much, so it’s reasonable to trade off against losses from over refusal.
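A minimal sketch of the point about measuring both numbers, assuming a labeled set of prompts and a filter that returns block or allow. The function name, the dataclass, and the toy labels below are all hypothetical and only there for illustration:

```python
from dataclasses import dataclass


@dataclass
class FilterMetrics:
    precision: float           # of the prompts we blocked, how many were unsafe
    recall: float              # of the unsafe prompts, how many we blocked
    false_refusal_rate: float  # of the safe prompts, how many we blocked


def evaluate_filter(blocked, unsafe):
    """blocked[i]: the filter blocked prompt i; unsafe[i]: prompt i is truly unsafe."""
    pairs = list(zip(blocked, unsafe))
    tp = sum(b and u for b, u in pairs)
    fp = sum(b and not u for b, u in pairs)
    fn = sum((not b) and u for b, u in pairs)
    tn = sum((not b) and (not u) for b, u in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    false_refusal_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return FilterMetrics(precision, recall, false_refusal_rate)


# Toy data (made up): 2 unsafe prompts out of 10.
unsafe = [True, True] + [False] * 8

# Blocking everything gets perfect recall but refuses every safe prompt.
block_everything = [True] * 10

# A selective filter catches both unsafe prompts and wrongly blocks one safe one.
selective = [True, True, True] + [False] * 7

print(evaluate_filter(block_everything, unsafe))
print(evaluate_filter(selective, unsafe))
```

Both filters have the same recall here, so reporting recall alone would not distinguish them; precision and the false refusal rate are what show the cost of over-blocking.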

Dave Orr Feb 20, 2025, 5:46 PM
6 points
0
on: Eliezer’s Lost Alignment Articles / The Arbital Sequence
The pivotal act link is broken, fyi.

Dave Orr Jan 28, 2025, 1:53 AM
3 points
1
on: How different LLMs answered PhilPapers 2020 survey
Gemini V2 (1206 experimental, which is the larger model) one-boxes, so... progress?

Dave Orr Jan 28, 2025, 1:28 AM
4 points
2
on: Is it ethical to work in AI “content evaluation”?
I’m probably too conflicted to give you advice here (I work on safety at Google DeepMind), but you might want to think through, at a gears level, what could concretely happen with your work that would lead to bad outcomes. Then you can balance that against positives (getting paid, becoming more familiar with model outputs, whatever).
You might also think about how your work compares to whoever would replace you on average, and what implications that might have as well.

Dave Orr Jan 22, 2025, 3:57 AM
7 points
0
on: Kitchen Air Purifier Comparison
This is great data! I’d been wondering about this myself.
Where were you measuring air quality? How far from the stove? Same place every time?

Dave Orr Jan 22, 2025, 1:56 AM
2 points
0
on: King Lear—A New Interpretation
Practicing LLM prompting?

Dave Orr Jan 18, 2025, 7:24 PM
5 points
0
in reply to: jbash’s comment on: What’s Wrong With the Simulation Argument?
I haven’t heard the p zombie argument before, but I agree that is at least some Bayesian evidence that we’re not in a sim.
1. We don’t know if simulated people will be p zombies
2. I am not a p zombie [citation needed]
3. It would be very surprising if sims were not p zombies but everyone in the physical universe is
4. Therefore being conscious is more likely in the real universe than in a simulation, so the likelihood ratio favors the real universe
Probably 3 needs to be developed further, but this is the first new piece of evidence I’ve seen since I first encountered the simulation argument in like 2005.
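The four steps can be written as an odds update. This is only a sketch: the symbols R (we are in the real universe), S (we are in a simulation), and C (I am conscious) are labels introduced here, not notation from the thread.

```latex
\frac{P(R \mid C)}{P(S \mid C)}
  = \frac{P(C \mid R)}{P(C \mid S)} \cdot \frac{P(R)}{P(S)},
\qquad
\frac{P(C \mid R)}{P(C \mid S)} \ge 1 \quad \text{(by step 3)}
```

The inequality is strict whenever there is some chance that sims are p zombies and physical people are not, so observing C shifts the odds toward R. How far it shifts them depends on probabilities the argument does not supply.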

Dave Orr Jan 11, 2025, 8:57 PM
2 points
0
in reply to: mikbp’s comment on: Is Musk still net-positive for humanity?
Are we playing the question game because the thread was started by Rosencrantz? Is China doing well in the EV space a bad thing?

Dave Orr Jan 10, 2025, 10:37 PM
4 points
0
in reply to: Rosencrantz’s comment on: Is Musk still net-positive for humanity?
Is it the case that the tech would exist without him? I think that’s pretty unclear, especially for SpaceX, where despite other startups in the space, nobody else managed to radically reduce the cost per launch in a way that transformed the industry.
Even for Tesla, which seems more pedestrian (heh) now, there were a number of years where they had the only viable car in the market. It was only once they proved it was feasible that everyone else piled in.

Dave Orr Jan 9, 2025, 4:17 AM
5 points
0
in reply to: Knight Lee’s comment on: ARC-AGI is a genuine AGI test but o3 cheated :(
Progress in ML looks a lot like, we had a different setup with different data and a tweaked algorithm and did better on this task. If you want to put an asterisk on o3 because it trained in some specific way that’s different from previous contenders, then basically every ML advance is going to have a similar asterisk. Seems like a lot of asterisking.

Dave Orr Jan 5, 2025, 5:10 PM
8 points
−1
on: Oppression and production are competing explanations for wealth inequality.
Hm, I think the main thrust of this post misses something, which is that different conditions, even contradictory conditions, can easily happen locally. Obviously, it can be raining in San Francisco and sunny in LA, and you can have one person wearing a raincoat in SF and the other on the beach in LA with no problem, even if they are part of the same team.

I think this is true of wealth inequality.

Carnegie or Larry Page or Warren Buffett got their money in a non-exploitative way, by being better than others at something that was extremely socially valuable. Part of what enables that is living in a society where capital is allocated by markets and there are clear price signals.

But many places in the world are not like this. Assad and Putin amassed their wealth via exploitative and extractive means. Wealth at the top in their societies is a tool of oppression.

I think this geographic heterogeneity implies that you should have one kind of program in the US (e.g. one dealing with market failures for goods with potentially very high negative externalities, like advanced AI) and another in e.g. Uganda, where direct cash transfers (if you are careful to ensure they don’t get expropriated by whoever the local oppressors are) could be very high impact.

Dave Orr Dec 22, 2024, 7:22 PM
20 points
9
on: ARC-AGI is a genuine AGI test but o3 cheated :(
It seems very strange to me to say that they cheated, when the public training set is intended to be used exactly for training. They did what the test specified! And they didn’t even use all of it.
The whole point of the test is that some training examples aren’t going to unlock the rest of it. What training definitely does is teach the model how to output the JSON in the right format, and likely how to think about what to even do with these visual puzzles.
Do we say that humans aren’t a general intelligence even though for ~all valuable tasks, you have to take some time to practice, or someone has to show you, before you can do it well?

Dave Orr Dec 21, 2024, 7:49 PM
3 points
0
on: AGI with RL is Bad News for Safety
Why does RL necessarily mean that AIs are trained to plan ahead?

Dave Orr Dec 11, 2024, 6:15 AM
11 points
4
on: o1 Turns Pro
“Reliable fact recall is valuable, but why would o1 pro be especially good at it? It seems like that would be the opposite of reasoning, or of thinking for a long time?”

Current models were already good at identifying and fixing factual errors when run over a response and asked to critique and fix it. Works maybe 80% of the time to identify whether there’s a mistake, and can fix it at a somewhat lower rate.

So not surprising at all that a reasoning loop can do the same thing. Possibly there’s some other secret sauce in there, but just critiquing and fixing mistakes is probably enough to see the reported gains in o1.