Jesse Hoogland

Karma: 2,913

Executive director at Timaeus. Working on singular learning theory and developmental interpretability.

Website: jessehoogland.com

Twitter: @jesse_hoogland

The Sweet Lesson: AI Safety Should Scale With Compute

Jesse HooglandMay 5, 2025, 7:03 PM

87 points

1 comment3 min readLW link

Jesse Hoogland Feb 27, 2025, 5:50 AM
4 points
0
in reply to: Vinayak Pathak’s comment on: Empirical risk minimization is fundamentally confused
Looking back at this, I think this post is outdated and was trying a little too hard to be provocative. I agree with everything you say here. Especially: “One could reasonably say that PAC learning is somewhat confused, but learning theorists are working on it!”

Forgive my youthful naïvité. For what it’s worth, I think the generalization post in this sequence has stood the test of time much better.

Jesse Hoogland Feb 25, 2025, 9:00 PM
LW: 52 AF: 15
10
AF
on: Jesse Hoogland’s Shortform
Claude 3.7 reward hacks. During training, Claude 3.7 Sonnet sometimes resorted to “special-casing” to pass tests when it got stuck — including directly hardcoding expected outputs or even modifying test files themselves. Rumors are circulating that o1/o3 was doing similar things — like overwriting equality operators to get Python tests to pass — and this may have contributed to the delayed release.
This seems relevant to claims that “we’ll soon have reward models sophisticated enough to understand human values” and that inner alignment is the real challenge. Instead, we’re seeing real examples of reward-hacking at the frontier.
RL is becoming important again. We should expect old failure modes to rear their ugly heads.

Timaeus in 2024

Jesse Hoogland, Stan van Wingerden, Alexander Gietelink Oldenziel and Daniel Murfet

Feb 20, 2025, 11:54 PM

99 points

1 comment8 min readLW link

Jesse Hoogland Feb 20, 2025, 4:24 AM
LW: 3 AF: 1
0
AF
in reply to: Vinayak Pathak’s comment on: Neural networks generalize because of this one weird trick
But “models have singularities and thus number of parameters is not a good complexity measure” is not a valid criticism of VC theory.
Right, this quote is really a criticism of the classical Bayesian Information Criterion (for which the “Widely applicable Bayesian Information Criterion” WBIC is the relevant SLT generalization).
Ah, I didn’t realize earlier that this was the goal. Are there any theorems that use SLT to quantify out-of-distribution generalization? The SLT papers I have read so far seem to still be talking about in-distribution generalization, with the added comment that Bayesian learning/SGD is more likely to give us “simpler” models and simpler models generalize better.
That’s right: existing work is about in-distribution generalization. It is the case that, within the Bayesian setting, SLT provides an essentially complete account of in-distribution generalization. As you’ve pointed out there are remaining differences between Bayes and SGD. We’re working on applications to OOD but have not put anything out publicly about this yet.

Jesse Hoogland Feb 19, 2025, 11:45 PM
LW: 4 AF: 2
0
AF
in reply to: Vinayak Pathak’s comment on: You’re Measuring Model Complexity Wrong
To be precise, it is a property of singular models (which includes neural networks) in the Bayesian setting. There are good empirical reasons to expect the same to be true for neural networks trained with SGD (across a wide range of different models, we observe the LLC progressively increase from ~0 over the course of training).

Jesse Hoogland Feb 19, 2025, 11:43 PM
LW: 3 AF: 1
0
AF
in reply to: Vinayak Pathak’s comment on: Neural networks generalize because of this one weird trick
The key distinction is that VC theory takes a global, worst-case approach — it tries to bound generalization uniformly across an entire model class. This made sense historically but breaks down for modern neural networks, which are so expressive that the worst-case is always very bad and doesn’t get you anywhere.
The statistical learning theory community woke up to this fact (somewhat) with the Zhang et al. paper, which showed that deep neural networks can achieve perfect training loss on randomly labeled data (even with regularization). The same networks, when trained on natural data, will generalize well. VC dimension can’t explain this. If you can fit random noise, you get a huge (or even infinite) VC dimension and the resulting bounds fail to explain empircally observed generalization performance.
So I’d argue that dependence on the true-data distribution isn’t a weakness, but one of SLT’s great strengths. For highly expressive model classes, generalization only makes sense in reference to a data distribution. Global, uniform approaches like VC theory do not explain why neural networks generalize.
Thus if multiple parameter values lead to the same behaviour, this isn’t a problem for the theory at all because these redundancies do not increase the VC-dimension of the model class.
Multiple parameter values leading to the same behavior isn’t a problem — this is “the one weird trick.” The reason you don’t get the terribly generalizing solution that is overfit to noise is because simple solutions occupy more volume in the loss landscape, and are therefore easier to find. At the same time, simpler solutions generalize better (this is intuitively what Occam’s razor is getting at, though you can make it precise in the Bayesian setting). So it’s the solutions that generalize best that end up getting found.
If the claim is that it only needs to know certain properties of the true distribution that can be estimated from a small number of samples, then it will be nice to have a proof of such a claim (not sure if that exists).
I would say that this is a motivating conjecture and deep open problem (see, e.g., the natural abstractions agenda). I believe that something like this has to be true for learning to be at all possible. Real-world data distributions have structure; they do not resemble noise. This difference is what enables models to learn to generalize from finite samples.
Also note that if $P$ is allowed access to samples, then predicting whether your model generalizes is as simple as checking its performance on the test set.
For in-distribution generalization, yes, this is more or less true. But what we’d really like to get at is an understanding of how perturbations to the true distribution lead to changes in model behavior. That is, out-of-distribution generalization. Classical VC theory is completely hopeless when it comes to this. This only makes sense if you’re taking a more local approach.
See also my post on generalization here.

Jesse Hoogland Feb 12, 2025, 5:38 PM
2 points
0
in reply to: Kaarel’s comment on: Jesse Hoogland’s Shortform
Okay, great, then we just have to wait a year for AlphaProofZero to get a perfect score on the IMO.

Jesse Hoogland Feb 11, 2025, 6:22 PM
LW: 4 AF: 2
2
AF
in reply to: Michaël Trazzi’s comment on: Jesse Hoogland’s Shortform
Yes, my original comment wasn’t clear about this, but your nitpick is actually a key part of what I’m trying to get at.

Usually, you start with imitation learning and tack on RL at the end. That’s what AlphaGo is. It’s what predecessors to Dreamer-V3 like VPT are. It’s what current reasoning models are.
But then, eventually, you figure out how to bypass the imitation learning/behavioral cloning part and do RL from the start. Human priors serve as a temporary bootstrapping mechanism until we develop approaches that can learn effectively from scratch.

Jesse Hoogland Feb 10, 2025, 4:56 PM
LW: 40 AF: 16
17
AF
in reply to: Jesse Hoogland’s comment on: Jesse Hoogland’s Shortform
I think this is important because the safety community still isn’t thinking very much about search & RL, even after all the recent progress with reasoning models. We’ve updated very far away from AlphaZero as a reference class, and I think we will regret this.
On the other hand, the ideas I’m talking about here seem to have widespread recognition among people working on capabilities. Demis is very transparent about where they’re headed with language models, AlphaZero, and open-ended exploration (e.g., at 20:48). Noam Brown is adamant about test-time scaling/reasoning being the future (e.g., at 20:32). I think R1 has driven the message home for everyone else.

Jesse Hoogland Feb 10, 2025, 4:32 PM
3 points
−1
in reply to: CapResearcher’s comment on: Jesse Hoogland’s Shortform
With AlphaProof, the relevant piece is that the solver network generates its own proofs and disproofs to train against. There’s no imitation learning after formalization. There is a slight disanalogy where, for formalization, we mostly jumped straight to self-play/search, and I don’t think there was ever a major imitation-learning-based approach (though I did find at least one example).
Your quote “when reinforcement learning works well, imitation learning is no longer needed” is pretty close to what I mean. What I’m actually trying to get at is a stronger statement: we often bootstrap using imitation learning to figure out how to get the reinforcement learning component working initially, but once we do, we can usually discard the imitation learning entirely.

Jesse Hoogland Feb 10, 2025, 4:03 PM
6 points
0
in reply to: Jeremy Gillen’s comment on: Jesse Hoogland’s Shortform
That’s fun but a little long. Why not… BetaZero?

Jesse Hoogland Feb 10, 2025, 6:14 AM
LW: 75 AF: 25
14
AF
on: Jesse Hoogland’s Shortform
What do you call this phenomenon?
- First, you train AlphaGo on expert human examples. This is enough to beat Lee Sedol and Ke Jie. Then, you train AlphaZero purely through self-play. It destroys AlphaGo after only a few hours.
- First, you train RL agents on human playthroughs of Minecraft. They do okay. Then, DreamerV3 learns entirely by itself and becomes the first to get diamonds.
- First, you train theorem provers on human proofs. Then, you train AlphaProof using AlphaZero and you get silver on IMO for the first time.
- First, you pretrain a language model on all human data. Then...
This feels like a special case of the bitter lesson, but it’s not the same thing. It seems to rely on the distinction between prediction and search latent in ideas like AISI. It’s the kind of thing that I’m sure Gwern has christened in some comment lost to the internet’s backwaters. We should have a name for it—something more refined than just “foom.”

Jesse Hoogland Feb 7, 2025, 12:19 AM
5 points
0
in reply to: Brendan Long’s comment on: Timaeus is hiring researchers & engineers
We won’t strictly require it, but we will probably strongly encourage it. It’s not disqualifying, but it could make the difference between two similar candidates.

The Simplest Good

Jesse HooglandFeb 2, 2025, 7:51 PM

75 points

6 comments5 min readLW link

Kessler’s Second Syndrome

Jesse HooglandJan 26, 2025, 7:04 AM

69 points

2 comments3 min readLW link

Brainrot

Jesse HooglandJan 26, 2025, 5:35 AM

43 points

0 comments3 min readLW link

The Rising Sea

Jesse HooglandJan 25, 2025, 8:48 PM

92 points

2 comments2 min readLW link

Jesse Hoogland 22 Jan 2025 19:17 UTC
15 points
2
in reply to: Nathan Helm-Burger’s comment on: Jesse Hoogland’s Shortform
Post-training consists of two RL stages followed by two SFT stages, one of which includes creative writing generated by DeepSeek-V3. This might account for the model both being good at creative writing and seeming closer to a raw base model.

Another possibility is the fact that they apply the RL stages immediately after pretraining, without any intermediate SFT stage.

Jesse Hoogland 21 Jan 2025 23:53 UTC
143 points
37
on: Jesse Hoogland’s Shortform
Implications of DeepSeek-R1: Yesterday, DeepSeek released a paper on their o1 alternative, R1. A few implications stood out to me:
- Reasoning is easy. A few weeks ago, I described several hypotheses for how o1 works. R1 suggests the answer might be the simplest possible approach: guess & check. No need for fancy process reward models, no need for MCTS.
- Small models, big think. A distilled 7B-parameter version of R1 beats GPT-4o and Claude-3.5 Sonnet new on several hard math benchmarks. There appears to be a large parameter overhang.
- Proliferation by default. There’s an implicit assumption in many AI safety/governance proposals that AGI development will be naturally constrained to only a few actors because of compute requirements. Instead, we seem to be headed to a world where:
  - Advanced capabilities can be squeezed into small, efficient models that can run on commodity hardware.
  - Proliferation is not bottlenecked by infrastructure.
  - Regulatory control through hardware restriction becomes much less viable.
For now, training still needs industrial compute. But it’s looking increasingly like we won’t be able to contain what comes after.
What links here?
- o1: A Technical Primer by Jesse Hoogland (9 Dec 2024 19:09 UTC; 170 points)

Jesse Hoogland

The Sweet Les­son: AI Safety Should Scale With Compute

Ti­maeus in 2024

The Sim­plest Good

Kessler’s Se­cond Syndrome