Scaling Laws

Last edit: Jun 18, 2023, 11:35 PM by riley

Scaling laws refer to the observed trend that the performance of deep neural networks (i.e., how the evaluation metric of interest varies as one varies training or inference compute, the number of model parameters, training dataset size, model input size, or the number of training steps) follows variants of power laws.
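The power-law form can be made concrete with a small sketch. The constants below approximate the parameter-count exponent and scale that Kaplan et al.'s "Scaling Laws for Neural Language Models" reports (roughly α_N ≈ 0.076, N_c ≈ 8.8×10¹³); treat them as illustrative, and note that the fit runs on synthetic points generated from the formula itself, not on real training runs.

```python
import numpy as np

# Kaplan et al. (2020) find loss falling as a power law in parameter
# count: L(N) = (N_c / N) ** alpha_N. Constants are approximate values
# from that paper, used here only for illustration.
alpha_N = 0.076
N_c = 8.8e13

def loss(n_params):
    """Power-law loss curve as a function of model size."""
    return (N_c / n_params) ** alpha_N

# A power law is a straight line in log-log coordinates, so its exponent
# can be recovered as the slope of a degree-1 fit on the logs.
sizes = np.logspace(6, 11, 20)           # 1M to 100B parameters
losses = loss(sizes)
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
recovered_alpha = -slope                 # matches alpha_N
```

Fitting a line in log-log space, as above, is also how scaling-law papers typically extract exponents from empirical loss curves.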

External links

[Figure: scaling laws graph from “Scaling Laws for Neural Language Models”]

“Can AI Scaling Continue Through 2030?”, Epoch AI (yes)

gwern · Aug 24, 2024, 1:40 AM
129 points
4 comments · 3 min read · LW link
(epochai.org)

chinchilla’s wild implications

nostalgebraist · Jul 31, 2022, 1:18 AM
424 points
128 comments · 10 min read · LW link · 1 review

Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement

Nov 4, 2022, 6:09 PM
16 points
11 comments · 10 min read · LW link
(theinsideview.ai)

/r/MLScaling: new subreddit for NN scaling research/discussion

gwern · Oct 30, 2020, 8:50 PM
21 points
0 comments · 1 min read · LW link
(www.reddit.com)

What will GPT-2030 look like?

jsteinhardt · Jun 7, 2023, 11:40 PM
185 points
43 comments · 23 min read · LW link
(bounded-regret.ghost.io)

Thoughts on the Alignment Implications of Scaling Language Models

leogao · Jun 2, 2021, 9:32 PM
82 points
11 comments · 17 min read · LW link

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden · Jun 22, 2022, 8:00 PM
32 points
4 comments · 1 min read · LW link

My ML Scaling bibliography

gwern · Oct 23, 2021, 2:41 PM
35 points
9 comments · 1 min read · LW link
(www.gwern.net)

Dmitry’s Koan

Dmitry Vaintrob · Jan 10, 2025, 4:27 AM
44 points
8 comments · 22 min read · LW link

Musings on Text Data Wall (Oct 2024)

Vladimir_Nesov · Oct 5, 2024, 7:00 PM
40 points
2 comments · 5 min read · LW link

[Question] Nonlinear limitations of ReLUs

magfrump · Oct 26, 2023, 6:51 PM
13 points
1 comment · 1 min read · LW link

Densing Law of LLMs

Bogdan Ionut Cirstea · Dec 8, 2024, 7:35 PM
9 points
2 comments · 1 min read · LW link
(arxiv.org)

NVIDIA and Microsoft release 530B parameter transformer model, Megatron-Turing NLG

Ozyrus · Oct 11, 2021, 3:28 PM
51 points
36 comments · 1 min read · LW link
(developer.nvidia.com)

Ethan Caballero on Private Scaling Progress

Michaël Trazzi · May 5, 2022, 6:32 PM
63 points
2 comments · 2 min read · LW link
(theinsideview.github.io)

Inverse scaling can become U-shaped

Edouard Harris · Nov 8, 2022, 7:04 PM
27 points
15 comments · 1 min read · LW link
(arxiv.org)

[Link] Training Compute-Optimal Large Language Models

nostalgebraist · Mar 31, 2022, 6:01 PM
51 points
23 comments · 1 min read · LW link
(arxiv.org)

Paper: On measuring situational awareness in LLMs

Sep 4, 2023, 12:54 PM
109 points
16 comments · 5 min read · LW link
(arxiv.org)

Musings on LLM Scale (Jul 2024)

Vladimir_Nesov · Jul 3, 2024, 6:35 PM
34 points
0 comments · 3 min read · LW link

On AI Scaling

harsimony · Feb 5, 2025, 8:24 PM
6 points
3 comments · 8 min read · LW link
(splittinginfinity.substack.com)

[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

Amal · Jan 12, 2023, 2:24 PM
15 points
2 comments · 1 min read · LW link
(arxiv.org)

Inverse Scaling Prize: Second Round Winners

Jan 24, 2023, 8:12 PM
58 points
17 comments · 15 min read · LW link

[Question] Is there a “critical threshold” for LLM scaling laws?

Logan Zoellner · Mar 30, 2024, 12:23 PM
7 points
1 comment · 1 min read · LW link

o1: A Technical Primer

Jesse Hoogland · Dec 9, 2024, 7:09 PM
169 points
19 comments · 9 min read · LW link
(www.youtube.com)

The effect of horizon length on scaling laws

Jacob_Hilton · Feb 1, 2023, 3:59 AM
23 points
2 comments · 1 min read · LW link
(arxiv.org)

A closer look at chess scalings (into the past)

hippke · Jul 15, 2021, 8:13 AM
50 points
14 comments · 4 min read · LW link

[Question] Clarifying how misalignment can arise from scaling LLMs

Util · Aug 19, 2023, 2:16 PM
3 points
1 comment · 1 min read · LW link

Parameter counts in Machine Learning

Jun 19, 2021, 4:04 PM
47 points
18 comments · 7 min read · LW link

How much chess engine progress is about adapting to bigger computers?

paulfchristiano · Jul 7, 2021, 10:35 PM
114 points
23 comments · 6 min read · LW link

Transformative AI and Compute [Summary]

lennart · Sep 26, 2021, 11:41 AM
14 points
0 comments · 9 min read · LW link

What is Compute? - Transformative AI and Compute [1/4]

lennart · Sep 23, 2021, 4:25 PM
27 points
9 comments · 19 min read · LW link

Proposal: Scaling laws for RL generalization

axioman · Oct 1, 2021, 9:32 PM
14 points
12 comments · 11 min read · LW link

Forecasting Compute—Transformative AI and Compute [2/4]

lennart · Oct 2, 2021, 3:54 PM
17 points
0 comments · 19 min read · LW link

Compute Governance and Conclusions—Transformative AI and Compute [3/4]

lennart · Oct 14, 2021, 8:23 AM
13 points
0 comments · 5 min read · LW link

Compute Research Questions and Metrics—Transformative AI and Compute [4/4]

lennart · Nov 28, 2021, 10:49 PM
7 points
0 comments · 16 min read · LW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn · Nov 29, 2021, 3:18 PM
16 points
5 comments · 7 min read · LW link

How I’m thinking about GPT-N

delton137 · Jan 17, 2022, 5:11 PM
54 points
21 comments · 18 min read · LW link

Estimating training compute of Deep Learning models

Jan 20, 2022, 4:12 PM
37 points
4 comments · 1 min read · LW link

Compute Trends Across Three eras of Machine Learning

Feb 16, 2022, 2:18 PM
94 points
13 comments · 2 min read · LW link

Compute Trends — Comparison to OpenAI’s AI and Compute

Mar 12, 2022, 6:09 PM
23 points
3 comments · 3 min read · LW link

[linkpost] The final AI benchmark: BIG-bench

RomanS · Jun 10, 2022, 8:53 AM
25 points
21 comments · 1 min read · LW link

Causal confusion as an argument against the scaling hypothesis

Jun 20, 2022, 10:54 AM
86 points
30 comments · 15 min read · LW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

Jun 27, 2022, 3:58 PM
171 points
14 comments · 7 min read · LW link

Trends in GPU price-performance

Jul 1, 2022, 3:51 PM
85 points
13 comments · 1 min read · LW link · 1 review
(epochai.org)

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos · Jul 18, 2022, 4:51 PM
20 points
0 comments · 1 min read · LW link
(epochai.org)

A Quick Note on AI Scaling Asymptotes

alyssavance · May 25, 2022, 2:55 AM
44 points
7 comments · 1 min read · LW link

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo · Sep 15, 2022, 5:54 PM
35 points
12 comments · 13 min read · LW link

Smoke without fire is scary

Adam Jermyn · Oct 4, 2022, 9:08 PM
52 points
22 comments · 4 min read · LW link

Scaling Laws for Reward Model Overoptimization

Oct 20, 2022, 12:20 AM
103 points
13 comments · 1 min read · LW link
(arxiv.org)

Massive Scaling Should be Frowned Upon

harsimony · Nov 17, 2022, 8:43 AM
4 points
6 comments · 5 min read · LW link

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig · Nov 18, 2022, 12:46 PM
15 points
2 comments · 1 min read · LW link

Some Arguments Against Strong Scaling

Joar Skalse · Jan 13, 2023, 12:04 PM
25 points
21 comments · 16 min read · LW link

Whisper’s Wild Implications

Ollie J · Jan 3, 2023, 12:17 PM
19 points
6 comments · 5 min read · LW link

Parameter Scaling Comes for RL, Maybe

1a3orn · Jan 24, 2023, 1:55 PM
100 points
3 comments · 14 min read · LW link

Scaling Laws Literature Review

Pablo Villalobos · Jan 27, 2023, 7:57 PM
36 points
1 comment · 4 min read · LW link
(epochai.org)

The Perceptron Controversy

Yuxi_Liu · Jan 10, 2024, 11:07 PM
65 points
18 comments · 1 min read · LW link
(yuxi-liu-wired.github.io)

Predicting AGI by the Turing Test

Yuxi_Liu · Jan 22, 2024, 4:22 AM
21 points
2 comments · 10 min read · LW link
(yuxi-liu-wired.github.io)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg · Feb 9, 2024, 7:00 AM
50 points
6 comments · 3 min read · LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · Feb 12, 2024, 12:56 AM
57 points
13 comments · 3 min read · LW link

Intelligence Is Jagged

Adam Train · Feb 19, 2025, 7:08 AM
6 points
1 comment · 3 min read · LW link

Scaling Laws and Superposition

Pavan Katta · Apr 10, 2024, 3:36 PM
9 points
4 comments · 5 min read · LW link
(www.pavankatta.com)

Neural Scaling Laws Rooted in the Data Distribution

aribrill · Feb 20, 2025, 9:22 PM
6 points
0 comments · 1 min read · LW link
(arxiv.org)

How LLMs Learn: What We Know, What We Don’t (Yet) Know, and What Comes Next

Jonasb · Jul 9, 2024, 9:58 AM
2 points
0 comments · 16 min read · LW link
(www.denominations.io)

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

Jul 22, 2024, 4:17 PM
69 points
0 comments · 16 min read · LW link

Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear

Nassim_A · Nov 24, 2024, 5:17 PM
1 point
0 comments · 9 min read · LW link

prÆy

oimrqs · Jan 11, 2025, 7:42 PM
1 point
0 comments · 1 min read · LW link

Implications of the inference scaling paradigm for AI safety

Ryan Kidd · Jan 14, 2025, 2:14 AM
89 points
69 comments · 5 min read · LW link

The Quantization Model of Neural Scaling

nz · Mar 31, 2023, 4:02 PM
17 points
0 comments · 1 min read · LW link
(arxiv.org)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Nov 8, 2023, 11:37 AM
49 points
0 comments · 18 min read · LW link

What’s new at FAR AI

Dec 4, 2023, 9:18 PM
41 points
0 comments · 5 min read · LW link
(far.ai)

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis · May 23, 2023, 5:34 AM
15 points
15 comments · 1 min read · LW link

Why Job Displacement Predictions are Wrong: Explanations of Cognitive Automation

Moritz Wallawitsch · May 30, 2023, 8:43 PM
−4 points
0 comments · 8 min read · LW link

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk · Jul 20, 2023, 9:56 AM
39 points
2 comments · 5 min read · LW link

[Linkpost] Applicability of scaling laws to vision encoding models

Bogdan Ionut Cirstea · Aug 5, 2023, 11:10 AM
11 points
2 comments · 1 min read · LW link