Transformers

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · Jan 5, 2024, 8:46 AM
37 points
4 comments · 2 min read · LW link

How LLMs are and are not myopic

janus · Jul 25, 2023, 2:19 AM
134 points
16 comments · 8 min read · LW link

Google’s PaLM-E: An Embodied Multimodal Language Model

SandXbox · Mar 7, 2023, 4:11 AM
87 points
7 comments · 1 min read · LW link
(palm-e.github.io)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them

Roman Leventov · Dec 27, 2023, 2:51 PM
33 points
9 comments · 4 min read · LW link

Modern Transformers are AGI, and Human-Level

abramdemski · Mar 26, 2024, 5:46 PM
219 points
88 comments · 5 min read · LW link

[Question] If I ask an LLM to think step by step, how big are the steps?

ryan_b · Sep 13, 2024, 8:30 PM
7 points
1 comment · 1 min read · LW link

How Do Induction Heads Actually Work in Transformers With Finite Capacity?

Fabien Roger · Mar 23, 2023, 9:09 AM
27 points
0 comments · 5 min read · LW link

Residual stream norms grow exponentially over the forward pass

May 7, 2023, 12:46 AM
77 points
24 comments · 11 min read · LW link

How fast can we perform a forward pass?

jsteinhardt · Jun 10, 2022, 11:30 PM
53 points
9 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda · Dec 25, 2022, 10:21 PM
57 points
7 comments · 12 min read · LW link
(www.neelnanda.io)

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind

DragonGod · Jan 13, 2023, 4:53 PM
62 points
12 comments · 1 min read · LW link
(arxiv.org)

[Question] Barcoding LLM Training Data Subsets. Anyone trying this for interpretability?

right..enough? · Apr 13, 2024, 3:09 AM
7 points
0 comments · 7 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai · Apr 16, 2024, 9:16 PM
412 points
100 comments · 12 min read · LW link

An interesting mathematical model of how LLMs work

Bill Benzon · Apr 30, 2024, 11:01 AM
5 points
0 comments · 1 min read · LW link

If language is for communication, what does that imply about LLMs?

Bill Benzon · May 12, 2024, 2:55 AM
10 points
0 comments · 1 min read · LW link

Exploring Llama-3-8B MLP Neurons

ntt123 · Jun 9, 2024, 2:19 PM
10 points
0 comments · 4 min read · LW link
(neuralblog.github.io)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Sep 6, 2024, 5:55 PM
70 points
7 comments · 4 min read · LW link

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability

ntt123 · Jun 17, 2024, 11:46 AM
5 points
4 comments · 6 min read · LW link
(neuralblog.github.io)

Week One of Studying Transformers Architecture

JustisMills · Jun 20, 2024, 3:47 AM
3 points
0 comments · 15 min read · LW link
(justismills.substack.com)

How Big a Deal are MatMul-Free Transformers?

JustisMills · Jun 27, 2024, 10:28 PM
19 points
6 comments · 5 min read · LW link
(justismills.substack.com)

Addendum: More Efficient FFNs via Attention

Robert_AIZI · Feb 6, 2023, 6:55 PM
10 points
2 comments · 5 min read · LW link
(aizi.substack.com)

Characterizing stable regions in the residual stream of LLMs

Sep 26, 2024, 1:44 PM
42 points
4 comments · 1 min read · LW link
(arxiv.org)

Transformers Explained (Again)

RohanS · Oct 22, 2024, 4:06 AM
4 points
0 comments · 18 min read · LW link

Analyzing how SAE features evolve across a forward pass

Nov 7, 2024, 10:07 PM
47 points
0 comments · 1 min read · LW link
(arxiv.org)

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers?

simeon_c · Dec 31, 2022, 11:34 AM
8 points
5 comments · 1 min read · LW link

Monet: Mixture of Monosemantic Experts for Transformers Explained

CalebMaresca · Jan 25, 2025, 7:37 PM
19 points
2 comments · 11 min read · LW link

So, just why do GPTs have to operate by continuing an existing string?

Bill Benzon · Mar 24, 2023, 12:08 PM
−4 points
0 comments · 3 min read · LW link

We Need To Know About Continual Learning

michael_mjd · Apr 22, 2023, 5:08 PM
29 points
14 comments · 4 min read · LW link

The Method of Loci: With some brief remarks, including transformers and evaluating AIs

Bill Benzon · Dec 2, 2023, 2:36 PM
6 points
0 comments · 3 min read · LW link

Has anyone experimented with Dodrio, a tool for exploring transformer models through interactive visualization?

Bill Benzon · Dec 11, 2023, 8:34 PM
4 points
0 comments · 1 min read · LW link

An Analogy for Understanding Transformers

CallumMcDougall · May 13, 2023, 12:20 PM
89 points
6 comments · 9 min read · LW link

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks

RogerDearnaley · May 21, 2023, 8:29 AM
9 points
1 comment · 4 min read · LW link

Neuroevolution, Social Intelligence, and Logic

vinnik.dmitry07 · May 31, 2023, 5:54 PM
1 point
0 comments · 10 min read · LW link

[Question] Killing Recurrent Memory Over Self Attention?

Del Nobolo · Jun 6, 2023, 11:02 PM
3 points
0 comments · 1 min read · LW link

GPT-2’s positional embedding matrix is a helix

AdamYedidia · Jul 21, 2023, 4:16 AM
44 points
21 comments · 4 min read · LW link

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidia · Aug 10, 2023, 1:58 AM
26 points
4 comments · 13 min read · LW link

Google DeepMind’s RT-2

SandXbox · Aug 11, 2023, 11:26 AM
9 points
1 comment · 1 min read · LW link
(robotics-transformer2.github.io)

World, mind, and learnability: A note on the metaphysical structure of the cosmos [& LLMs]

Bill Benzon · Sep 5, 2023, 12:19 PM
4 points
1 comment · 5 min read · LW link

New Tool: the Residual Stream Viewer

AdamYedidia · Oct 1, 2023, 12:49 AM
32 points
7 comments · 4 min read · LW link
(tinyurl.com)

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained

Zeping Yu · Dec 26, 2023, 12:36 AM
7 points
1 comment · 11 min read · LW link

Research agenda—Building a multi-modal chess-language model

p.b. · Apr 7, 2022, 12:25 PM
8 points
2 comments · 2 min read · LW link

No Really, Attention is ALL You Need—Attention can do feedforward networks

Robert_AIZI · Jan 31, 2023, 6:48 PM
29 points
7 comments · 6 min read · LW link
(aizi.substack.com)

Searching for Modularity in Large Language Models

Sep 8, 2022, 2:25 AM
44 points
3 comments · 14 min read · LW link

Brief Notes on Transformers

Adam Jermyn · Sep 26, 2022, 2:46 PM
48 points
3 comments · 2 min read · LW link

Finding Backward Chaining Circuits in Transformers Trained on Tree Search

May 28, 2024, 5:29 AM
50 points
1 comment · 9 min read · LW link
(arxiv.org)

Attention SAEs Scale to GPT-2 Small

Feb 3, 2024, 6:50 AM
78 points
4 comments · 8 min read · LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · Feb 12, 2024, 12:56 AM
57 points
13 comments · 3 min read · LW link

Visualizing small Attention-only Transformers

WCargo · Nov 19, 2024, 9:37 AM
4 points
0 comments · 8 min read · LW link

Deconfusing In-Context Learning

Arjun Panickssery · Feb 25, 2024, 9:48 AM
37 points
1 comment · 2 min read · LW link

Building a transformer from scratch—AI safety up-skilling challenge

Marius Hobbhahn · Oct 12, 2022, 3:40 PM
42 points
1 comment · 5 min read · LW link

Decompiling Tracr Transformers—An interpretability experiment

Hannes Thurnherr · Mar 27, 2024, 9:49 AM
4 points
0 comments · 14 min read · LW link

Understanding mesa-optimization using toy models

May 7, 2023, 5:00 PM
43 points
2 comments · 10 min read · LW link