Conjecture (org)

Last edit: Dec 30, 2024, 9:24 AM by Dakara

Conjecture is an alignment startup founded by Connor Leahy, Sid Black, and Gabriel Alfour that aims to scale alignment research.

The initial directions of their research agenda are laid out in their announcement post below, "We Are Conjecture, A New Alignment Research Startup."

We Are Conjecture, A New Alignment Research Startup

Connor Leahy · Apr 8, 2022, 11:40 AM
197 points
25 comments · 4 min read · LW link

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · Jul 22, 2022, 6:44 PM
195 points
29 comments · 14 min read · LW link
(theinsideview.ai)

Epistemological Vigilance for Alignment

adamShimi · Jun 6, 2022, 12:27 AM
66 points
11 comments · 10 min read · LW link

Questions about Conjecture’s CoEm proposal

Mar 9, 2023, 7:32 PM
51 points
4 comments · 2 min read · LW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · May 22, 2023, 2:31 PM
155 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

Cognitive Emulation: A Naive AI Safety Proposal

Feb 25, 2023, 7:35 PM
195 points
46 comments · 4 min read · LW link

Conjecture: a retrospective after 8 months of work

Nov 23, 2022, 5:10 PM
180 points
9 comments · 8 min read · LW link

Re-Examining LayerNorm

Eric Winsor · Dec 1, 2022, 10:20 PM
127 points
12 comments · 5 min read · LW link

Refine’s First Blog Post Day

adamShimi · Aug 13, 2022, 10:23 AM
55 points
3 comments · 1 min read · LW link

Human decision processes are not well factored

Feb 17, 2023, 1:11 PM
33 points
3 comments · 2 min read · LW link

Searching for Search

Nov 28, 2022, 3:31 PM
97 points
9 comments · 14 min read · LW link · 1 review

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

Feb 24, 2023, 11:03 PM
61 points
7 comments · 47 min read · LW link

AGI in sight: our look at the game board

Feb 18, 2023, 10:17 PM
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Empathy as a natural consequence of learnt reward models

beren · Feb 4, 2023, 3:35 PM
48 points
27 comments · 13 min read · LW link

AMA Conjecture, A New Alignment Startup

adamShimi · Apr 9, 2022, 9:43 AM
47 points
42 comments · 1 min read · LW link

Refine’s Second Blog Post Day

adamShimi · Aug 20, 2022, 1:01 PM
19 points
0 comments · 1 min read · LW link

Japan AI Alignment Conference

Mar 10, 2023, 6:56 AM
64 points
7 comments · 1 min read · LW link
(www.conjecture.dev)

Critiques of prominent AI safety labs: Conjecture

Omega. · Jun 12, 2023, 1:32 AM
12 points
32 comments · 33 min read · LW link

What I Learned Running Refine

adamShimi · Nov 24, 2022, 2:49 PM
108 points
5 comments · 4 min read · LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

Nov 28, 2022, 12:54 PM
199 points
33 comments · 31 min read · LW link

The First Filter

Nov 26, 2022, 7:37 PM
67 points
5 comments · 1 min read · LW link

Biases are engines of cognition

Nov 30, 2022, 4:47 PM
46 points
7 comments · 1 min read · LW link

Tradeoffs in complexity, abstraction, and generality

Dec 12, 2022, 3:55 PM
32 points
0 comments · 2 min read · LW link

Psychological Disorders and Problems

Dec 12, 2022, 6:15 PM
39 points
6 comments · 1 min read · LW link

Mental acceptance and reflection

Dec 22, 2022, 2:32 PM
34 points
1 comment · 2 min read · LW link

Basic Facts about Language Model Internals

Jan 4, 2023, 1:01 PM
130 points
19 comments · 9 min read · LW link

Don’t accelerate problems you’re trying to solve

Feb 15, 2023, 6:11 PM
100 points
27 comments · 4 min read · LW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

Feb 10, 2023, 1:55 PM
39 points
0 comments · 43 min read · LW link

No One-Size-Fit-All Epistemic Strategy

adamShimi · Aug 20, 2022, 12:56 PM
24 points
2 comments · 2 min read · LW link

Shapes of Mind and Pluralism in Alignment

adamShimi · Aug 13, 2022, 10:01 AM
33 points
2 comments · 2 min read · LW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi · Jul 29, 2022, 6:59 PM
72 points
3 comments · 16 min read · LW link

Levels of Pluralism

adamShimi · Jul 27, 2022, 9:35 AM
37 points
0 comments · 14 min read · LW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi · Jul 23, 2022, 11:40 AM
38 points
5 comments · 3 min read · LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · Jul 20, 2022, 10:44 AM
87 points
11 comments · 8 min read · LW link

Mosaic and Palimpsests: Two Shapes of Research

adamShimi · Jul 12, 2022, 9:05 AM
39 points
3 comments · 9 min read · LW link

Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi · Apr 15, 2022, 8:57 AM
144 points
13 comments · 4 min read · LW link

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · Jul 14, 2022, 4:59 PM
114 points
15 comments · 33 min read · LW link

Conjecture: Internal Infohazard Policy

Jul 29, 2022, 7:07 PM
131 points
6 comments · 19 min read · LW link

Understanding Conjecture: Notes from Connor Leahy interview

Orpheus16 · Sep 15, 2022, 6:37 PM
107 points
23 comments · 15 min read · LW link

Methodological Therapy: An Agenda For Tackling Research Bottlenecks

Sep 22, 2022, 6:41 PM
54 points
6 comments · 9 min read · LW link

Interpreting Neural Networks through the Polytope Lens

Sep 23, 2022, 5:58 PM
144 points
29 comments · 33 min read · LW link

Mysteries of mode collapse

janus · Nov 8, 2022, 10:37 AM
284 points
57 comments · 14 min read · LW link · 1 review

Current themes in mechanistic interpretability research

Nov 16, 2022, 2:14 PM
89 points
2 comments · 12 min read · LW link

AGI will have learnt utility functions

beren · Jan 25, 2023, 7:42 PM
36 points
4 comments · 13 min read · LW link

Conjecture Second Hiring Round

Nov 23, 2022, 5:11 PM
92 points
0 comments · 1 min read · LW link

Gradient hacking is extremely difficult

beren · Jan 24, 2023, 3:45 PM
164 points
22 comments · 5 min read · LW link

Why almost every RL agent does learned optimization

Lee Sharkey · Feb 12, 2023, 4:58 AM
32 points
3 comments · 5 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Dec 13, 2022, 3:41 PM
150 points
23 comments · 22 min read · LW link · 2 reviews

Basic facts about language models during training

beren · Feb 21, 2023, 11:46 AM
98 points
15 comments · 18 min read · LW link

Japan AI Alignment Conference Postmortem

Apr 20, 2023, 10:58 AM
71 points
8 comments · 8 min read · LW link

Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI

Dec 2, 2024, 1:28 PM
44 points
10 comments · 29 min read · LW link
(www.conjecture.dev)

A technical note on bilinear layers for interpretability

Lee Sharkey · May 8, 2023, 6:06 AM
59 points
0 comments · 1 min read · LW link
(arxiv.org)

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

May 1, 2023, 4:47 PM
96 points
10 comments · 30 min read · LW link

A response to Conjecture’s CoEm proposal

Kristian Freed · Apr 24, 2023, 5:23 PM
7 points
0 comments · 4 min read · LW link

Launching Applications for the Global AI Safety Fellowship 2025!

Aditya_SK · Nov 30, 2024, 2:02 PM
11 points
5 comments · 1 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal

Igor Ivanov · Apr 11, 2023, 2:05 PM
30 points
1 comment · 3 min read · LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy · Aug 29, 2023, 10:56 AM
63 points
13 comments · 1 min read · LW link
(www.youtube.com)

Conjecture: A standing offer for public debates on AI

Andrea_Miotti · Jun 16, 2023, 2:33 PM
29 points
1 comment · 2 min read · LW link
(www.conjecture.dev)

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien · Feb 6, 2024, 7:10 PM
75 points
12 comments · 16 min read · LW link