
Conjecture (org)

Last edit: 30 Dec 2024 9:24 UTC by Dakara

Conjecture is an AI alignment startup founded by Connor Leahy, Sid Black, and Gabriel Alfour that aims to scale alignment research.

The initial directions of their research agenda include:

We Are Conjecture, A New Alignment Research Startup

Connor Leahy · 8 Apr 2022 11:40 UTC
197 points
25 comments · 4 min read · LW link

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · 22 Jul 2022 18:44 UTC
195 points
29 comments · 14 min read · LW link
(theinsideview.ai)

Epistemological Vigilance for Alignment

adamShimi · 6 Jun 2022 0:27 UTC
66 points
11 comments · 10 min read · LW link

Questions about Conjecture’s CoEm proposal

9 Mar 2023 19:32 UTC
51 points
4 comments · 2 min read · LW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · 22 May 2023 14:31 UTC
155 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

Cognitive Emulation: A Naive AI Safety Proposal

25 Feb 2023 19:35 UTC
195 points
46 comments · 4 min read · LW link

Conjecture: a retrospective after 8 months of work

23 Nov 2022 17:10 UTC
180 points
9 comments · 8 min read · LW link

Re-Examining LayerNorm

Eric Winsor · 1 Dec 2022 22:20 UTC
127 points
12 comments · 5 min read · LW link

Refine’s First Blog Post Day

adamShimi · 13 Aug 2022 10:23 UTC
55 points
3 comments · 1 min read · LW link

Human decision processes are not well factored

17 Feb 2023 13:11 UTC
33 points
3 comments · 2 min read · LW link

Searching for Search

28 Nov 2022 15:31 UTC
94 points
9 comments · 14 min read · LW link · 1 review

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
61 points
7 comments · 47 min read · LW link

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Empathy as a natural consequence of learnt reward models

beren · 4 Feb 2023 15:35 UTC
46 points
26 comments · 13 min read · LW link

AMA Conjecture, A New Alignment Startup

adamShimi · 9 Apr 2022 9:43 UTC
47 points
42 comments · 1 min read · LW link

Refine’s Second Blog Post Day

adamShimi · 20 Aug 2022 13:01 UTC
19 points
0 comments · 1 min read · LW link

Japan AI Alignment Conference

10 Mar 2023 6:56 UTC
64 points
7 comments · 1 min read · LW link
(www.conjecture.dev)

Critiques of prominent AI safety labs: Conjecture

Omega. · 12 Jun 2023 1:32 UTC
12 points
32 comments · 33 min read · LW link

What I Learned Running Refine

adamShimi · 24 Nov 2022 14:49 UTC
108 points
5 comments · 4 min read · LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

28 Nov 2022 12:54 UTC
199 points
33 comments · 31 min read · LW link

The First Filter

26 Nov 2022 19:37 UTC
67 points
5 comments · 1 min read · LW link

Biases are engines of cognition

30 Nov 2022 16:47 UTC
46 points
7 comments · 1 min read · LW link

Tradeoffs in complexity, abstraction, and generality

12 Dec 2022 15:55 UTC
32 points
0 comments · 2 min read · LW link

Psychological Disorders and Problems

12 Dec 2022 18:15 UTC
39 points
6 comments · 1 min read · LW link

Mental acceptance and reflection

22 Dec 2022 14:32 UTC
34 points
1 comment · 2 min read · LW link

Basic Facts about Language Model Internals

4 Jan 2023 13:01 UTC
130 points
19 comments · 9 min read · LW link

Don’t accelerate problems you’re trying to solve

15 Feb 2023 18:11 UTC
100 points
27 comments · 4 min read · LW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

10 Feb 2023 13:55 UTC
39 points
0 comments · 43 min read · LW link

No One-Size-Fit-All Epistemic Strategy

adamShimi · 20 Aug 2022 12:56 UTC
24 points
2 comments · 2 min read · LW link

Shapes of Mind and Pluralism in Alignment

adamShimi · 13 Aug 2022 10:01 UTC
33 points
2 comments · 2 min read · LW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi · 29 Jul 2022 18:59 UTC
72 points
3 comments · 16 min read · LW link

Levels of Pluralism

adamShimi · 27 Jul 2022 9:35 UTC
37 points
0 comments · 14 min read · LW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
38 points
5 comments · 3 min read · LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · 20 Jul 2022 10:44 UTC
87 points
11 comments · 8 min read · LW link

Mosaic and Palimpsests: Two Shapes of Research

adamShimi · 12 Jul 2022 9:05 UTC
39 points
3 comments · 9 min read · LW link

Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi · 15 Apr 2022 8:57 UTC
144 points
13 comments · 4 min read · LW link

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · 14 Jul 2022 16:59 UTC
114 points
15 comments · 33 min read · LW link

Conjecture: Internal Infohazard Policy

29 Jul 2022 19:07 UTC
131 points
6 comments · 19 min read · LW link

Understanding Conjecture: Notes from Connor Leahy interview

Akash · 15 Sep 2022 18:37 UTC
107 points
23 comments · 15 min read · LW link

Methodological Therapy: An Agenda For Tackling Research Bottlenecks

22 Sep 2022 18:41 UTC
54 points
6 comments · 9 min read · LW link

Interpreting Neural Networks through the Polytope Lens

23 Sep 2022 17:58 UTC
144 points
29 comments · 33 min read · LW link

Mysteries of mode collapse

janus · 8 Nov 2022 10:37 UTC
284 points
57 comments · 14 min read · LW link · 1 review

Current themes in mechanistic interpretability research

16 Nov 2022 14:14 UTC
89 points
2 comments · 12 min read · LW link

AGI will have learnt utility functions

beren · 25 Jan 2023 19:42 UTC
36 points
4 comments · 13 min read · LW link

Conjecture Second Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments · 1 min read · LW link

Gradient hacking is extremely difficult

beren · 24 Jan 2023 15:45 UTC
162 points
22 comments · 5 min read · LW link

Why almost every RL agent does learned optimization

Lee Sharkey · 12 Feb 2023 4:58 UTC
32 points
3 comments · 5 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

13 Dec 2022 15:41 UTC
149 points
23 comments · 22 min read · LW link · 2 reviews

Basic facts about language models during training

beren · 21 Feb 2023 11:46 UTC
98 points
15 comments · 18 min read · LW link

Japan AI Alignment Conference Postmortem

20 Apr 2023 10:58 UTC
71 points
8 comments · 8 min read · LW link

Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI

2 Dec 2024 13:28 UTC
44 points
10 comments · 29 min read · LW link
(www.conjecture.dev)

A technical note on bilinear layers for interpretability

Lee Sharkey · 8 May 2023 6:06 UTC
59 points
0 comments · 1 min read · LW link
(arxiv.org)

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

1 May 2023 16:47 UTC
96 points
10 comments · 30 min read · LW link

A response to Conjecture’s CoEm proposal

Kristian Freed · 24 Apr 2023 17:23 UTC
7 points
0 comments · 4 min read · LW link

Launching Applications for the Global AI Safety Fellowship 2025!

Aditya_SK · 30 Nov 2024 14:02 UTC
11 points
5 comments · 1 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal

Igor Ivanov · 11 Apr 2023 14:05 UTC
30 points
1 comment · 3 min read · LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy · 29 Aug 2023 10:56 UTC
63 points
13 comments · 1 min read · LW link
(www.youtube.com)

Conjecture: A standing offer for public debates on AI

Andrea_Miotti · 16 Jun 2023 14:33 UTC
29 points
1 comment · 2 min read · LW link
(www.conjecture.dev)

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien · 6 Feb 2024 19:10 UTC
75 points
12 comments · 16 min read · LW link