
Conjecture (org)

Last edit: 30 Dec 2024 9:24 UTC by Dakara

Conjecture is an AI alignment startup founded by Connor Leahy, Sid Black, and Gabriel Alfour that aims to scale alignment research.

The initial directions of their research agenda include:

We Are Conjecture, A New Alignment Research Startup

Connor Leahy · 8 Apr 2022 11:40 UTC
197 points
25 comments · 4 min read · LW link

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · 22 Jul 2022 18:44 UTC
195 points
29 comments · 14 min read · LW link
(theinsideview.ai)

Epistemological Vigilance for Alignment

adamShimi · 6 Jun 2022 0:27 UTC
66 points
11 comments · 10 min read · LW link

Questions about Conjecture’s CoEm proposal

9 Mar 2023 19:32 UTC
51 points
4 comments · 2 min read · LW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · 22 May 2023 14:31 UTC
155 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

Cognitive Emulation: A Naive AI Safety Proposal

25 Feb 2023 19:35 UTC
195 points
46 comments · 4 min read · LW link

Conjecture: a retrospective after 8 months of work

23 Nov 2022 17:10 UTC
180 points
9 comments · 8 min read · LW link

Re-Examining LayerNorm

Eric Winsor · 1 Dec 2022 22:20 UTC
127 points
12 comments · 5 min read · LW link

Refine’s First Blog Post Day

adamShimi · 13 Aug 2022 10:23 UTC
55 points
3 comments · 1 min read · LW link

Human decision processes are not well factored

17 Feb 2023 13:11 UTC
33 points
3 comments · 2 min read · LW link

Searching for Search

28 Nov 2022 15:31 UTC
94 points
9 comments · 14 min read · LW link · 1 review

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
61 points
7 comments · 47 min read · LW link

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Empathy as a natural consequence of learnt reward models

beren · 4 Feb 2023 15:35 UTC
46 points
26 comments · 13 min read · LW link

AMA Conjecture, A New Alignment Startup

adamShimi · 9 Apr 2022 9:43 UTC
47 points
42 comments · 1 min read · LW link

Refine’s Second Blog Post Day

adamShimi · 20 Aug 2022 13:01 UTC
19 points
0 comments · 1 min read · LW link

Japan AI Alignment Conference

10 Mar 2023 6:56 UTC
64 points
7 comments · 1 min read · LW link
(www.conjecture.dev)

Critiques of prominent AI safety labs: Conjecture

Omega. · 12 Jun 2023 1:32 UTC
12 points
32 comments · 33 min read · LW link

What I Learned Running Refine

adamShimi · 24 Nov 2022 14:49 UTC
108 points
5 comments · 4 min read · LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

28 Nov 2022 12:54 UTC
199 points
33 comments · 31 min read · LW link

The First Filter

26 Nov 2022 19:37 UTC
67 points
5 comments · 1 min read · LW link

Biases are engines of cognition

30 Nov 2022 16:47 UTC
46 points
7 comments · 1 min read · LW link

Tradeoffs in complexity, abstraction, and generality

12 Dec 2022 15:55 UTC
32 points
0 comments · 2 min read · LW link

Psychological Disorders and Problems

12 Dec 2022 18:15 UTC
39 points
6 comments · 1 min read · LW link

Mental acceptance and reflection

22 Dec 2022 14:32 UTC
34 points
1 comment · 2 min read · LW link

Basic Facts about Language Model Internals

4 Jan 2023 13:01 UTC
130 points
19 comments · 9 min read · LW link

Don’t accelerate problems you’re trying to solve

15 Feb 2023 18:11 UTC
100 points
27 comments · 4 min read · LW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

10 Feb 2023 13:55 UTC
39 points
0 comments · 43 min read · LW link

No One-Size-Fit-All Epistemic Strategy

adamShimi · 20 Aug 2022 12:56 UTC
24 points
2 comments · 2 min read · LW link

Shapes of Mind and Pluralism in Alignment

adamShimi · 13 Aug 2022 10:01 UTC
33 points
2 comments · 2 min read · LW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi · 29 Jul 2022 18:59 UTC
72 points
3 comments · 16 min read · LW link

Levels of Pluralism

adamShimi · 27 Jul 2022 9:35 UTC
37 points
0 comments · 14 min read · LW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
38 points
5 comments · 3 min read · LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · 20 Jul 2022 10:44 UTC
87 points
11 comments · 8 min read · LW link

Mosaic and Palimpsests: Two Shapes of Research

adamShimi · 12 Jul 2022 9:05 UTC
39 points
3 comments · 9 min read · LW link

Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi · 15 Apr 2022 8:57 UTC
144 points
13 comments · 4 min read · LW link

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · 14 Jul 2022 16:59 UTC
114 points
15 comments · 33 min read · LW link

Conjecture: Internal Infohazard Policy

29 Jul 2022 19:07 UTC
131 points
6 comments · 19 min read · LW link

Understanding Conjecture: Notes from Connor Leahy interview

Akash · 15 Sep 2022 18:37 UTC
107 points
23 comments · 15 min read · LW link

Methodological Therapy: An Agenda For Tackling Research Bottlenecks

22 Sep 2022 18:41 UTC
54 points
6 comments · 9 min read · LW link

Interpreting Neural Networks through the Polytope Lens

23 Sep 2022 17:58 UTC
144 points
29 comments · 33 min read · LW link

Mysteries of mode collapse

janus · 8 Nov 2022 10:37 UTC
284 points
57 comments · 14 min read · LW link · 1 review

Current themes in mechanistic interpretability research

16 Nov 2022 14:14 UTC
89 points
2 comments · 12 min read · LW link

AGI will have learnt utility functions

beren · 25 Jan 2023 19:42 UTC
36 points
4 comments · 13 min read · LW link

Conjecture Second Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments · 1 min read · LW link

Gradient hacking is extremely difficult

beren · 24 Jan 2023 15:45 UTC
162 points
22 comments · 5 min read · LW link

Why almost every RL agent does learned optimization

Lee Sharkey · 12 Feb 2023 4:58 UTC
32 points
3 comments · 5 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

13 Dec 2022 15:41 UTC
149 points
23 comments · 22 min read · LW link · 2 reviews

Basic facts about language models during training

beren · 21 Feb 2023 11:46 UTC
98 points
15 comments · 18 min read · LW link

Japan AI Alignment Conference Postmortem

20 Apr 2023 10:58 UTC
71 points
8 comments · 8 min read · LW link

Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI

2 Dec 2024 13:28 UTC
44 points
10 comments · 29 min read · LW link
(www.conjecture.dev)

A technical note on bilinear layers for interpretability

Lee Sharkey · 8 May 2023 6:06 UTC
59 points
0 comments · 1 min read · LW link
(arxiv.org)

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

1 May 2023 16:47 UTC
96 points
10 comments · 30 min read · LW link

A response to Conjecture’s CoEm proposal

Kristian Freed · 24 Apr 2023 17:23 UTC
7 points
0 comments · 4 min read · LW link

Launching Applications for the Global AI Safety Fellowship 2025!

Aditya_SK · 30 Nov 2024 14:02 UTC
11 points
5 comments · 1 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal

Igor Ivanov · 11 Apr 2023 14:05 UTC
30 points
1 comment · 3 min read · LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy · 29 Aug 2023 10:56 UTC
63 points
13 comments · 1 min read · LW link
(www.youtube.com)

Conjecture: A standing offer for public debates on AI

Andrea_Miotti · 16 Jun 2023 14:33 UTC
29 points
1 comment · 2 min read · LW link
(www.conjecture.dev)

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien · 6 Feb 2024 19:10 UTC
75 points
12 comments · 16 min read · LW link