Conjecture (org)

TagLast edit: 30 Dec 2024 9:24 UTC by Dakara

Conjecture is an alignment startup founded by Connor Leahy, Sid Black and Gabriel Alfour, which aims to scale alignment research.

The initial directions of their research agenda include:

New frames for reasoning about large language models
Scalable mechanistic interpretability
History and philosophy of alignment

We Are Conjecture, A New Alignment Research Startup

Connor Leahy8 Apr 2022 11:40 UTC

197 points

25 comments4 min readLW link

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi22 Jul 2022 18:44 UTC

195 points

29 comments14 min readLW link

(theinsideview.ai)

Epistemological Vigilance for Alignment

adamShimi6 Jun 2022 0:27 UTC

66 points

11 comments10 min readLW link

Questions about Conjecure’s CoEm proposal

Orpheus16 and NicholasKees

9 Mar 2023 19:32 UTC

51 points

4 comments2 min readLW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala22 May 2023 14:31 UTC

155 points

5 comments3 min readLW link

(www.conjecture.dev)

Human decision processes are not well factored

remember and Gabriel Alfour

17 Feb 2023 13:11 UTC

33 points

3 comments2 min readLW link

Searching for Search

NicholasKees and janus

28 Nov 2022 15:31 UTC

97 points

9 comments14 min readLW link 1 review

Conjecture: a retrospective after 8 months of work

Connor Leahy, Sid Black, Gabriel Alfour and Chris Scammell

23 Nov 2022 17:10 UTC

180 points

9 comments8 min readLW link

Re-Examining LayerNorm

Eric Winsor1 Dec 2022 22:20 UTC

127 points

12 comments5 min readLW link

Cognitive Emulation: A Naive AI Safety Proposal

Connor Leahy and Gabriel Alfour

25 Feb 2023 19:35 UTC

195 points

46 comments4 min readLW link

Refine’s First Blog Post Day

adamShimi13 Aug 2022 10:23 UTC

55 points

3 comments1 min readLW link

Japan AI Alignment Conference

Chris Scammell and Katrina Joslin

10 Mar 2023 6:56 UTC

64 points

7 comments1 min readLW link

(www.conjecture.dev)

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren and Sid Black

28 Nov 2022 12:54 UTC

200 points

34 comments31 min readLW link

Empathy as a natural consequence of learnt reward models

beren4 Feb 2023 15:35 UTC

48 points

27 comments13 min readLW link

Refine’s Second Blog Post Day

adamShimi20 Aug 2022 13:01 UTC

19 points

0 comments1 min readLW link

AGI in sight: our look at the game board

Andrea_Miotti and Gabriel Alfour

18 Feb 2023 22:17 UTC

228 points

135 comments6 min readLW link

(andreamiotti.substack.com)

What I Learned Running Refine

adamShimi24 Nov 2022 14:49 UTC

108 points

5 comments4 min readLW link

AMA Conjecture, A New Alignment Startup

adamShimi9 Apr 2022 9:43 UTC

47 points

42 comments1 min readLW link

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

Andrea_Miotti, paulfchristiano, Gabriel Alfour and Olive Branch

24 Feb 2023 23:03 UTC

61 points

7 comments47 min readLW link

Critiques of prominent AI safety labs: Conjecture

Omega.12 Jun 2023 1:32 UTC

12 points

32 comments33 min readLW link

Mysteries of mode collapse

janus8 Nov 2022 10:37 UTC

288 points

57 comments14 min readLW link 1 review

Current themes in mechanistic interpretability research

Lee Sharkey, Sid Black and beren

16 Nov 2022 14:14 UTC

89 points

2 comments12 min readLW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi29 Jul 2022 18:59 UTC

75 points

3 comments16 min readLW link

The First Filter

adamShimi and Gabriel Alfour

26 Nov 2022 19:37 UTC

67 points

5 comments1 min readLW link

Tradeoffs in complexity, abstraction, and generality

remember and Gabriel Alfour

12 Dec 2022 15:55 UTC

32 points

0 comments2 min readLW link

Mosaic and Palimpsests: Two Shapes of Research

adamShimi12 Jul 2022 9:05 UTC

39 points

3 comments9 min readLW link

AGI will have learnt utility functions

beren25 Jan 2023 19:42 UTC

40 points

4 comments13 min readLW link

Conjecture Second Hiring Round

Connor Leahy, Sid Black, Gabriel Alfour and Chris Scammell

23 Nov 2022 17:11 UTC

92 points

0 comments1 min readLW link

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey14 Jul 2022 16:59 UTC

118 points

15 comments33 min readLW link

Methodological Therapy: An Agenda For Tackling Research Bottlenecks

adamShimi, Lucas Teixeira and remember

22 Sep 2022 18:41 UTC

54 points

6 comments9 min readLW link

Don’t accelerate problems you’re trying to solve

Andrea_Miotti and remember

15 Feb 2023 18:11 UTC

100 points

27 comments4 min readLW link

Interpreting Neural Networks through the Polytope Lens

Sid Black, Lee Sharkey, Connor Leahy, beren, CRG, merizian, Eric Winsor and Dan Braun

23 Sep 2022 17:58 UTC

144 points

29 comments33 min readLW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi23 Jul 2022 11:40 UTC

38 points

5 comments3 min readLW link

Conjecture: Internal Infohazard Policy

Connor Leahy, Sid Black, Chris Scammell and Andrea_Miotti

29 Jul 2022 19:07 UTC

131 points

6 comments19 min readLW link

Understanding Conjecture: Notes from Connor Leahy interview

Orpheus1615 Sep 2022 18:37 UTC

107 points

23 comments15 min readLW link

Shapes of Mind and Pluralism in Alignment

adamShimi13 Aug 2022 10:01 UTC

33 points

2 comments2 min readLW link

Levels of Pluralism

adamShimi27 Jul 2022 9:35 UTC

37 points

0 comments14 min readLW link

Basic Facts about Language Model Internals

beren and Eric Winsor

4 Jan 2023 13:01 UTC

130 points

19 comments9 min readLW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi20 Jul 2022 10:44 UTC

87 points

11 comments8 min readLW link

Psychological Disorders and Problems

adamShimi and Gabriel Alfour

12 Dec 2022 18:15 UTC

39 points

6 comments1 min readLW link

No One-Size-Fit-All Epistemic Strategy

adamShimi20 Aug 2022 12:56 UTC

24 points

2 comments2 min readLW link

Mental acceptance and reflection

remember and Gabriel Alfour

22 Dec 2022 14:32 UTC

34 points

1 comment2 min readLW link

Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi15 Apr 2022 8:57 UTC

144 points

13 comments4 min readLW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

remember and Andrea_Miotti

10 Feb 2023 13:55 UTC

39 points

0 comments43 min readLW link

Biases are engines of cognition

remember and Gabriel Alfour

30 Nov 2022 16:47 UTC

46 points

7 comments1 min readLW link

Gradient hacking is extremely difficult

beren24 Jan 2023 15:45 UTC

174 points

23 comments5 min readLW link

Why almost every RL agent does learned optimization

Lee Sharkey12 Feb 2023 4:58 UTC

32 points

3 comments5 min readLW link

Basic facts about language models during training

beren21 Feb 2023 11:46 UTC

99 points

15 comments18 min readLW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey, Dan Braun and beren

13 Dec 2022 15:41 UTC

155 points

23 comments22 min readLW link 2 reviews

Japan AI Alignment Conference Postmortem

Chris Scammell and Katrina Joslin

20 Apr 2023 10:58 UTC

71 points

8 comments8 min readLW link

Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI

Connor Leahy and Gabriel Alfour

2 Dec 2024 13:28 UTC

50 points

10 comments29 min readLW link

(www.conjecture.dev)

Launching Applications for the Global AI Safety Fellowship 2025!

Aditya_SK30 Nov 2024 14:02 UTC

11 points

5 comments1 min readLW link

A technical note on bilinear layers for interpretability

Lee Sharkey8 May 2023 6:06 UTC

59 points

0 comments1 min readLW link

(arxiv.org)

A response to Conjecture’s CoEm proposal

Kristian Freed24 Apr 2023 17:23 UTC

7 points

0 comments4 min readLW link

A couple of questions about Conjecture’s Cognitive Emulation proposal

Igor Ivanov11 Apr 2023 14:05 UTC

30 points

1 comment3 min readLW link

Conjecture: A standing offer for public debates on AI

Andrea_Miotti16 Jun 2023 14:33 UTC

29 points

1 comment2 min readLW link

(www.conjecture.dev)

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien6 Feb 2024 19:10 UTC

75 points

12 comments16 min readLW link

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

Olive Branch, Rohin Shah, Connor Leahy and Andrea_Miotti

1 May 2023 16:47 UTC

96 points

10 comments30 min readLW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy29 Aug 2023 10:56 UTC

63 points

13 comments1 min readLW link

(www.youtube.com)

No comments.

Con­jec­ture (org)

Conjecture (org)