CEO at Conjecture.
I don’t know how to save the world, but dammit I’m gonna try.
Thanks!
As I have said many, many times before, Conjecture is not a deep shift in my beliefs about open sourcing. It is not, and has never been, the position of EleutherAI (at least while I was head) that everything should be released in all scenarios, but rather that some specific things (such as LLMs of the size and strength we release) should be released in some specific situations for some specific reasons. EleutherAI has not released, and would not release, models or capabilities that would push the capabilities frontier (and while I am no longer in charge, I strongly expect that legacy to continue), and there are a number of things we did discover that we decided to delay or not release at all for precisely such infohazard reasons. Conjecture, of course, is even stricter and has opsec that wouldn’t be possible in a volunteer-driven open source community.
Additionally, Carper is not part of EleutherAI and should be considered a genealogical descendant of, but independent from, EAI.
Thanks for this! These are great questions! We have been collecting questions from the community and plan to write a follow-up post addressing them in the next couple of weeks.
I initially liked this post a lot, then saw a lot of pushback in the comments, mostly of the (very valid!) form of “we actually build reliable things out of unreliable things, particularly with computers, all the time”. I think this is a fair criticism of the post (and of the choice of examples/metaphors therein), but I think it may be missing (one of) the core message(s) the post is trying to deliver.
I wanna give an interpretation/steelman of what I think John is trying to convey here (which I don’t know whether he would endorse or not):
“There are important assumptions that need to hold for the usual kind of systems security design to work (e.g. that failures are uncorrelated). Some of these assumptions will (likely) not apply with AGI. Therefore, extrapolating this kind of thinking to this domain is Bad™️.” (“Epistemological vigilance is critical”)
So maybe rather than saying “trying to build robust things out of brittle things is a bad idea”, it’s more like “we can build robust things out of certain brittle things, e.g. computers, but Godzilla is not a computer, and so you should only extrapolate from computers to Godzilla if you’re really, really sure you know what you’re doing.”
I think this is something better discussed in private. Could you DM me? Thanks!
This is a genuinely difficult and interesting question that I want to provide a good answer for, but that might take me some time to write up; I’ll get back to you at a later date.
Yes, we do expect this to be the case. Unfortunately, I think explaining in detail why we think this may be infohazardous. Or at least, I am sufficiently unsure about how infohazardous it is that I would first like to think about it for longer and run it through our internal infohazard review before sharing more. Sorry!
Answered here.
Redwood is doing great research, and we are fairly aligned with their approach. In particular, we agree that hands-on experience building alignment approaches could have high impact, even if AGI ends up having an architecture unlike modern neural networks (which we don’t believe will be the case). While Conjecture and Redwood both have a strong focus on prosaic alignment with modern ML models, our research agenda has higher variance, in that we additionally focus on conceptual and meta-level research. We’re also training our own (large) models, whereas (we believe) Redwood is just using pretrained, publicly available models. We do this for three reasons:
Having total control over the models we use can give us more insights into the phenomena we study, such as training models at a range of sizes to study scaling properties of alignment techniques.
Some properties we want to study may only appear in close-to-SOTA models—most of which are private.
We are trying to make products, and close-to-SOTA models help us do that better. Though as we note in our post, we plan to avoid work that pushes the capabilities frontier.
We’re also for-profit, while Redwood is a nonprofit, and we’re located in London! Not everyone lives out in the Bay :)
For the record, having any person or organization in this position would be a tremendous win. Interpretable aligned AGI?! We are talking about a top 0.1% scenario here! Like, the difference between egotistical Connor and altruistic Connor with an aligned AGI in his hands is much, much smaller than the difference between Connor with an aligned AGI and anyone, any organization, or any scenario, with a misaligned AGI.
But let’s assume this.
Unfortunately, there is no actual functioning, reliable mechanism by which humans can guarantee their alignment to each other. If there were something I could do that would irreversibly bind me to my commitment to the best interests of mankind in a publicly verifiable way, I would do it in a heartbeat. But there isn’t, and most attempts at such are security theater.
What I can do is point to my history of acting in ways that, I hope, show my consistent commitment to doing what is best for the long-term future (even if, of course, some people with different models of what is “best for the long-term future” will have legitimate disagreements with my past choices), and pledge to remain in control of Conjecture and shape its goals and actions appropriately.
On a meta-level, I think the best guarantee I can give is simply that not acting in humanity’s best interest is, in my model, Stupid. And my personal guiding philosophy in life is “Don’t Be Stupid”. Human values are complex and fragile, and while many humans disagree about many details of how they think the world should be, there are many core values that we all share, and not fighting with everything we’ve got to protect these values (or dying with dignity in the process) is Stupid.
Probably. It is likely that we will publish a lot of our interpretability work and tools, but we can’t commit to that because, unlike some others, we think it’s almost guaranteed that some interpretability work will lead to very infohazardous outcomes (for example, revealing obvious ways in which architectures could be trained more efficiently), and as such we need to consider each result on a case-by-case basis. However, if we deem them safe, we would definitely like to share as many of our tools and insights as possible.
We would love to collaborate with anyone (from academia or elsewhere) wherever it makes sense to do so, but we honestly just do not care very much about formal academic publication or citation metrics or whatever. If we see opportunities to collaborate with academia that we think will lead to interesting alignment work getting done, excellent!
Looks good to me, thank you Loppukilpailija!