Cross-posted from the EA Forum. See the original here. Internal linking has not been updated for LW due to time constraints and will take you back to the original post.

In this series, we consider AI safety organizations that have received more than $10 million per year in funding. There have already been several conversations and critiques around MIRI (1) and OpenAI (1,2,3), so we will not be covering them. The authors include one technical AI safety researcher (>4 years experience), and one non-technical community member with experience in the EA community. We’d like to make our critiques non-anonymously but believe this will not be a wise move professionally speaking. We believe our criticisms stand on their own without appeal to our positions. Readers should not assume that we are completely unbiased or don’t have anything to personally or professionally gain from publishing these critiques. We’ve tried to take the benefits and drawbacks of the anonymous nature of our post seriously and carefully, and are open to feedback on anything we might have done better.

This is the second post in this series and it covers Conjecture. Conjecture is a for-profit alignment startup founded in late 2021 by Connor Leahy, Sid Black and Gabriel Alfour, which aims to scale applied alignment research. Based in London, Conjecture has received $10 million in funding from venture capitalists (VCs), and recruits heavily from the EA movement. We shared a draft of this document with Conjecture for feedback prior to publication, and include their response below. We also requested feedback on a draft from a small group of experienced alignment researchers from various organizations, and have invited them to share their views in the comments of this post.

We would like to invite others to share their thoughts in the comments openly if you feel comfortable, or contribute anonymously via this form. We will add inputs from there to the comments section of this post, but will likely not be updating the main body of the post as a result (unless comments catch errors in our writing).

Key Takeaways

For those with limited knowledge and context on Conjecture, we recommend first reading or skimming the About Conjecture section.

Time to read the core sections (Criticisms & Suggestions and Our views on Conjecture) is 22 minutes.

Criticisms and Suggestions

We think Conjecture’s research is low quality (read more).
- Their posts don’t always make assumptions clear, don’t make it clear what evidence base they have for a given hypothesis, and frequently rely on cherry-picked evidence. We also think their bar for publishing is too low, which decreases the signal-to-noise ratio. Conjecture has acknowledged some of these criticisms, but not all (read more).
- We make specific critiques of examples of their research from their initial research agenda (read more).
- There is limited information available on their new research direction (cognitive emulation), but from the publicly available information it appears extremely challenging and so we are skeptical as to its tractability (read more).
We have some concerns with the CEO’s character and trustworthiness because, in order of importance (read more):
- The CEO and Conjecture have misrepresented themselves to external parties multiple times (read more);
- The CEO’s involvement in EleutherAI and Stability AI has contributed to race dynamics (read more);
- The CEO previously overstated his accomplishments in 2019 (when an undergrad) (read more);
- The CEO has been inconsistent over time regarding his position on releasing LLMs (read more).
We believe Conjecture has scaled too quickly before demonstrating they have promising research results, and believe this will make it harder for them to pivot in the future (read more).
We are concerned that Conjecture does not have a clear plan for balancing profit and safety motives (read more).
Conjecture has had limited meaningful engagement with external actors (read more):
- Conjecture lacks productive communication with external actors within the TAIS community, often reacting defensively to negative feedback and failing to address core points (read more);
- Conjecture has not engaged sufficiently with the broader ML community; we think they would receive valuable feedback by engaging more. We’ve written more about this previously (read more).

Our views on Conjecture

We would advise against working at Conjecture, given their history of low quality research, concerns with the CEO’s character and trustworthiness and the leadership team’s lack of research experience (read more).
We would advise Conjecture to avoid unilateral engagement with important stakeholders and strive to represent their place in the TAIS ecosystem accurately because they have misrepresented themselves multiple times (read more).
We do not think that Conjecture should receive additional funding before addressing key concerns because of the reasons cited above (read more).
We encourage TAIS and EA community members and organizations to reflect on the extent to which they want to legitimize Conjecture until Conjecture addresses these concerns (read more).

About Conjecture

Funding

Conjecture received (primarily via commercial investment) roughly $10 million in 2022. According to them, they’ve received VC backing from Nat Friedman (ex-CEO of GitHub), Patrick and John Collison (co-founders of Stripe), Daniel Gross (investor and cofounder of a startup accelerator), Andrej Karpathy (ex-OpenAI), Sam Bankman-Fried, Arthur Breitman and others. We are not aware of any later funding rounds, but it’s possible they have raised more since then.

Outputs

Products

Verbalize is an automatic transcription model. This is a B2C SaaS product and was released in early 2023. Our impression is that it’s easy to use but no more powerful than existing open-source models like Whisper, although we are not aware of any detailed empirical evaluation. We do not think the product has seen commercial success yet, as it was released recently. Our estimate is that about one third of Conjecture’s team are actively working on developing products.

Alignment Research

Conjecture studies large language models (LLMs), with a focus on empirical and conceptual work. Mechanistic interpretability was a particular focus, with output such as the polytope lens, sparse autoencoders and analyzing the SVD of weight matrices, as well as work more broadly seeking to better understand LLMs, such as simulator theory.

They have recently pivoted away from this agenda towards cognitive emulation, which is reminiscent of process-based supervision. Here is a link to their full research agenda and publication list. Due to their infohazard policy (see below), some of their research may not have been publicly released.

Infohazard policy

Conjecture developed an infohazard policy in their first few months and shared it publicly to encourage other organizations to publish or adopt similar policies. They say that while many actors were “verbally supportive of the policy, no other organization has publicly committed to a similar policy”.

Governance outreach

We understand that CEO Connor Leahy does a lot of outreach to policymakers in the UK, and capabilities researchers at other prominent AI companies. He’s also appeared on several podcasts (1, FLI (1,2,3,4), 3, 4, 5) and been interviewed by several journalists (1, 2, 3, 4, 5, 6, 7, 8).

Incubator Program

Adam Shimi ran an incubator called Refine in 2022, whose purpose was to create new independent conceptual researchers and help them build original research agendas. Based on Adam’s retrospective, it seems like this project wasn’t successful at achieving its goals and Adam is now pursuing different projects.

Team

The Conjecture team started with 4 employees in late 2021 and has grown to at least 22 employees now (according to their LinkedIn), with most employees joining in 2022.

Their CEO, Connor Leahy, has a technical background (with 2 years of professional machine learning experience and a Computer Science undergrad) and partially replicated GPT-2 in 2019 (discussed in more detail below). Their Chief of Staff has experience with staffing and building team culture from her time at McKinsey, and has similar experience at Meta. Their co-founder Gabriel Alfour has the most relevant technical and scaling experience as the CEO of Marigold,^[1] a firm performing core development on the Tezos cryptocurrency infrastructure with over 30 staff members.

Two individuals collectively publishing under the pseudonym janus published simulator theory, one of Conjecture’s outputs that we understand the TAIS community to have been most favorable towards. They left Conjecture in late 2022. More recently, many researchers working on mechanistic interpretability left the team after Conjecture’s pivot towards cognitive emulation. Those departing include Lee Sharkey, the lead author on the sparse autoencoders post and a contributor to the polytope lens post.

Conjecture in the TAIS ecosystem

Conjecture staff are frequent contributors on the Alignment Forum and recruit heavily from the EA movement. Their CEO has appeared on a few EA podcasts (including several times on the FLI podcast). Some TAIS researchers are positive about their work. They fiscally sponsor two TAIS field-building programs, MATS and ARENA, in London (where they are based).

Their team also spent a month in the Bay Area in 2022 (when many TAIS researchers were visiting through programs like MLAB, SERI MATS and on independent grants). Conjecture made an effort to build relationships with researchers, decisionmakers and grantmakers, and were actively fundraising from EA funders during this period. 3-4 Conjecture staff regularly worked out of the Lightcone Offices, with a peak of ~11 staff on a single day. The largest event run by Conjecture was an EA Global afterparty hosted at a Lightcone venue, with a couple hundred attendees, predominantly TAIS researchers.

Criticisms and Suggestions

Low quality research

General thoughts on Conjecture’s research

We believe most of Conjecture’s publicly available research to date is low-quality compared to the average ML conference paper. A direct comparison is difficult as some Conjecture members have prioritized releasing small, regular updates; however, our impression is that even when combined, these updates would at best meet the much lower bar of a workshop paper.

As we discuss below, Conjecture does not present their research findings in a systematic way that would make it accessible for others to review and critique. Conjecture’s work often consists of isolated observations that are not built upon or adequately tested in other settings. We recommend Conjecture focus more on developing empirically testable theories, and also suggest they introduce an internal peer-review process to evaluate the rigor of work prior to publicly disseminating their results. Conjecture might also benefit from having researchers and reviewers work through (although not rigidly stick to) the Machine Learning Reproducibility Checklist.

These limitations may in part be because Conjecture is a young organization with a relatively inexperienced research team, a point they have readily acknowledged in retrospectives and when criticized on research quality. However, taking their youth and inexperience into account, we still think their research is below the bar for funding or other significant support. When we take into account the funding that Conjecture has (at least $10M raised in their last round), we think they are significantly underperforming standard academic research labs (see our discussion on this in the Redwood post; we are significantly more excited about Redwood’s research than Conjecture’s). We believe they could significantly improve their research output by seeking out mentorship from more experienced ML or alignment researchers, and recommend they do this in the future.

Initial research agenda (March 2022 - Nov 2022)

Conjecture’s initial research agenda focused on interpretability, conceptual alignment and epistemology. Based on feedback from Conjecture, our understanding is that Conjecture is now much more excited about their new research direction in cognitive emulation. We discuss this new direction in the following section. However, as an organization’s past track record is one of the best predictors of their future impact, we believe it is important to understand Conjecture’s previous approach.

To Conjecture’s credit, they acknowledged a number of mistakes in their retrospective. For example, they note that their simulators post was overinvested in, and “more experienced alignment researchers who have already developed their own deep intuitions about GPT-like models didn’t find the framing helpful.” However, there are several issues we identify (such as lack of rigor) that are not discussed in the retrospective. There are also issues discussed in the retrospective where Conjecture leadership comes to the opposite conclusion to us: for example, Conjecture writes that they “overinvested in legibility and polish” whereas we found many of their posts to be difficult to understand and evaluate.

We believe three representative posts, which Conjecture leadership were excited by as of 2022 Q3, were: janus’s post on simulators, Sid and Lee’s post on polytopes, and their infohazard policy. These accomplishments were also highlighted in their retrospective. Although we find these posts to have some merit, we would overall assess them as having limited impact. Concretely, we would evaluate Redwood’s Indirect Object Identification or Causal Scrubbing papers as both more novel and scientifically rigorous. We discuss their infohazard policy, simulators and polytopes post in turn below.

Their infohazard policy is a fairly standard approach to siloing research, and is analogous to structures common in hedge funds or classified research projects. It may be positive for Conjecture to have adopted such a policy (although it introduces risks of concentrating power in the CEO, discussed in the next section), but it does not provide any particular demonstration of research capability.

The simulators and polytopes posts are both at an exploratory stage, with limited empirical evidence and unclear hypotheses. Compared to similar exploratory work (e.g. from the Alignment Research Center), we think Conjecture doesn’t make their assumptions clear enough and has too low a bar for sharing, reducing the signal-to-noise ratio and diluting standards in the field. When they do provide evidence, it appears to be cherry-picked.

Their posts also do not clearly state the degree of belief they have in different hypotheses. Based on private conversations with Conjecture staff, they often appear very confident in their views and in the results of their research despite relatively weak evidence for them. In the simulators post, for example, they describe sufficiently large LLMs as converging to simulators capable of simulating “simulacra”: different generative processes that are consistent with the prompt. The post ends with speculative beliefs, stated fairly confidently, that take the framing to an extreme (e.g. if the AI system adopts the “superintelligent AI persona”, it’ll just be superintelligent).

We think the framing was overall helpful, especially to those newer to the field, although it can also sometimes be confusing: see e.g. these critiques. The framing had limited novelty: our anecdotal impression is that most researchers working on language model alignment were already thinking along similar lines. The more speculative beliefs stated in the post are novel and significant if true, but the post does not present any rigorous argument or empirical evidence to support them. We believe it’s fine to start out with exploratory work that looks more like an op-ed, but at some point you need to submit your conjectures to theoretical or empirical tests. We would encourage Conjecture to explicitly state their confidence levels in written output and make clear what evidence base they do or do not have for a given hypothesis (e.g. conceptual argument, theoretical result, empirical evidence).

New research agenda (Nov 22 - Present)

Conjecture now has a new research direction exploring cognitive emulation. The goal is to produce bounded agents that emulate human-like thought processes, rather than agents that produce good output but for alien reasons. However, it’s hard to evaluate this research direction as they are withholding details of their plan due to their infohazard policies. On the face of it, this project is incredibly ambitious, and will require huge amounts of effort and talent. Because of this, details on how they will execute the project are important to understanding how promising this project may be. We would encourage Conjecture to share some more technical detail unless there are concrete info-hazards they are concerned about. In the latter case we would suggest sharing details with a small pool of trusted TAIS researchers for external evaluation.

CEO’s character and trustworthiness

We are concerned by the character and trustworthiness of Conjecture’s CEO, Connor Leahy. We are also concerned that Connor has demonstrated a lack of attention to rigor and engagement with risky behavior, and that he, along with other staff, has demonstrated an unwillingness to take external feedback (see below).

Although this section focuses on the negatives, there are of course positive aspects to Connor’s character. He is clearly a highly driven individual, who has built a medium-sized organization in his early twenties. He has shown a willingness to engage with arguments and change his mind on safety concerns, for example delaying the release of his GPT-2 replication. Moreover, in recent years Connor has been a vocal public advocate for safety: although we disagree in some cases with the framing of the resulting media articles, in general we are excited to see greater public awareness of safety risks.^[2]

The character of an organization’s founder and CEO is always an important consideration, especially for early-stage companies. Moreover, we believe this consideration is particularly strong in the case of Conjecture:

Conjecture engages in governance outreach that involves building relationships between government actors and the TAIS community, and there are multiple accounts of Conjecture misrepresenting themselves.
As the primary stakeholder & CEO, Connor will be responsible for balancing incentives to develop capabilities from stakeholders (see below).
Conjecture’s infohazard policy has the consequence of heavily centralizing power in the CEO (even more so than at a typical tech company). The policy mandates that projects are siloed, and staff may be unaware of the details (or even the existence) of significant fractions of Conjecture’s work. The CEO is Conjecture’s “appointed infohazard coordinator” with “access to all secrets and private projects” – and thus is the only person with full visibility. This could substantially reduce staff’s ability to evaluate Conjecture’s strategy and provide feedback internally. Additionally, if they don’t have the full information, they may not know if Conjecture is contributing to AI risk.^[3] We are uncertain about the degree to which this is a problem given Conjecture’s current level of internal secrecy.

Conjecture and their CEO misrepresent themselves to various parties

We are generally worried that Connor will tell the story that he expects the recipient to find most compelling, making it challenging to confidently predict his and Conjecture’s behavior. We have heard credible complaints of this in their interactions with funders. One experienced technical AI safety researcher recalled Connor saying that he will tell investors that they are very interested in making products, whereas the predominant focus of the company is on AI safety.

We have heard that Conjecture misrepresent themselves in engagement with the government, presenting themselves as experts with stature in the AIS community, when in reality they are not. We have heard reports that Conjecture’s policy outreach is decreasing goodwill with policymakers. We think there is a reasonable risk that Connor and Conjecture’s actions may be unilateralist and prevent important relationships from being formed by other actors in the future.

Unfortunately we are unable to give further details about these incidents as our sources have requested confidentiality; we understand this may be frustrating and acknowledge it is difficult for Conjecture to substantively address these concerns.

We would recommend Connor be more honest and transparent about his beliefs, plans and Conjecture’s role in the TAIS ecosystem. We also recommend that Conjecture introduce a strong, robust governance structure (see below).

Contributions to race dynamics

We believe that Connor Leahy has contributed to increasing race dynamics and accelerating capabilities research by founding EleutherAI and, through it, supporting the creation of Stability AI. EleutherAI is a community research group focused on open-source AI research, founded in 2020. Under Connor’s leadership, their plan was to build and release large open-source models to allow more people to work on important TAIS research that is only possible on pretrained LLMs. At the time, several members of the TAIS community, including Dan Hendrycks (founder of CAIS), privately warned Connor and EleutherAI that it would be hard to control an open source collective.

Stability AI

Stability AI brands itself as an AGI lab and has raised $100M to fund research into and training of large, state-of-the-art models including Stable Diffusion.^[4] The addition of another AGI-focused lab is likely to further exacerbate race dynamics. Stability is currently releasing the majority of the work they create as open-source: this has some benefits, enabling a broader range of researchers (including alignment researchers) to study these models. However, it also has significant drawbacks, such as making potential moratoriums on capabilities research much harder (if not impossible) to enforce. To our knowledge, Stability AI has not done much algorithmic advancement yet.

EleutherAI was pivotal in the creation of Stability AI. Our understanding is that the founder of Stability AI, Emad Mostaque, was active on the EleutherAI Discord and recruited much of his initial team from there. On the research side, Stability AI credited EleutherAI as supporting the initial version of Stable Diffusion in August 2022, as well as their most recent open-source language model release StableLM in April 2023. Emad (in Feb 2023) described the situation as: “Eleuther basically split into two. Part of it is Stability and the people who work here on capabilities. The other part is Conjecture that does specific work on alignment, and they’re also based here in London.”

Stability AI continues to provide much of EleutherAI’s compute and is a sponsor of EleutherAI, alongside Nat Friedman (who also invested in Conjecture). Legally, Stability AI directly employed key staff of EleutherAI in a relationship we believe was similar to fiscal sponsorship. We understand that EleutherAI have recently transitioned to employing staff directly via their own non-profit entity (Connor and Emad sit on the board).

EleutherAI

EleutherAI is notable for having developed open-source LLMs such as GPT-NeoX. In the announcement post in February 2022, they claimed that “GPT-NeoX-20B is, to our knowledge, the largest publicly accessible pretrained general-purpose autoregressive language model, and we expect it to perform well on many tasks.”

We do not think that there was much meaningful alignment output from EleutherAI itself during Connor’s tenure – most of the research published is capabilities research, and the published alignment research is of mixed quality. On the positive side, EleutherAI’s open-source models have enabled some valuable safety research. For example, GPT-J was used in the ROME paper and is widely used in Jacob Steinhardt’s lab. EleutherAI is also developing a team focusing on interpretability, with their initial work including developing the tuned lens in a collaboration with FAR AI and academics from Boston and Toronto.

Connor’s founding and management of EleutherAI indicates to us that he was overly optimistic that rapidly growing a community of people interested in language models and attracting industry sponsorship would translate into meaningful alignment research. We see EleutherAI as having mostly failed at its goals of AI safety, and instead accelerated capabilities via their role in creating Stability AI and Stable Diffusion.

In particular, EleutherAI’s supporters were primarily interested in gaining access to state-of-the-art LLM capabilities with limited interest in safety. For example, the company Coreweave provided EleutherAI with compute and then used their models to sell a LM inference API called GooseAI. We conjecture that the incentive to please their sponsors, enabling further scale-up, may have contributed to EleutherAI’s limited safety output.

We feel more positively about Conjecture than early-stage EleutherAI given Conjecture’s explicit alignment research focus, but are concerned that Connor appears to be bringing a very similar strategy to Conjecture as to EleutherAI: scaling before producing tangible alignment research progress and attracting investment from external actors (primarily investors) with opposing incentives that they may not be able to withstand. We would encourage Conjecture to share a clear theory of change which includes safeguards against these risks.

To be clear, we think Conjecture’s contribution to race dynamics is far less than that of OpenAI or Anthropic, both of which have received funding and attracted talent from the EA ecosystem. We would assess OpenAI as being extremely harmful for the world. We are uncertain on Anthropic: they have undoubtedly contributed to race dynamics (albeit less so than OpenAI), but have also produced substantial safety research. We will discuss Anthropic further in an upcoming post, but in either case we do not think that AGI companies pushing forward capabilities should exempt Conjecture or other organizations from criticisms.

Overstatement of accomplishments and lack of attention to precision

In June 2019, Connor claimed to have replicated GPT-2 while he was an undergraduate. However, his results were inaccurate and his 1.5B parameter model was weaker than even the smallest GPT-2 series model.^[5] He later admitted to these mistakes, explaining that his metric code was flawed and that he commingled training and evaluation datasets. Additionally, he said that he didn’t evaluate the strength of his final model, only one halfway through training. He gave his reason for this as: “I got cold feet once I realized what I was sitting on [something potentially impressive] and acted rashly.”^[6] We think this points to a general lack of care in making true and accurate claims.

We don’t want to unfairly hold people’s mistakes from their college days against them – many people exaggerate or overestimate (intentionally or not) their own accomplishments. Even a partial replica of GPT-2 is an impressive technical accomplishment for an undergraduate, so this project does attest to Connor’s technical abilities. It is also positive that he admitted his mistake publicly. However, overall we do believe the project demonstrates a lack of attention to detail and rigor. Moreover, we haven’t seen signs that his behavior has dramatically changed.

Inconsistency over time regarding releasing LLMs

Connor has changed his stance more than once regarding whether to publicly release LLMs. Given this, it is difficult to be confident that Conjecture’s current approach of defaulting to secrecy will persist over time.

In July 2019, Connor released the source code used to train his replica along with pretrained models comparable in size to the already released GPT-2 117M and 345M models. The release of the training code seems hasty, enabling actors with sufficient compute but limited engineering skills to train their own, potentially superior, models. At this point, Connor was planning to release the full 1.5B parameter model to the public, but was persuaded not to.^[7] In the end, he delayed releasing the model to Nov 13 2019, a week after OpenAI released their 1.5B parameter version, on his personal GitHub.

In June 2021 Connor changed his mind and argued that releasing large language models would be beneficial to alignment as part of the team at EleutherAI (see discussion above). In Feb 2022, EleutherAI released an open-source 20B parameter model, GPT-NeoX. Their stated goal, endorsed by Connor in several places, was to “train a model comparable to the biggest GPT-3 (175 billion parameters)” and release it publicly. Regarding the potential harm of releasing models, we find Connor’s arguments plausible – whether releasing open-source models closer to the state-of-the-art is beneficial or not remains a contested point. However, we are confident that sufficiently capable models should not be open-sourced, and expect strong open-source positive messaging to be counterproductive. We think EleutherAI made an unforced error by not at least making some gesture towards publication norms (e.g. they could have pursued a staggered release giving early access to vetted researchers).

In July 2022, Connor shared Conjecture’s Infohazard Policy. This policy is amongst the most restrictive at any AI company – even more restrictive than what we would advocate for. To the best of our knowledge, Conjecture’s Infohazard Policy is an internal policy that can be overturned by Connor (acting as chief executive), or by a majority of their owners (of whom Connor as a founder will have a significant stake). Thus we are hesitant to rely on Conjecture’s Infohazard Policy remaining strictly enforced, especially if subject to commercial pressures.

Scaling too quickly

We think Conjecture has grown too quickly, from 0 to at least 22 staff from 2021 to 2022. During this time, they have not had what we would consider to be insightful or promising outputs, making them analogous to a very early stage start-up. This is a missed opportunity: their founding team and early employees include some talented individuals who, given time and the right feedback, might well have been able to identify a promising approach.

We believe that Conjecture’s basic theory of change for scaling is:

1) they’ve gotten good results relative to how young they are, even though the results themselves are not that insightful or promising in absolute terms, and

2) the way to improve these results is to scale the team so that they can test out more ideas and get more feedback on what does and doesn’t work.

Regarding 1), we think that others of similar experience level – and substantially less funding – have produced higher-quality output. Concretely, we are more excited about Redwood’s research than Conjecture’s (see our criticisms of Conjecture’s research), despite being critical of Redwood’s cost-effectiveness to date.^[8] Notably, Redwood drew on a similar talent pool to Conjecture, largely hiring people without prior ML research experience.

Regarding 2), we disagree that scaling up will improve their research quality. In general, the standard lean startup team advice is that it’s important to keep your team small while you are finding product-market fit or, in Conjecture’s case, developing an exciting research agenda. We think it’s very likely Conjecture will want to make major pivots in the next few years. Rapid growth will make it harder for them to pivot. With growing scale, more time will be spent on management, and it will be easier to get people locked into the wrong project or create dynamics where people are more likely to defend their pet projects. We can’t think of examples where scale up has taken place successfully before finding product-market fit.

This growth would be challenging to manage in any organization. However, in our opinion alignment research is more challenging to scale than a traditional tech start-up due to the weaker feedback loops: it’s much harder to tell if your alignment research direction is promising than whether you’ve found product-market fit.

Compounding this problem, their founding team Connor, Sid and Gabriel have limited experience in scaling research organizations. Connor and Sid’s experience primarily comes from co-founding EleutherAI, a decentralized research collective: their frustrations with that lack of organization are part of what drove them to found Conjecture. Gabriel has the most relevant experience.

Conjecture appeared to have rapid scaling plans, but their growth has slowed in 2023. Our understanding is that this slow-down is primarily due to them being unable to raise adequate funding for their expansion plans.

To address this problem, we would recommend that Conjecture:

Freeze hiring of junior staff until they identify scalable research directions that they and others in the alignment community are excited by. Conjecture may still benefit from making a small number of strategic hires that can help them manage their current scale and continue to grow, such as senior research engineers and people who have experience managing large teams.
Consider focusing on one area (e.g. technical research) and keeping other teams (e.g. product and governance) lean, or even consider whether they need them.
While we don’t think it’s ideal to let go of staff, we tentatively suggest Conjecture consider whether it might be worth making the team smaller to focus on improving their research quality, before growing again.

Unclear plan for balancing profit and safety motives

According to their introduction post, they think being a for-profit company is the best way to reach their goal because it lets them “scale investment quickly while maintaining as much freedom as possible to expand alignment research.” We think this could be challenging in practice: scaling investment requires delivering results that investors find impressive, as well as giving investors some control over the firm in the form of voting shares and, frequently, board seats.

Conjecture has received substantial backing from several prominent VCs. This is impressive, but since many of their backers (to our knowledge) have little interest in alignment, Conjecture will be under pressure to develop a pathway to profitability in order to raise further funds.

Many routes to developing a profitable AI company have significant capabilities externalities. Conjecture’s CEO has indicated they plan to build “a reliable pipeline to build and test new product ideas” on top of internal language models. Although this seems less bad than the OpenAI model of directly advancing the state-of-the-art in language models, we expect demonstrations of commercially viable products using language models to lead to increased investment in the entire ecosystem – not just Conjecture.

For example, if Conjecture does hit upon a promising product, it would likely be easy for a competitor to copy them. Worse, the competitor might be able to build a better product by using state-of-the-art models (e.g. those available via the OpenAI API). To keep up, Conjecture would then have to either start training state-of-the-art models themselves (introducing race dynamics), or use state-of-the-art models from competitors (and ultimately provide revenue to them).

Conjecture may have good responses to this. Perhaps there are products which are technically intricate to develop or have other barriers to entry making competition unlikely, and/or where Conjecture’s internal models are sufficient. We don’t have reason to believe Verbalize falls into this category as there are several other competitors already (e.g. fireflies.ai, otter.ai, gong.io). We would encourage Conjecture to share any such plans they have to simultaneously serve two communities (for-profit VCs and TAIS), with sometimes conflicting priorities, for review with both sets of stakeholders.

Our impression is that they may not have a solid plan here (but we would invite them to share their plans if they do). Conjecture was trying to raise a series B from EA-aligned investors to become an alignment research organization. This funding round largely failed, causing them to pivot to focus more on VC funding. Based on their past actions, we think it’s likely that they will eventually hit a wall with regard to product development, and decide to focus on scaling language models to get better results, contributing to race dynamics. In fairness to Conjecture, we would consider the race risk of Conjecture to be much smaller than that of Anthropic, which operates at a much bigger scale, is scaling much more rapidly, and has had more commercial success with its products.

It’s not uncommon for people and orgs who conceive of or present themselves as AIS-focused to end up advancing capabilities much more than safety. OpenAI is perhaps the most egregious case of this, but we are also concerned about Anthropic (and will write about this in a future post). These examples should make us suspect that, by default, Conjecture’s for-profit nature will end up causing it to advance capabilities; we would need to see a clear and detailed plan to avoid this to be convinced otherwise.

In addition to sharing their plans for review, we would recommend that Conjecture introduce robust corporate governance structures. Our understanding is that Conjecture is currently structured as a standard for-profit start-up with the founders controlling the majority of voting shares and around a third of the company owned by VCs. This is notably worse than OpenAI LP, structured as a “capped-profit” corporation with non-profit OpenAI, Inc. the sole controlling shareholder.^[9] One option would be for Conjecture to implement a “springing governance” structure in which given some trigger (such as signs that AGI is imminent, or that their total investment exceeds some threshold) its voting shares become controlled by a board of external advisors. This would pass governance power, but not financial equity, to people who Conjecture considers to be a good board – rather than being controlled wholly by their founding team.

Limited meaningful engagement with external actors

Lack of productive communication between TAIS researchers and Conjecture staff

We know several members of the EA and TAIS community who have tried to share feedback privately with Conjecture but found it very challenging. When negative feedback is shared, members of the Conjecture team sometimes do not engage meaningfully with it, missing the key point or reacting defensively. Conjecture leadership will provide many counter-arguments, none of which address the core point, or are particularly strong. This is reminiscent of the Gish gallop rhetorical technique, which can overwhelm interlocutors as it’s very difficult to rebut each counter-argument. Some Conjecture staff members also frequently imply that the person giving the criticism has ulterior motives or motivated reasoning.

It can be hard to hear criticism of a project you are passionate about and have invested considerable time in, so it’s natural that Conjecture staff are defensive about their work. However, we would recommend that Conjecture staff, and especially leadership, make an effort to engage constructively with criticism, seeking to understand where the critique is coming from, and take appropriate steps to correct misunderstandings and/or resolve the substance of the critique.

Lack of engagement with the broader ML community

Conjecture primarily disseminates their findings on the Alignment Forum. However, many of their topics (particularly interpretability) are at least adjacent to active research fields, such that a range of academic and industry researchers could both provide valuable feedback on Conjecture’s research and gain insights from their findings.

Conjecture is not alone in this: as we wrote previously, we also think that Redwood could engage further with the ML community. Conjecture has not published any peer-reviewed articles, so we think they would benefit even more than Redwood from publishing their work and receiving external feedback. We would recommend Conjecture focus on developing what they consider to be their most insightful research projects into a conference-level paper, and hiring more experienced ML research scientists or advisors to help them both effectively communicate their research and improve rigor.

Our views on Conjecture

We are genuinely concerned about Conjecture’s trustworthiness and how they might negatively affect the TAIS community and the TAIS community’s efforts to reduce risk from AGI. These are the main changes we call for, in rough order of importance.

We would advise against working at Conjecture

Given Conjecture’s weak research track record, we expect the direct impact of working at Conjecture to be low. We think there are many more impactful places to work, including non-profits such as Redwood, CAIS and FAR; alignment teams at Anthropic, OpenAI and DeepMind; or working with academics such as Stuart Russell, Sam Bowman, Jacob Steinhardt or David Krueger. Note we would not in general recommend working at capabilities-oriented teams at Anthropic, OpenAI, DeepMind or other AGI-focused companies.

Additionally, Conjecture seems relatively weak for skill building, since their leadership team is relatively inexperienced and also stretched thin due to Conjecture’s rapid scaling. We expect most ML engineering or research roles at prominent AI labs to offer better mentorship than Conjecture. Although we would hesitate to recommend taking a position at a capabilities-focused lab purely for skill building, we find it plausible that Conjecture could end up being net-negative, and so do not view Conjecture as a safer option in this regard than most competing firms.

In general, we think that the attractiveness of working at an organization that is connected to the EA or TAIS communities makes it more likely for community members to take jobs at such organizations even if this will result in a lower lifetime impact than alternatives. Conjecture’s sponsorship of TAIS field building efforts may also lead new talent, who are unfamiliar with Conjecture’s history, to have an overly rosy impression of them.

We would advise Conjecture to take care when engaging with important stakeholders and represent their place in the TAIS ecosystem accurately

We are concerned that Conjecture has misrepresented themselves to various important stakeholders, including funders and policymakers. We think there is a reasonable risk that Connor and Conjecture’s outreach to policymakers and media is alarmist and may decrease the credibility of x-risk. These unilateral actions may therefore prevent important relationships from being formed by other actors in the future. This risk is further exacerbated by Connor’s unilateralist actions in the past, Conjecture’s overall reluctance to take feedback from external actors, and their premature and rapid scaling.

We do not think that Conjecture should receive additional funding before addressing key concerns

We have substantial concerns with the organization’s trustworthiness and the CEO’s character. We would strongly recommend that any future funding from EA sources be conditional on Conjecture putting in place a robust corporate governance structure to bring them at least on par with other for-profit and alignment-sympathetic firms such as OpenAI and Anthropic.

Even absent these concerns, we would not currently recommend Conjecture for funding due to the lack of a clear impact track record despite a considerable initial investment of $10mn. To recommend funding, we would want to see both improvements in corporate governance and some signs of high-quality work that the TAIS community are excited by.

We are largely in agreement with the status quo here: so far Conjecture has mostly been unsuccessful in fundraising from prominent EA funders, and where they have received funding it was for significantly less than their initial asks.

We encourage TAIS and EA community members to consider to what extent they want to legitimize Conjecture until Conjecture addresses these concerns

Conjecture has several red flags and a weak track record for impact. Although the TAIS and EA community have largely refrained from explicit endorsements of Conjecture (such as funding them), there are a variety of implicit endorsements. These include tabling at EA Global career fairs, Lightcone hosting Conjecture events and inviting Conjecture staff, field-building organizations such as MATS and ARENA working with Conjecture as a fiscal sponsor,^[10] as well as a variety of individuals in the community (mostly unaware of these issues) recommending Conjecture as a place to work.

To clarify, we think individuals should still read and engage with Conjecture’s research where they judge it to be individually worth their time. We also welcome public debates involving Conjecture staff, such as the one between Paul Christiano and Gabriel Alfour. Our goal is not to shun Conjecture, but to avoid giving them undue influence until their research track record and governance structure improve.

We recognize that balancing these considerations can be tricky, which is why our main recommendation is to encourage people to spend time actively reflecting on how they want to engage with Conjecture in light of the information we present in this post (alongside other independent sources).

Appendix

Communication with Conjecture

We shared a draft of this post with Conjecture to review, and have included their full response (as they indicated they would post it publicly) below. We thank them for their engagement and made several minor updates to the post in response; however, we disagree with several key claims made by Conjecture in their response. We describe the changes we made, and where we disagree, in the subsequent section.

Conjecture’s Reply

Hi,

Thank you for your engagement with Conjecture’s work and for providing us an opportunity to share our feedback.

As it stands, the document is a hit piece, whether intentional or not. It is written in a way such that it would not make sense for us to respond to points line-by-line. There are inaccuracies, critiques of outdated strategies, and references to private conversations where the details are obscured in ways that prevent us from responding substantively. The piece relies heavily on criticism of Connor, Conjecture CEO, but does not attempt to provide a balanced assessment: there are no positive comments written about Connor along with the critiques, and past mistakes he admitted to publicly are spun as examples of “low-integrity” behavior. Nuanced points such as the cost/benefit of releasing small open source models (pre-Chinchilla) are framed as “rash behavior,” even when you later write that you find Connor’s arguments “plausible.” Starting from this negative frame does not leave room for us to reply and trust that an object-level discussion will proceed.

We also find it surprising to see that most of the content of the piece is based on private discussions and documents shared between Conjecture, ~15 regrantors, and the FTX Future Fund team in August 2022. The piece does not disclose this context. Besides the fact that much of that information is outdated and used selectively, the information has either been leaked to the two anonymous authors, or one of the authors was directly involved in the regranting process. In either case, this is a violation of mutual confidentiality between Conjecture and regrantors/EA leadership involved in that channel.

We don’t mind sharing our past plans and discussions now and would be happy to publish the entire discussions from the Slack channel where those conversations took place (with consent of the other participants). However, it is a sad conclusion of that process that our openness to discussing strategy in front of regrantors formed the majority set of Bay Area TAIS leadership opinions about Conjecture that frame us as not open, despite these conversations being a deeper audit than pretty much any other TAIS organization.

We’d love to have a productive conversation here, but will only respond in detail if you reframe this post from a hit piece to something better informed. If your aim is to promote coordination, we would recommend asking questions about our plans and beliefs, focusing on the parts that do not make sense to you, and then writing your summary. Conjecture’s strategy is debatable, and we are open to changing it—and have done so in the past. Our research is also critiqueable: we agree that our research output has been weak and have already written about this publicly here. But as described above, this post doesn’t attempt to engage with Conjecture’s current direction.

Going further, if the aim of your critique is to promote truth-seeking and transparency, we would gladly participate in a project about creating and maintaining a questionnaire that all AI orgs should respond to, so that there is as little ambiguity in their plans as possible. In our posts we have argued for making AI lab’s safety plans more visible, and previously ran a project of public debates aimed at highlighting cruxes in research disagreements. Conjecture is open to our opinion being on the record, so much so that we have occasionally declined private debates with individuals who don’t want to be on record. This decision may contribute to some notion of our “lack of engagement with criticism.”

—

As a meta-point, we think that certain strategic disagreements between Conjecture and the Bay Area TAIS circles are bleeding into reputational accusations here. Conjecture has been critical of the role that EA actors have played in funding and supporting major AGI labs historically (OAI, Anthropic), and critical of current parts of the EA TAIS leadership and infrastructure that continue to support the development of superintelligence. For example, we do not think that GPT-4 should have been released and are concerned at the role that ARC’s benchmarking efforts played in safety-washing the model. These disagreements in the past have created friction, and we’d hazard that concerns about Conjecture taking “unilateral action” are predicted on this.

Instead of a more abstract notion of “race dynamics,” Conjecture’s main concern is that a couple of AI actors are unabashedly building superintelligence. We believe OpenAI, Deepmind, and Anthropic are not building superintelligence because the market and investors are demanding it. We believe they are building superintelligence because they want to, and because AGI has always been their aim. As such, we think you’re pointing the finger in the wrong direction here about acceleration risks.

If someone actually cares about curtailing “the race”, their best move would be to push for a ban on developing superintelligence and strongly oppose the organizations trying to build it. Deepmind, OpenAI, and Anthropic have each publicly pushed the AI state of the art. Deepmind and OpenAI have in their charters that they want to build AGI. Anthropic’s most recent pitch deck states that they are planning to train an LLM orders of magnitude larger than competitors, and that “companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles,” which is awfully close to talking about DSA. No one at the leadership of these organizations (which you recommend people work at rather than Conjecture) have signed FLI’s open letter calling for a pause in AI development. Without an alignment solution, the reasonable thing for any organization to do is stop development, not carve out space to continue building superintelligence unimpeded.

While Conjecture strongly disagrees with the strategies preferred by many in the Bay Area TAIS circles, we’d hope that healthy conversations would reveal some of these cruxes and make it easier to coordinate. As written, your document assumes the Bay Area TAIS consensus is superior (despite being what contributed largely to the push for ASI), casts our alternative as “risking unilateral action,” and deepens the rift.

—

We have a standing offer to anyone to debate with us, and we’d be very happy to discuss with you any part of our strategy, beliefs about AI risks, and research agenda.

More immediately, we encourage you to rewrite your post as a Q&A aimed at asking for our actual views before forming an opinion, or at a minimum, rewrite your post with more balance and breathing room to hear our view. As it stands, this post cleaves the relationship between part of the TAIS ecosystem and Conjecture further and is unproductive for both sides.

Given the importance of having these conversations in the open, we plan to make this reply public.

Thanks for your time and look forward to your response,

Conjecture C-Suite

Brief response and changes we made

Conjecture opted not to respond to our points line-by-line and instead asked us to rewrite the post as a Q&A or “with more balance and breathing room to hear our view.” While we won’t be rewriting the post, we have made changes to the post in response to their feedback, some of which are outlined below.

Conjecture commented that the tone of the post was very negative, and in particular that there was a lack of positive comments written about Connor. We have taken that feedback into consideration and have edited the tone to be more neutral and descriptive (with particular attention to the section on Connor). Conjecture also noted that Connor admitted to some of his mistakes publicly. We had previously linked to Connor’s update post on the partial GPT-2 replication, but we edited the section to make it clearer that he did acknowledge his mistake. They also pointed out that we framed the point on releasing models “as ‘rash behavior,’ even when you later write that you find Connor’s arguments ‘plausible.’” We’ve changed this section to be clearer.

They say “this post doesn’t attempt to engage with Conjecture’s current direction.” As we write in our section on their cognitive emulation research, there is limited public information on their current research direction for us to comment on.

They believe that “most of the content of the piece is based on private discussions and documents shared between Conjecture, ~15 regrantors, and the FTX Future Fund team in August 2022.” This is not the case: the vast majority (90+%) of this post is based on publicly available information and our own views which were formed from our independent impression of Conjecture via conversations with them and other TAIS community members. We think the content they may be referring to is:

One conversation that we previously described in the research section regarding Conjecture’s original research priorities. We have removed this reference.
One point providing quantitative details of Conjecture’s growth plans in the scaling section, which we have removed the details of.
The section on how Conjecture and their CEO represent themselves to other parties. This information was not received from those private discussions and documents.

They say they wouldn’t mind “sharing our past plans and discussions now and would be happy to publish the entire discussions from the Slack channel where those conversations took place (with consent of the other participants).” We welcome and encourage the Conjecture team to share their past plans publicly.

They note that “Conjecture is open to our opinion being on the record, so much so that we have occasionally declined private debates with individuals who don’t want to be on record. This decision may contribute to some notion of our ‘lack of engagement with criticism.’” This is not a reason for our comment on their lack of engagement. They mentioned they have “a standing offer to anyone to debate with us”. We appreciate the gesture, but do not have capacity to engage in something as in-depth as a public debate at this time (and many others who have given feedback don’t either).

Conjecture points out the role “EA actors have played in funding and supporting major AGI labs historically (OAI, Anthropic)”, that our “document assumes the Bay Area TAIS consensus is superior … casts our alternative as ‘risking unilateral action’”, and that “these disagreements in the past have created friction, and we’d hazard that concerns about Conjecture taking ‘unilateral action’ are predicted on this.” We outline our specific concerns on unilateralist action, which don’t have to do with Conjecture’s critiques of EA TAIS actors, here. Examples of disagreements with TAIS actors that they cite include:

Conjecture being critical of the role EA actors have played in funding/supporting major AGI labs.
EA TAIS leadership that continue to support development of AGI.
They don’t think GPT-4 should have been released.
They are concerned that ARC’s benchmarking efforts might have safety-washed GPT-4.

We are also concerned about the role that EA actors have played, and potentially continue to play, in supporting AGI labs (we will cover some of these concerns in our upcoming post on Anthropic). We think that Conjecture’s views on ARC are reasonable (although we may not agree with their view). Further, many other EAs and TAIS community members have expressed concerns on this topic, and about OpenAI in particular. We do not think holding this view is particularly controversial or something that people would be critical of. Views like this did not factor into our critique.

Finally, they propose that (rather than critiquing them), we should push for a ban on AGI and oppose organizations trying to build it (OpenAI, DM & Anthropic). While we agree that other labs are concerning, that doesn’t mean that our concerns about Conjecture are erased.

Notes

Gabriel Alfour is still listed as the CEO on Marigold’s website: we are unsure if this information is out of date, or if Gabriel still holds this position. We also lack a clear understanding of what Marigold’s output is, but spent limited time evaluating this. ↩︎

In particular, Connor has referred to AGI as god-like multiple times in interviews (CNN, Sifted). We are skeptical that this framing is helpful. ↩︎

Employee retention is a key mechanism by which tech companies have been held accountable: for example, Google employees’ protest over Project Maven led to Google withdrawing from the project. Similarly, the exodus of AIS researchers from OpenAI to found Anthropic was partly fueled by concerns that OpenAI was contributing to AI risk. ↩︎

Stable Diffusion is a state-of-the-art generative model with similar performance to OpenAI’s DALL-E. It is open-source and open-access; there are no restrictions or filters, so users are not limited by the restrictions a company like OpenAI might apply. This means that people can use the model for abusive behavior (such as deepfakes). ↩︎

Connor reports a WikiText2 perplexity of 43.79 for his replica. This is considerably worse than the 18.34 perplexity achieved by GPT-2 1.5B on this dataset (reported in Table 3 of Radford et al.), and substantially worse than the perplexity of 29.41 achieved by even the smallest GPT-2 model (117M). It is slightly worse than the previously reported state-of-the-art prior to the GPT-2 paper, 39.14 (reported in Table 2 of Gong et al.). Overall, it’s a substantial accomplishment, especially for an undergraduate who built the entire training pipeline (including data scraping) from scratch, but is far from a replication. ↩︎

Here is the full text from the relevant section of the article: “model is not identical to OpenAI’s because I simply didn’t have all the details of what they did … [and] the samples and metrics I have shown aren’t 100% accurate. For one, my metric code is flawed, I made several rookie mistakes in setting up accurate evaluation (let train and eval data mix, used metrics whose math I didn’t understand etc), and the model I used to generate the samples is in fact not the final trained model, but one about halfway through the training. I didn’t take my time to evaluate the strength of my model, I simply saw I had the same amount of hardware as OpenAI and code as close to the paper as possible and went with it. The reason for this is a simple human flaw: I got cold feet once I realized what I was sitting on and acted rashly.” ↩︎

This was in part due to conversations with OpenAI and Buck Shlegeris (then at MIRI) ↩︎

Redwood and Conjecture have received similar levels of funding ↩︎

Anthropic has a public benefit corporation structure, with reports that it includes a long-term benefit committee of people unaffiliated with the company who can override the composition of its board. Overall we have too little information to judge whether this structure is better or worse than OpenAI’s, but both seem better than being a standard C-corporation. ↩︎

Conjecture has been active in running or supporting programs aimed at AI safety field-building. Most notably, they ran the Refine incubator, and are currently fiscally sponsoring ARENA and MATS for their London-based cohort. We expect overall these programs are net-positive, and are grateful that Conjecture is contributing to them. However, this sponsorship may have a chilling effect: individuals may be reluctant to criticize Conjecture if they want to be part of these sponsored programs. It may also cause attendees to be more likely than they otherwise would be to work for Conjecture. We would encourage ARENA and MATS to find a more neutral fiscal sponsor in the UK to avoid potential conflicts of interest. For example, they could hire staff members using employer-of-record services such as Deel or Remote. If Conjecture does continue fiscally sponsoring organizations, we would encourage them to adopt a clear legal separation between Conjecture and fiscally sponsored entities along with a conflict-of-interest policy to safeguard the independence of the fiscally sponsored entities. ↩︎
