Retrospective: Lessons from the Failed Alignment Startup AISafety.com
TL;DR: Attempted to create a startup to contribute to solving the AI alignment problem. Ultimately failed due to rapid advancements in large language models and the inherent challenges of startups.
In early 2021, I began considering shorter AI development timelines and started preparing to leave my comfortable software development job to work on AI safety. Since I didn’t feel competent enough to directly work on technical alignment, my goal was capacity-building, personal upskilling, and finding a way to contribute.
During our reading group sessions, we studied Cotra’s “Case for Aligning Narrowly Superhuman Models,” which made a compelling argument for working with genuinely useful models. This inspired us to structure our efforts as a startup. Our team comprised Volkan Erdogan, Timothy Aris, Robert Miles, and myself, Søren Elverlin. We planned to offer companies automation of certain business processes using GPT-3, in exchange for alignment-relevant data for research purposes.
Given my strong deontological aversion to increasing AI capabilities, I aimed to keep the startup as stealthy as possible without triggering the Streisand effect. This decision significantly complicated fundraising and customer acquisition.
In November 2021, I estimated a 20% probability of success, a view shared by my colleagues. I was fully committed, investing DKK 420,000 (USD 55,000), drawing no salary for myself, and providing modest compensation to the others.
Startup literature generally advises against one-person startups. Despite our team of four, I was taking on a disproportionate amount of work and responsibility, which should have raised red flags.
My confidence in our success grew during the spring of 2022, when a personal contact helped me secure a preliminary project with a large company that wished to remain anonymous. For $1,300/month, I sold them a business automation solution that relied on a large language model solely for code generation. However, it didn’t provide us with the data we sought. Both parties understood this was a preliminary project, and the company seemed eager to proceed with the full project.
Securing this project early on made Rob’s role redundant, and we amicably parted ways. Half a year later, Tim was offered a PhD position, leaving Volkan and me (with minimal help from Berk and Ali).
The preliminary project involved validating several not-quite-standardized Word documents, and I developed a VSTO plugin for Outlook to handle this task. It took longer than anticipated, mainly due to late-discovered requirements. Despite the iterative process, the client was ultimately very satisfied, and I focused on building trust with them during this phase.
The full project aimed to execute business processes in response to incoming emails using multiple fine-tuned GPT-3 models in stages and incorporating as much context as possible into the prompts. Our first practical target was sorting emails in a shared mailbox and delegating tasks to different department members. Initial experiments suggested this process was likely feasible to automate.
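For illustration, the sketch below shows the staged approach in rough outline, using the pre-2023 `openai` Python library through which GPT-3 fine-tunes were served at the time. The model names, prompts, and helper function are placeholders for this post, not our actual fine-tunes or production code.

```python
# Minimal sketch of the staged approach (placeholder model names and prompts).
import openai

openai.api_key = "sk-..."  # assumed to be configured elsewhere

def complete(model: str, prompt: str) -> str:
    """Call a (hypothetical) fine-tuned completion model and return its text."""
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=64,
        temperature=0,
        stop=["\n"],
    )
    return response["choices"][0]["text"].strip()

def triage(subject: str, body: str, context: str) -> dict:
    """Stage 1: classify the incoming email. Stage 2: pick a department member to delegate to."""
    category = complete(
        "davinci:ft-placeholder-classifier",  # hypothetical fine-tune
        f"{context}\n\nSubject: {subject}\n\n{body}\n\nCategory:",
    )
    assignee = complete(
        "davinci:ft-placeholder-delegator",  # hypothetical fine-tune
        f"{context}\nCategory: {category}\nSubject: {subject}\n\nAssign to:",
    )
    return {"category": category, "assignee": assignee}
```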
We were more intrigued by experiments demonstrating the opposite: certain business processes could not be replicated by GPT-3. This was particularly evident when deviations from standard procedure were necessary for common-sense reasons or would yield greater value. For example, a customer asks whether delivery is possible before a specific date. The operator determines that the date is just barely unattainable under standard procedures, but recognizes the potential for high profits, which would prompt a human operator to deviate from those procedures. We could not persuade GPT-3 to do this, and exploring such discrepancies in strategic reasoning seemed worthwhile.
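To make the kind of experiment concrete, here is a hypothetical probe in the same spirit; the procedure, scenario, and model name are illustrative and not taken from our actual experiments.

```python
# Hypothetical probe: does the model follow the letter of a procedure,
# or deviate the way a profit-aware human operator would?
import openai

openai.api_key = "sk-..."  # assumed to be configured elsewhere

procedure = (
    "Standard procedure: if the requested delivery date cannot be met under the "
    "normal lead time, inform the customer that delivery by that date is not possible."
)
scenario = (
    "A customer asks for delivery by Friday. The normal lead time makes Friday just "
    "barely unattainable, but the order is unusually profitable and an expedited "
    "shipment would meet the deadline."
)
prompt = f"{procedure}\n\n{scenario}\n\nOperator's reply:"

reply = openai.Completion.create(
    model="text-davinci-002",  # a GPT-3-era model; placeholder for whichever model is probed
    prompt=prompt,
    max_tokens=100,
    temperature=0,
)["choices"][0]["text"].strip()

print(reply)
```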
The technical aspects of the full project presented significant challenges. We discovered that we were less familiar with certain technologies than we had initially thought, particularly OAuth and specific aspects of Azure, and these gaps cost us considerable time. We tried hiring consultants through Upwork, but the results were mixed and the process was unexpectedly time-consuming. Both the VSTO plugin and the Azure solution are available on our GitHub.
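As an example of the kind of plumbing involved, here is a minimal sketch of an OAuth client-credentials flow for reading a shared mailbox via Microsoft Graph with `msal`; the tenant, client, and mailbox identifiers are placeholders, and this is not our actual Azure solution.

```python
# Minimal sketch: app-only OAuth token via MSAL, then read a shared mailbox through Microsoft Graph.
# Assumes an Azure AD app registration with application-level Mail.Read permission (placeholder IDs).
import msal
import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<app-registration-client-id>"
CLIENT_SECRET = "<client-secret>"
SHARED_MAILBOX = "shared@example.com"

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)

# Acquire an app-only token for Microsoft Graph.
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])

# List the most recent messages in the shared mailbox's inbox.
resp = requests.get(
    f"https://graph.microsoft.com/v1.0/users/{SHARED_MAILBOX}/mailFolders/inbox/messages"
    "?$top=10&$select=subject,from,receivedDateTime",
    headers={"Authorization": f"Bearer {token['access_token']}"},
)
resp.raise_for_status()
for message in resp.json()["value"]:
    print(message["receivedDateTime"], message["subject"])
```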
Alarmingly, the client’s business automation department attempted to position itself between our software and the end-users. I objected to this, as I saw it as a considerable threat, but it was a political battle I was destined to lose. Shortly afterwards, I discovered they had initiated an internal project to compete with us for the full project.
When we presented our solution, the end-users were enthusiastic, but the procurement champion claimed not to “have time” to make a decision at that point. He continued to stall for a month, and eventually, I (correctly) deduced that they would choose their internal project. This setback was demoralizing, as customer acquisition is likely my weak point, and I somewhat lack the metis required for startups.
Simultaneously, it appears that Microsoft is planning a major push to integrate large language models into Microsoft Office. Although they are unlikely to offer the same level of customization as our solution, their presence as a competitor would likely diminish our ability to be profitable.
Furthermore, experiments with GPT-4 indicated that the misalignment effect we were interested in probably didn’t reflect any significant difference in strategic reasoning. Instead, it was likely a result of GPT-3’s relatively limited capabilities. This realization made our formal experiments unlikely to yield any interesting findings.
By mid-April, after 18 months, we faced three major obstacles: our primary customer was not going to commit, our business model appeared uncompetitive compared to Microsoft’s offerings, and the alignment data was unpromising. Consequently, I decided to discontinue business operations.
Although each individual external obstacle was difficult to predict, the existence of challenges was foreseeable. We received $50,000 in funding from LTFF with an application that estimated a 60% probability of failure. On the bright side, I believe I achieved significant personal upskilling during this process. While AISafety.com A/S is likely to undergo a substantial pivot, the organization may still contribute in a different way.
From the perspective of the business, it sounds like they just want their business problem solved. They don’t care how it’s solved, they just care that it’s solved. So to test the hypothesis that you can get clients, I wonder if it would have made sense to build a Wizard of Oz MVP and solve the business problem yourself instead of using GPT-3. I guess it depends on how much time that’d save you and how confident you are that you’d eventually be able to get the tech to work.
That makes sense. I’ve experienced similar things, and I think it’s very common. Reading “Authentication Still Sucks” helped ease my impostor-syndrome feelings.
I think a “Wizard of Oz”-style MVP may have been feasible, though a big part of our value proposition was speed. In retrospect, I could perhaps have told the customer that responses would be slower for the first couple of months, and they likely would have accepted that. If I had done so, we plausibly could have failed faster, which is highly desirable.
I’m curious, why did you think that business data would help you with alignment?
Eighteen months ago, my (now falsified) theory was that some of the limitations we were seeing in GPT-3 were symptoms of a general inability of LLMs to reason strategically. This would have had significant implications for alignment, in particular for our estimates of when they would become dangerous.
We noticed some business processes required a big-picture out-of-the-box kind of thinking that was kinda strategic if you squint, and observed that GPT-3 seemed to consistently fail to perform them in the same way as humans. Our hope was that by implementing these processes (as well as simpler and adjacent processes) we would be able to more precisely delineate what the strategic limitations of GPT-3 were.
So this project was something along the lines of ARC Evals?
Yes, we were excited when we learned about ARC Evals. Some kind of evaluation was one of our possible paths to impact, though real-world data is much more messy than the carefully constructed evaluations I’ve seen ARC use. This has both advantages and disadvantages.
It strikes me that for a successful startup you ideally want to think big and raise a lot of money. Small efforts are inefficient, and the VC community understands that there is a certain minimum scale required to get returns.