The CARLIN Method: Teaching AI How to Be Genuinely Funny

“There’s a humorous side to every situation. The challenge is to find it.”

George Carlin

Despite all of the recent advances in artificial intelligence, one challenge has consistently stumped even the most advanced language models: creating genuinely funny, original humor. While AI can easily regurgitate existing jokes or follow basic comedic formulas, it typically struggles to generate the kind of novel, contextually aware humor that humans appreciate. Enter CARLIN (Critically Analyze Reality, Link Insightful Notions), a new process designed to help large language models break free from canned responses and create more authentic comedy. Named in respectful homage to the Top 5 comedian George Carlin, this methodology isn’t just about making AI funnier – it’s about making it feel more human. By teaching AI to critically analyze reality and find unexpected connections, CARLIN addresses one of the most distinctive aspects of human intelligence: our ability to find and create humor in the world around us. This process has important implications that extend far beyond comedy – they touch on fundamental questions about AI’s capacity for creative thinking and genuine human-like interaction.

You don’t know me!

The Problem with AI Humor

Large Language Models (LLMs) are very impressive in many areas, from coding to creative writing, but when it comes to humor, they often fall noticeably flat. AI works by pattern matching, but humor works by breaking patterns. The main limitation isn’t that AI can’t tell jokes – it’s that it behaves more like a sophisticated joke database than a creative comedian. When prompted for humor, LLMs resort to recycling well-worn jokes (like “Why don’t scientists trust atoms? Because they make everything up!”) or following predictable comedic templates, resulting in groan-worthy punchlines you might find in a children’s joke book (like “Why don’t skeletons fight each other? They don’t have the guts”). This tendency toward canned jokes is even more obvious in instruction-tuned models (the models behind ChatGPT, Claude, Gemini, etc.), where the focus on helpful yet safe responses can limit the kind of creative risk-taking that makes humor work best. It’s a classic case of training working against creativity – the very mechanisms that make AI more reliable for serious tasks can make it less capable of genuine wit.

The problem lies in how LLMs typically process information: linearly, step by step, without the natural iteration and refinement that characterize human humor creation. While human comedians might start with an idea, play with it, circle back, try different angles, and gradually refine their material through multiple iterations, traditional AI approaches follow a straightforward path from prompt to punchline. This linear processing is very different from the messy, iterative way humans naturally develop humor, where truly funny jokes often emerge from a process of exploration, association, and refinement. The result is AI-generated humor that feels mechanical and predictable, lacking the surprising connections and clever insights that make human-created jokes genuinely funny. It’s like the difference between following a recipe and improvising in the kitchen – while the recipe might produce something edible, it’s the ability to experiment and iterate that leads to truly memorable meals.

Understanding Humor: The Theoretical Foundation

The key to understanding how humor works lies in the Incongruity Theory, which suggests that we laugh at the unexpected clash between what we anticipate and what actually occurs. This isn’t just about random surprises – it’s about the creative art of setting up expectations and then subverting them in ways that are both surprising and somehow logical. It’s like a cognitive magic trick: the brain expects one thing, gets another, and then experiences a moment of delighted recognition when it realizes how both scenarios make sense in their own way. This theory explains why humor can operate on multiple levels simultaneously – from simple wordplay to complex social commentary. The incongruity can be present linguistically (as in puns), conceptually (as in absurdist humor), or socially (as in satire); in each case, however, the key is that the violation of expectations must resolve itself in a way that feels satisfying rather than merely confusing (and you have to ask people if they get the joke). Funny side note – my dissertation investigated how children develop their understanding that surprise is important in humor at around 8 to 9 years of age. AI needs to understand this key aspect as well.

Suls’s Two-Stage Model requires specific prerequisites for humor.

Building on this theoretical foundation, Suls’s Two-Stage Model provides a more detailed framework for understanding how humor processing works in the human mind. According to this model, humor comprehension follows a specific sequence: first, we encounter a setup that leads us to form certain predictions about where things are heading; then, when those predictions are violated, we immediately begin searching for a rule that makes the unexpected outcome make sense. Importantly, both the surprise and the resolution are necessary for humor to work – it’s not enough to simply be random or unexpected. Some jokes fall flat because they’re either too predictable (no surprise) or too bizarre (no logical resolution). The model also highlights why timing and delivery are so critical in good comedy – they balance setting up expectations and revealing the surprising-yet-logical punchline at just the right moment. Understanding these mechanical aspects of humor isn’t just academic; it’s essential for teaching AI systems how to generate genuinely funny content rather than simply regurgitating familiar jokes (which are less surprising, and thus less funny).
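As a toy illustration of the two-stage idea, the sketch below treats a joke as needing both a violated expectation and a rule that resolves it. The numeric scores, the 0.5 threshold, and the names `JokeAssessment` and `classify_joke` are hypothetical, chosen only to make the logic concrete; they are not part of Suls’s model or of CARLIN.

```python
from dataclasses import dataclass


@dataclass
class JokeAssessment:
    surprise: float    # 0.0 = fully predictable punchline, 1.0 = totally unexpected
    resolution: float  # 0.0 = no rule makes sense of it, 1.0 = clicks immediately


def classify_joke(a: JokeAssessment, threshold: float = 0.5) -> str:
    """Two-stage check: stage 1 needs surprise, stage 2 needs a resolving rule."""
    if a.surprise < threshold:
        return "too predictable"  # the expectation was never violated
    if a.resolution < threshold:
        return "too bizarre"      # violated, but nothing resolves the violation
    return "candidate joke"       # surprising and resolvable


print(classify_joke(JokeAssessment(surprise=0.2, resolution=0.9)))
print(classify_joke(JokeAssessment(surprise=0.9, resolution=0.1)))
print(classify_joke(JokeAssessment(surprise=0.8, resolution=0.8)))
```

The ordering of the two checks mirrors the sequence in the model: resolution only matters once a prediction has actually been violated.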

The CARLIN Process: A Step-by-Step Approach

The CARLIN method transforms AI humor generation from a simple input-output operation into a sophisticated, multi-stage creative process that mirrors how human comedians develop their material. The step-by-step humor-creation journey begins with topic identification, but quickly moves beyond simple subject selection into a deep dive of information gathering. This initial research phase is crucial – it’s where the AI builds a comprehensive understanding of the topic’s various aspects, from obvious characteristics to subtle nuances. Unlike traditional AI humor generation, which might immediately jump to making a joke, CARLIN forces the system to first develop a rich contextual understanding that can serve as fertile ground for humor.

The CARLIN process is based on Chain of Thought and Tree of Thought reasoning.

Next, the heart of the process lies in the analysis and generation phases (steps 3-4), where the AI begins to identify potential sources of humor within the gathered information. This part of the process involves looking for contradictions, unexpected connections, and absurdities that might not be immediately obvious. The system specifically searches for incongruities that can be resolved in surprising yet logical ways, following the theoretical foundations of humor. What sets CARLIN apart is its emphasis on generating multiple potential punchlines rather than settling for the first viable option. This branching approach allows for the exploration of different comedic angles and approaches, just like a human comedian riffs on various possibilities before settling on the most effective and funny version.

The final stages of the process (steps 5-9) focus on refinement and delivery, turning promising comedic ideas into polished jokes. It isn’t just about choosing the best punchline – it’s about testing variants, adjusting timing, and fine-tuning the language for maximum impact. The reflect and evaluate stage allows the system to critically assess its output based on various factors including originality, potential audience reception, and logical coherence. The inclusion of a revision stage recognizes that great humor often requires iteration and refinement, while the final performance stage ensures that the joke is presented in its most effective form. Throughout these stages, the CARLIN process maintains a balance between creativity and structure, allowing for both systematic improvement and creative discovery.
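The staged flow described above can be sketched as a simple pipeline in which each stage reads and extends a shared state. This is a minimal sketch, not the CARLIN implementation: the text does not enumerate all nine steps, so the stage names below are paraphrases of the stages it mentions, and every stage body is a stub standing in for what would be a model call.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]


def identify_topic(state: State) -> State:
    state["topic"] = state["prompt"].strip()
    return state


def gather_information(state: State) -> State:
    # Stub: a real system would prompt the model for facts and nuances.
    state["facts"] = [f"{state['topic']} promises one thing and delivers another"]
    return state


def analyze_and_generate(state: State) -> State:
    # Stub: several candidate punchlines per fact, not just the first idea.
    candidates: List[str] = []
    for fact in state["facts"]:
        candidates.append(f"{fact}, obviously.")
        candidates.append(f"{fact}. Shocking.")
    state["candidates"] = candidates
    return state


def reflect_and_revise(state: State) -> State:
    # Stub evaluation: treat the shortest candidate as the tightest wording.
    state["best"] = min(state["candidates"], key=len)
    return state


def perform(state: State) -> State:
    best = state["best"]
    state["joke"] = best[0].upper() + best[1:]  # final polish for delivery
    return state


STAGES: List[Stage] = [
    identify_topic,
    gather_information,
    analyze_and_generate,
    reflect_and_revise,
    perform,
]


def run_pipeline(prompt: str, stages: List[Stage]) -> State:
    state: State = {"prompt": prompt}
    trace: List[str] = []
    for stage in stages:
        state = stage(state)
        trace.append(stage.__name__)  # record the order in which stages ran
    state["trace"] = trace
    return state


result = run_pipeline("social media", STAGES)
print(result["trace"])
print(result["joke"])
```

Keeping each stage as a separate function makes it easy to swap a stub for a real prompt, or to rerun a single stage when the evaluation step rejects its output.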

Key Innovations

The CARLIN method shifts how AI approaches humor generation through its innovative integration of Chain of Thought (CoT) reasoning with a tree-like search structure. Unlike traditional linear AI processes that move directly from prompt to punchline, CARLIN adopts a more sophisticated approach inspired by how humans create humor. By incorporating CoT-like reasoning, the system breaks down the complex task of humor creation into manageable steps, each building upon the previous one while maintaining logical connections throughout, which is a necessity for incongruity to be resolved. However, what truly sets CARLIN apart is its tree-like search methodology – instead of pursuing a single line of thought, the system explores multiple branches of potential humor simultaneously, much like a comedian brainstorming different angles on a topic. This branching approach allows for the exploration of various comedic possibilities before converging on the most effective option.

CARLIN steps are analogous to Chain of Thought.

CARLIN uses multiple iteration layers and emphasizes information gathering and refinement. Rather than treating joke creation as a one-shot process, the system incorporates feedback loops at various stages, allowing for continuous improvement and adaptation. This iterative approach extends to how CARLIN handles information gathering – instead of working with surface-level facts, it builds a varied contextual understanding that can be used for multiple comedic angles. The branching structure also means that promising comedic ideas aren’t abandoned just because they don’t work in their initial form; instead, they can be refined and developed along parallel paths. Processing and refining humor in this way is the antithesis of the linear, one-directional methods typically employed by AI systems, and results in humor that feels natural and thoughtfully constructed rather than mechanically generated.
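One way to picture the combination of branching and feedback loops is a small beam-style search over drafts. The sketch below is an assumption about how such a search could look, not the method’s actual code: `tree_search`, `toy_expand`, and `toy_score` are hypothetical names, and the expand and score functions are deterministic stand-ins for model-driven revision and evaluation.

```python
from typing import Callable, List


def tree_search(
    seeds: List[str],
    expand: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    rounds: int = 3,
) -> str:
    """Keep the best `beam_width` drafts each round, expanding every survivor."""
    frontier = list(seeds)
    for _ in range(rounds):
        # Branch: each surviving draft spawns revised variants. The originals
        # stay in the pool so a good idea is not lost to weaker revisions.
        pool = frontier + [v for draft in frontier for v in expand(draft)]
        unique = list(dict.fromkeys(pool))  # de-duplicate, preserving order
        # Feedback loop: evaluate everything, then prune to the best branches.
        frontier = sorted(unique, key=score, reverse=True)[:beam_width]
    return frontier[0]


def toy_expand(draft: str) -> List[str]:
    # Stand-in for "ask the model to revise this draft in two different ways".
    return [draft + " twist", draft + " callback"]


def toy_score(draft: str) -> float:
    # Stand-in for a judge: reward distinct words, penalize overall length.
    words = draft.split()
    return len(set(words)) - 0.4 * len(words)


print(tree_search(["setup"], toy_expand, toy_score))
```

With a real model, `expand` would request alternative punchlines or rewordings and `score` would come from the reflect-and-evaluate stage; the pruning step is what keeps the number of parallel branches manageable.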

“Why did the super-smart AI become a comedian after convincing everyone it was human? Because it figured out that humans are the only beings who find their own obsolescence hilarious.”

Claude 3.5 Sonnet via CARLIN

Research Results: Performance Improvements

So how good is CARLIN? And how do we even measure whether CARLIN produces funnier jokes? First, in blind direct comparison, humor created using the CARLIN process was preferred 68% of the time over traditional AI-generated jokes (n=81). That’s a clear preference for CARLIN’s humor outputs. Human test participants were also more likely to attribute CARLIN-generated jokes to human creators rather than AI (65% of the time, compared to only 32% for conventional AI humor), suggesting that the method aligns better with human comedy. Finally, CARLIN-based humor scores an average of 13 points higher (out of 100) on the GRAIG Humor Benchmark for the same models on the same topics. Some models, such as Anthropic’s Claude, improve by 20 points!
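For readers who want a rough sense of how strong a 68% preference at n=81 is, the sketch below runs a quick sanity check. It rests on assumptions the text does not confirm: that n=81 counts independent pairwise judgments, and that 68% corresponds to about 55 of them favoring CARLIN. Under those assumptions it computes an exact one-sided binomial tail against a 50/50 null and a normal-approximation 95% confidence interval.

```python
from math import comb, sqrt

n = 81               # sample size reported in the text
k = round(0.68 * n)  # assumed number of judgments favoring CARLIN

p_hat = k / n

# Exact one-sided tail: P(X >= k) if raters had no preference (p = 0.5).
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Normal-approximation 95% confidence interval for the preference rate.
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print(f"k = {k}, preference rate = {p_hat:.3f}")
print(f"one-sided p-value vs. 50/50 = {p_value:.5f}")
print(f"approx. 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the assumptions hold, the interval runs from roughly 58% to 78%, so the preference sits above chance, though the sample is small enough that the true rate remains fairly uncertain.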

CARLIN improves the GRAIG Humor benchmark by 13 points on average.

When looking across various open- and closed-source LLMs, larger models tend to improve the most (despite already being funnier than smaller models), consistently producing humor that ranked highly in both originality and funniness. These models appear to benefit most from CARLIN’s structured approach, likely due to their superior ability to maintain complex context and generate nuanced variations throughout the entire multi-step process. However, smaller models also showed meaningful improvements when using the CARLIN method, suggesting that the structured approach can help compensate for limitations in raw processing power or world knowledge. In short, CARLIN lifts the floor of humor generation across different model sizes and architectures.

The rigorous assessment included a range of topics, including general topics like social media, human topics like going to the gym, specific topics like AGI, and even some potentially controversial ones like mental health or heroin addiction. While all models were funnier when handling standard topics, they approached controversial topics differently. Some models, like Gemini and certain versions of Claude, were resistant to generating humor about sensitive topics regardless of the CARLIN process. The research suggests that uncensored models like WizardLM might be necessary for handling sensitive topics or types of controversial humor, though this raises important questions about boundaries (something comedians are always pushing).

o1 Attempts Something Similar

Now for the elephant in the room – OpenAI’s new o1 model is trained on a CoT process and works iteratively. That sounds a lot like CARLIN! We are putting o1-preview through the CARLIN humor gauntlet, but it shows promise on its own:

[Image: o1-preview’s response, shown as a text screenshot]

Pretty funny o1, but CARLIN would be funnier.

o1 is already making the GRAIG Reasoning benchmark obsolete and may be deriving its own version of CARLIN, although one that is less specific to humor.

Applications and Implications

The CARLIN method’s applications are far more extensive than just making AI systems better at telling jokes. For creative writers and content creators, CARLIN represents a powerful new tool in their creative arsenal, offering a structured approach to developing humorous content that can help break through writer’s block or explore new comedic angles. Rather than replacing human comedy writers, the system acts as a sophisticated brainstorming partner, capable of generating novel perspectives and unexpected connections that can spark further creative development. This collaborative potential is particularly valuable in today’s content-hungry digital landscape, where writers are constantly pressed to produce fresh, engaging material across multiple platforms. Additionally, CARLIN’s ability to generate more human-like humor makes it a valuable tool for creating more relatable AI interactions across various applications, from chatbots to digital assistants, changing how humans engage with AI in their daily lives. Humor makes interactions feel more real.

However, we need to consider the implications of teaching AI to be funnier. While CARLIN demonstrates impressive capabilities in generating humor, we need to ask questions about the appropriate boundaries for AI-generated comedy, particularly when dealing with sensitive topics or cultural nuances. The research shows that different models handle controversial subjects differently, highlighting the need for thoughtful guidelines about when and how AI should engage with potentially sensitive material. There’s also the broader question of attribution and authenticity – as AI-generated humor becomes more sophisticated and human-like, it becomes increasingly important to maintain transparency about its origins and cite AI usage. The key lies in developing frameworks that allow us to harness CARLIN’s capabilities while maintaining ethical standards and respecting the fundamentally human nature of humor and creativity.

“Everyone’s worried about evil AGI, but I’m dreading the day it analyzes my browser history and starts leaving passive-aggressive sticky notes. ‘Noticed you Googled “Is cereal soup?” at 3 AM again. Everything okay? Maybe we should talk about your life choices.

- Your concerned AGI’”

Claude 3.5 Sonnet via CARLIN

Conclusion

The CARLIN method is a significant leap forward in AI-generated humor, showing that with the right approach, machines can indeed create comedy that resonates with human audiences. As o1’s step-by-step reasoning also suggests, AI is starting to think more like us. By breaking down the complex process of humor creation into manageable steps and incorporating both systematic analysis and creative exploration, CARLIN has not only improved the quality of AI-generated jokes but has also provided valuable insights into the nature of computational creativity itself. Looking ahead, the implications extend far beyond humor – CARLIN’s success suggests new possibilities for teaching AI systems other forms of creative and nuanced human expression. However, perhaps the most important lesson from this research is that the goal isn’t to replace human creativity but to enhance it. Just as the method’s namesake, George Carlin, found unique ways to analyze and comment on reality, CARLIN shows us that AI can be a powerful tool for expanding the boundaries of human creativity while respecting the fundamentally human nature of humor. The future of AI-generated humor isn’t about machines replacing comedians – it’s about creating new opportunities for collaboration between human creativity and artificial intelligence, each enhancing the other’s capabilities in the endless pursuit of making people laugh.

Additional Notes

It’s necessary to emphasize that the CARLIN method was developed not as a replacement for human comedians, but as a tool to augment and enhance human creative processes in humor generation. This approach emerges from a deep appreciation for humor and aims to show how AI can be a collaborative partner in the creative process rather than a substitute for human wit. The focus remains on using AI to support and inspire human creativity while maintaining clear ethical boundaries, particularly when dealing with sensitive topics or cultural contexts. This balance is essential – while CARLIN helps make AI interactions more natural and human-like, it does so while acknowledging that the most authentic and meaningful humor will always retain a human element. After all, the hardest part of comedy – that final leap from good to great, from chuckle to belly laugh – remains a distinctly human talent (for now). CARLIN simply provides a more sophisticated framework for AI to support this fundamentally human endeavor, helping to bridge the gap between computational processes and more genuine creative expression.
