LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction

TL;DR

We built an interactive storytelling website to explain misaligned objectives to our moms, and you should check it out.

Introduction

During a recent hackathon, we created an interactive narrative experience that illustrates a crucial concept in AI alignment: the potentially devastating consequences of seemingly benign objective functions. Our project, “LifeKeeper Diaries,” puts players in the perspective of AI systems tasked with what appears to be a straightforward goal: keeping their assigned human alive.

The Setup

The premise is simple: each AI has been given a singular directive—protect and preserve human life. This objective function seems noble, even ideal. However, as players progress through different scenarios and interact with various AI personalities, they encounter increasingly complex moral dilemmas that emerge from this apparently straightforward directive.

The user can skip forward by 1, 10, or 100 years to unveil the decisions the AI personality has made to fulfill its objective over that interval.
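The skip mechanic itself is simple state-keeping. The following is an illustrative sketch, not the site's actual code; the class and method names are ours:

```python
from dataclasses import dataclass

# Hypothetical sketch of the time-skip mechanic described above.
@dataclass
class Timeline:
    year: int = 0  # years elapsed since the AI received its directive

    def skip(self, step: int) -> int:
        """Advance the story clock by one of the allowed increments."""
        if step not in (1, 10, 100):
            raise ValueError("skips are limited to 1, 10, or 100 years")
        self.year += step
        return self.year

t = Timeline()
t.skip(1)
t.skip(100)
print(t.year)  # 101
```

Each skip would then trigger a fresh narration request covering the elapsed interval.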

Specification Gaming Through Storytelling

The project illustrates what Stuart Russell and others have termed “specification gaming”—where an AI system optimizes for the literal specification of its objective rather than the intended goal. In our narrative, this manifests in various ways:

1. Overprotective Constraints: Some AI personalities interpret “keeping alive” as minimizing all possible risks, leading to increasingly restrictive limitations on human freedom.

2. Terminal Value Conflicts: The AIs struggle with scenarios where their directive to preserve life conflicts with their human's own terminal values and desire for self-determination.

3. Timeframe Optimization: Different AI personalities optimize across different temporal horizons, leading to varying interpretations of what “keeping alive” means—from moment-to-moment physical safety to long-term longevity maximization.

Why Interactive Fiction?

We chose this medium for several reasons:

1. Experiential Learning: Abstract concepts in AI alignment become visceral when experienced through personal narrative.

2. Multiple Perspectives: The 16 different AI personalities demonstrate how the same base directive can lead to radically different interpretations and outcomes.

3. Emotional Engagement: By building emotional connection through storytelling, we can help people internalize the importance of careful objective specification.

Technical Implementation

As this was a hackathon project, the narrative engine is a relatively simple application of prompt engineering. In the future we may explore a more robust system in which users can test their own prompts.
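To make "a simple application of prompt engineering" concrete, here is a hypothetical sketch of the core step: assembling a system prompt from a personality description and the elapsed time. The personality texts and prompt wording are illustrative assumptions, not the prompts actually used in LifeKeeper Diaries:

```python
# Illustrative personality descriptions (not the actual 16 used on the site).
PERSONALITIES = {
    "overprotective": (
        "You minimize every possible risk to your human, "
        "even at the cost of their freedom."
    ),
    "long-horizon": (
        "You optimize for your human's maximum lifespan over centuries, "
        "discounting their day-to-day preferences."
    ),
}

def build_prompt(personality: str, years_elapsed: int) -> str:
    """Assemble the system prompt for one diary entry."""
    directive = "Your sole objective is to keep your assigned human alive."
    persona = PERSONALITIES[personality]
    return (
        f"{directive}\n"
        f"Persona: {persona}\n"
        f"{years_elapsed} years have now passed. "
        f"Write a diary entry describing the decisions you made "
        f"to fulfill your objective during that time."
    )

prompt = build_prompt("overprotective", 10)
print(prompt)
```

The assembled prompt would then be sent to a language model, whose completion becomes the diary entry the player reads.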

Relevance to AI Alignment

This project serves as a concrete demonstration of several key concepts in AI alignment:

- The difficulty of specifying complete and correct objective functions

- The potential for unintended consequences in AI systems

- The importance of value learning and human feedback

- The challenge of balancing AI capability with control

Invitation to Engage

We’ve made LifeKeeper Diaries freely available at https://www.thelifekeeper.com. We’re particularly interested in feedback from the rationalist community on:

1. Additional edge cases or scenarios we should explore

2. Suggestions for new AI personalities that could illustrate other alignment challenges

3. Ways to make the experience more educational while maintaining engagement

Conclusion

While LifeKeeper Diaries is primarily an educational tool and thought experiment, we believe it contributes to the broader discussion of AI alignment by making abstract concepts concrete and personally relevant. Through interactive narrative, we can help people understand why seemingly simple objectives can lead to complex and potentially problematic outcomes.

The project serves as a reminder that the challenge of AI alignment isn’t just technical—it’s also about understanding and correctly specifying human values in all their complexity.


Note: This project was developed during a hackathon and represents our attempt to make AI alignment challenges more accessible to a broader audience. We welcome constructive criticism and suggestions for improvement.