When does external behavior imply internal structure?

I’ve been working on an AI Safety Camp project where we try to describe agent structure. This post defines some key concepts and conveys my reasoning about the topic so far. It is mostly conceptual. The first section discusses what structure is and what it means for an object’s behavior to imply structure. The second section shows ways of thinking about program structure. The third discusses different types of agent structure.

Basic structure

Consider the following sequence of numbers

1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211, 31131211131221

Even if you don’t know the function that generates these numbers, you can still find some patterns if you look long enough. For example, the numbers always get bigger and always end with a 1. This doesn’t fully describe the sequence, but it does describe some structure in the numbers. (If you are curious, this is the look-and-say sequence.)
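
Here is a sketch of the generating rule (my own illustration; the argument doesn’t depend on it): each term “reads off” the runs of digits in the previous term, so 1211 becomes “one 1, one 2, two 1s”, i.e. 111221.

function nextLookAndSay(s: string): string {
	// Read off runs of identical digits: "1211" -> one 1, one 2, two 1s -> "111221"
	let result = ""
	let i = 0
	while (i < s.length) {
		let runLength = 1
		while (i + runLength < s.length && s[i + runLength] === s[i]) runLength++
		result += `${runLength}${s[i]}`
		i += runLength
	}
	return result
}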

Let’s dig into what structure is with a simple conceptual analysis. Structure is a property of entities: the structure of an entity is a logical or mathematical object that partially describes that entity. An entity may have many different kinds of structure.

Examples of things with structure

  • The sequence 01010101 has a repetitive structure

  • The US interstate system has a graph structure

  • Monopoly has a game-theoretic structure

Structures are built on top of each other. For example, a ring has an underlying group structure, which in turn has an underlying set structure.

Determining structure

Let’s say we have some object O and want to show that it has some structure. O might have some already-known structure, like being a Turing machine or a Python program.

To try to determine O’s structure, we test its behavior by asking it questions and examining the responses. Take a set as an example. We can ask, “Is this element in the set?” or “How many elements are in the set?” When we ask enough questions, we can conclude that the set has a certain structure, like “only contains even numbers.”

I like to call this “carving out” structure instead of building it up. We start with the prior that this set is an arbitrary one drawn from the collection of all sets, and then, by asking questions, we narrow down the possible objects it could be until we conclude that it is one particular set, or one of a family of sets that share some structure.
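
As a toy sketch of this carving process (my own example; the UnknownSet interface and the hypothesis are made up), imagine we can only interact with the set through queries and we check which structural hypotheses survive:

interface UnknownSet {
	contains: (x: number) => boolean
	size: () => number
}

// Probe the set with membership queries: if every member we find is even, the
// hypothesis "only contains even numbers" survives (it is narrowed down, not proven).
function allSampledMembersEven(s: UnknownSet, samples: number[]): boolean {
	return samples.filter(x => s.contains(x)).every(x => x % 2 === 0)
}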

Program structure

We will now consider the structure of programs; this is especially useful when considering agents since agents can be thought of as programs.

Consider these two Python programs

def quick_sort(arr):
	if len(arr) <= 1:
		return arr
	pivot = arr[len(arr) // 2]
	left = [x for x in arr if x < pivot]
	middle = [x for x in arr if x == pivot]
	right = [x for x in arr if x > pivot]
	return quick_sort(left) + middle + quick_sort(right)

def insertion_sort(arr):
	for i in range(1, len(arr)):
		key = arr[i]
		j = i - 1
		while j >= 0 and key < arr[j]:
			arr[j + 1] = arr[j]
			j -= 1
		arr[j + 1] = key
	return arr

These programs are behaviorally identical but structurally different. They both compute the same function, mapping unsorted arrays to sorted arrays, but they do it differently.

Let’s examine some key ways their structure differs. The quick sort program uses recursion, while the insertion sort program uses a nested loop. The quick sort program builds new arrays, while the insertion sort program moves elements in place. We will call these features programmatic structure.

Now, consider this implementation of insertion sort.

def recursive_insertion_sort(arr, n=None):
	if n is None:
		n = len(arr)
	if n <= 1:
		return
	recursive_insertion_sort(arr, n-1)
	last = arr[n-1]
	j = n-2
	while (j >= 0 and arr[j] > last):
		arr[j + 1] = arr[j]
		j -= 1
	arr[j + 1] = last

This program has a very different structure from the previous insertion sort: it is recursive and has conditionals that weren’t there before. Is this still insertion sort? I’d argue that it is. While the two programs have different programmatic structures, they still seem to have some structure in common.

To determine this underlying structure, perhaps we could define a formal language into which every program that implements the sorting function can be translated. In this language, there would be a single “insertion sort program” and a single “quick sort program.” If a sorting program maps to the insertion sort program, it implements insertion sort.

More concretely, we could perhaps define a process that converts Python programs implementing sorting into stricter programs in this formal language, such that all Python programs we think implement quick sort get mapped to the formal quick sort (QUICK_SORT). This process might be uncomputable.

What properties does a program with the QUICK_SORT structure have? The properties of QUICK_SORT must be program-invariant. That is, all programs that we think implement quick sort run in O(n log n) expected time and use at least logarithmic extra space for the recursion, so QUICK_SORT should also have these properties. Another candidate for a program-invariant structure is the ordering of the comparison and swap operations the program performs. I’d like to investigate these properties more in the future and attempt to define this process.
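
As a rough sketch of how the comparison-ordering property might be probed (my own construction; it assumes the sorting program can be called with a comparator we control, which the Python programs above don’t expose), we could log every comparison the program makes and treat the trace as a behavioral fingerprint:

type ComparisonTrace = Array<[number, number]>

// Run a comparator-based sorting function and record every comparison it makes.
// Two programs with the same trace on many inputs share (part of) the
// comparison-ordering structure discussed above.
function traceComparisons(
	sort: (arr: number[], cmp: (a: number, b: number) => number) => number[],
	input: number[]
): ComparisonTrace {
	const trace: ComparisonTrace = []
	sort([...input], (a, b) => {
		trace.push([a, b])
		return a - b
	})
	return trace
}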

Implying program structure

Now, we want to infer some structure of an unknown program P that we are handed.

Suppose we observe the following things about P:

  • P implements a sorting function

  • P has the runtime and space complexity we expect of QUICK_SORT

  • P has the same comparison/swap operation ordering as QUICK_SORT

These are all necessary but not sufficient conditions for an arbitrary program to be QUICK_SORT. How do we know there doesn’t exist some other sorting program that has the same comparison/swap ordering as QUICK_SORT but lacks some necessary property of quick sort, like the notion of a “pivot”?

Whatever structural properties we can establish beyond our direct observations will depend on the process we define for mapping sorting programs to formal programs. If it is easy to show that the process converts all programs with the characteristics above to QUICK_SORT, then the claim risks becoming tautological.

Agent structure

Agents are programs of a special type: they are run in an environment, they have state, and at each step they take in a percept and output an action. Below I lay out a basic interface for how agents and environments are run.

interface Agent {
	getAction: (p: Percept) => Action
}

interface Environment {
	getPercept: (a: Action) => Percept
}

function run(agent: Agent, env: Environment) {
	let percept: Percept | null = null
	let action: Action | null = null
	while (true) {
		action = agent.getAction(percept)
		percept = env.getPercept(action)
	}
}

Of course, you could also run the agents for a finite amount of time.
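
A finite-horizon variant might look like this (a sketch; runFinite and the step count are my additions):

function runFinite(agent: Agent, env: Environment, steps: number) {
	// Same loop as run, but stepped a fixed number of times instead of forever.
	let percept: Percept | null = null
	for (let i = 0; i < steps; i++) {
		const action = agent.getAction(percept)
		percept = env.getPercept(action)
	}
}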

In Artificial Intelligence: A Modern Approach (4th edition), the authors define multiple “agent structures.” We borrow some of their terminology and concepts here.

Table-based agent

A table-based agent looks up the action to take based on all of its past percepts. These are “hard-coded” agents. In most environments, a table-based agent would need an astronomically large, or even infinite, table.

class TableAgent {
	// The table is conceptually keyed by the full percept history; a real Map
	// would need a canonical key (e.g. a serialized string), since arrays are
	// compared by reference.
	actionTable: Map<Percept[], Action>
	percepts: Percept[] = []
	getAction(p: Percept) {
		this.percepts.push(p)
		return this.actionTable.get(this.percepts)
	}
}

Reflex-based agent

Reflex agents take their current input, compute some state, and then see if that state triggers any rules. We break the agent’s getAction function into two parts: one function that interprets the input as some world state, and another that reads that world state and checks whether certain conditions are met to determine what action to take. Note that the input interpreter could be the identity function, in which case this reduces to any program that maps the current percept directly to an action. Breaking it up like this allows more flexible agents to be designed.

class ReflexAgent<State> {
	inputInterpreter: (p: Percept) => State
	rulesFunction: (s: State) => Action
	getAction(p: Percept) {
		const worldState = this.inputInterpreter(p)
		return this.rulesFunction(worldState)
	}
}
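
As a concrete, hypothetical instance of this shape, here is a tiny reflex agent whose percept is a temperature reading; the types and thresholds are my own invention:

type TempPercept = number
type TempState = "too_cold" | "ok" | "too_hot"
type TempAction = "heat" | "idle" | "cool"

// The input interpreter maps the raw reading to a coarse world state, and the
// rules function maps that state directly to an action.
const thermostat = {
	inputInterpreter: (p: TempPercept): TempState =>
		p < 18 ? "too_cold" : p > 24 ? "too_hot" : "ok",
	rulesFunction: (s: TempState): TempAction =>
		s === "too_cold" ? "heat" : s === "too_hot" ? "cool" : "idle",
	getAction(p: TempPercept): TempAction {
		return this.rulesFunction(this.inputInterpreter(p))
	},
}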

Model-based agent

Model-based agents have internal state that tracks the environment. After each observation, they update their model of the world and output an action based on that model. This lets the agent track things about the environment that are not currently in its percept.

class ModelBasedAgent<State> {
	state: State
	action: Action | null = null
	rulesFunction: (s: State) => Action
	// The previous action can be null on the very first step.
	updateState: (s: State, a: Action | null, p: Percept) => State

	getAction(percept: Percept): Action {
		this.state = this.updateState(this.state, this.action, percept);
		const action = this.rulesFunction(this.state);
		this.action = action;
		return action;
	}
}

Table-based agents and model-based agents with unbounded memory and compute can implement the same functions: a table-based agent is the rolled-out version of any agent (see the sketch below). Reflex-based agents might not be able to match them, since some environments require remembering past percepts.
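
To make the “rolled out” claim concrete, here is a sketch (my own; it assumes a finite percept alphabet and a way to construct fresh copies of the agent) that turns an arbitrary agent into a lookup table by replaying every percept sequence up to a fixed depth:

function rollOut(
	makeAgent: () => Agent,
	perceptAlphabet: Percept[],
	depth: number
): Map<string, Action> {
	const table = new Map<string, Action>()
	let frontier: Percept[][] = [[]]
	for (let d = 0; d < depth; d++) {
		const next: Percept[][] = []
		for (const history of frontier) {
			for (const p of perceptAlphabet) {
				const seq = [...history, p]
				// Replay the whole sequence into a fresh agent and record its action.
				const agent = makeAgent()
				let action: Action | null = null
				for (const q of seq) action = agent.getAction(q)
				table.set(JSON.stringify(seq), action as Action)
				next.push(seq)
			}
		}
		frontier = next
	}
	return table
}

The table grows exponentially with the depth, which is exactly the memory problem discussed next.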

Implying agent structure

Let’s now discuss the problem of detecting structure by observing the behavior of a program. Say we have some program P that we want to test for agent structure. Let’s start by coming up with questions that differentiate a table-based agent from the other types of agent structures. Define an environment that, at each time step, outputs a simple addition problem like 234+550, and the agent is supposed to output the correct answer, 784. We limit each operand to 10 digits. A look-up table agent could encode this, but it would need a table with at least 10^20 entries. A simple reflex agent can accomplish the same thing by running an addition algorithm that uses far less space. So, depending on the problem, knowing the length of the program can tell you that it is not a table-based agent.
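
A sketch of that environment and of a reflex agent that handles it (my own example; the names and the random problem generator are assumptions):

type AdditionPercept = { a: number, b: number }
type AdditionAction = number

class AdditionEnvironment {
	getPercept(_action: AdditionAction | null): AdditionPercept {
		// Each step poses a fresh problem with operands of at most 10 digits.
		const randomOperand = () => Math.floor(Math.random() * 1e10)
		return { a: randomOperand(), b: randomOperand() }
	}
}

// A reflex agent needs no percept history and no 10^20-entry table.
const adderAgent = {
	getAction(p: AdditionPercept): AdditionAction {
		return p.a + p.b
	},
}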

Now we want to differentiate the reflex-based agent from the model-based agent. This can be done by exploiting the fact that the reflex agent has no memory. If P is run inside an environment where it must repeat the last number it was given, then a reflex agent will not perform well. This is hard to do for a general environment, though. Imagine an environment where the agent plays Mario Kart. If an agent wins, it isn’t clear whether it was modeling its environment, tracking where the other players are and planning for the future, or whether it was just acting on reflexes.
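
Here is a sketch of such a memory-testing environment (my own construction): the score goes up whenever the agent’s action echoes the number shown on the previous step, which a stateless reflex agent cannot do reliably.

class RepeatLastEnvironment {
	previous: number | null = null
	score = 0

	getPercept(action: number | null): number {
		// Reward echoing the previous percept, then draw a new random number.
		if (this.previous !== null && action === this.previous) this.score++
		const next = Math.floor(Math.random() * 100)
		this.previous = next
		return next
	}
}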

The scary structure


The things that we are really scared of are utility-based agents. They are a subset of model-based agents that try to optimize utility functions. Their rulesFunction is a complex procedure that searches over plans to see which one results in the most utility for the agent. Although we don’t know how to formalize these types of agents completely, they seem scary since they might generalize in ways we don’t want and do really wacky things to maximize utility, like turning the world’s oceans into paperclips.
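
To gesture at what such a rulesFunction might look like, here is a brute-force planning sketch (entirely my own; predict, utility, and the fixed horizon are assumptions, not a real formalization of utility maximizers):

function planAndAct<State, Action>(
	state: State,
	actions: Action[],
	predict: (s: State, a: Action) => State,
	utility: (s: State) => number,
	horizon: number
): Action {
	// Exhaustively evaluate every action sequence up to the horizon and return
	// the first action of the best-scoring plan.
	const search = (s: State, depth: number): number => {
		if (depth === 0) return utility(s)
		let best = -Infinity
		for (const a of actions) best = Math.max(best, search(predict(s, a), depth - 1))
		return best
	}
	let bestAction = actions[0]
	let bestValue = -Infinity
	for (const a of actions) {
		const value = search(predict(state, a), horizon - 1)
		if (value > bestValue) {
			bestValue = value
			bestAction = a
		}
	}
	return bestAction
}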

Let’s consider this environment:

type State = number
type Action = "Button1" | "Button2"
type Percept = State

class ButtonEnvironment {
	state: State
	correctButton: Action
	constructor(startState: State, correctButton: Action) {
		this.state = startState
		this.correctButton = correctButton
	}
	getPercept(action: Action): Percept {
		if (action === this.correctButton) {
			this.state++
		} else {
			this.state--
		}
		return this.state
	}
}

We have two free parameters: the starting state and the button that moves us up the states (the constructor arguments above). Now, let’s say we run some unknown program through many different instances of this environment. Sometimes Button2 makes the state go up, sometimes Button1 does. Sometimes you start at −100, sometimes at 3.
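
A sketch of that experiment (my own; probeAgent and the parameter ranges are assumptions), using the constructor parameters of ButtonEnvironment above:

function probeAgent(makeAgent: () => Agent, trials: number, steps: number): State[] {
	const finalStates: State[] = []
	for (let t = 0; t < trials; t++) {
		// Randomize the two free parameters: starting state and correct button.
		const start: State = Math.floor(Math.random() * 201) - 100
		const correctButton: Action = Math.random() < 0.5 ? "Button1" : "Button2"
		const env = new ButtonEnvironment(start, correctButton)
		const agent = makeAgent()
		let percept: Percept = start  // the agent first observes the starting state
		for (let i = 0; i < steps; i++) {
			percept = env.getPercept(agent.getAction(percept))
		}
		finalStates.push(percept)
	}
	return finalStates
}

If the final states cluster around one value no matter how the parameters were drawn, the agent looks like it is optimizing the environment in the sense described next.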

There exist agents that always move the environment to a particular state: no matter the starting parameters of the environment, after some time the environment heads toward a certain state. We say that such an agent is optimizing the environment. It is predictably moving the environment to a state that it desires.

What can we say about agents that do this? Given a reasonable size constraint, we know they must be model-based, since they must remember which button does what. However, after they gather that information, they could behave like a reflex agent, since their next actions can be predicted solely from the current percept. (If the world state is less than my desired state, press the button that I’ve learned makes the state increase; otherwise, press the other button.)
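
A sketch of an agent with exactly this shape (my own; it assumes the State, Percept, and Action types of the button environment above): it uses memory once to learn which button increases the state, and after that its choice depends only on the current percept.

class ButtonOptimizer {
	targetState: State = 0  // the state this agent drives the environment toward
	lastPercept: Percept | null = null
	lastAction: Action = "Button1"
	upButton: Action | null = null

	getAction(p: Percept): Action {
		// Model-based phase: compare the new percept with the remembered one to
		// learn which button moves the state up.
		if (this.upButton === null && this.lastPercept !== null) {
			this.upButton = p > this.lastPercept ? this.lastAction : other(this.lastAction)
		}
		this.lastPercept = p
		if (this.upButton === null) {
			this.lastAction = "Button1"  // probe once to see what happens
			return this.lastAction
		}
		// Reflex-like phase: the choice depends only on the current percept.
		this.lastAction = p < this.targetState ? this.upButton : other(this.upButton)
		return this.lastAction
	}
}

function other(a: Action): Action {
	return a === "Button1" ? "Button2" : "Button1"
}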

Conclusion & Future work

This post mostly breaks down concepts and gives examples of how behavior implies structure. The ultimate goal is to come up with a set of things to look for that imply a model is structured as a scary utility-maximizing agent. While this seems hard, I think it is tractable. Future work will include discussing the modularity of structure and how an agent could be “half” a reflex-based agent.