I wonder:
if you had an agent that obviously did have goals (let’s say, a player in a game, whose goal is to win, and who plays the optimal strategy), could you deduce those goals from behavior alone?
Let’s say you’re studying the game of Connect Four, but you have no idea what constitutes “winning” or “losing.” You watch enough games that you can map out a game tree. In state X of the world, a player chooses option A over other possible options, and so on. From that game tree, can you deduce that the goal of the game was to get four pieces in a row?
I don’t know the answer to this question. But it seems important. If it’s possible to identify, given a set of behaviors, what goal they’re aimed at, then we can test behaviors (human, animal, algorithmic) for hidden goals. If it’s not possible, that’s very important as well; because that means that even in a simple game, where we know by construction that the players are “rational” goal-maximizing agents, we can’t detect what their goals are from their behavior.
That would mean that behaviors that “seem” goal-less, programs that have no line of code representing a goal, may in fact be behaving in a way that corresponds to maximizing the likelihood of some event; we just can’t deduce what that “goal” is. In other words, it’s not as simple as saying “That program doesn’t have a line of code representing a goal.” Its behavior may encode a goal indirectly. Detecting such goals seems like a problem we would really want to solve.
One method that would work for this example is to iterate over all possible goals in ascending complexity, and check which one would generate that game tree. How to apply this idea to humans is unclear. See here for a previous discussion.
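In code, the brute-force search just described might look roughly like this. It’s only a sketch: goals_of_complexity and value_under are hypothetical stand-ins for a goal enumerator and a minimax evaluator, and the loop never terminates if no candidate goal fits the observed tree.

```python
from itertools import count

def find_consistent_goal(observed_moves, goals_of_complexity, value_under):
    """Brute-force the simplest goal under which every observed move was optimal.

    observed_moves:      list of (player, chosen_state, alternative_states) triples (assumed)
    goals_of_complexity: k -> iterable of candidate goal predicates of complexity k (assumed)
    value_under:         (state, player, goal) -> minimax value under optimal play  (assumed)
    """
    for k in count(1):  # enumerate candidate goals in ascending complexity
        for goal in goals_of_complexity(k):
            rationalizes = all(
                value_under(chosen, player, goal)
                >= max((value_under(alt, player, goal) for alt in alternatives),
                       default=float("-inf"))
                for player, chosen, alternatives in observed_moves
            )
            if rationalizes:
                return goal  # the observed game tree is optimal play for this goal
```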
Ok, computationally awful for anything complicated, but possible in principle for simple games. That’s good, though; that means goals aren’t truly invisible, just inconvenient to deduce.
I think, actually, that because we hardly ever play with optimal strategy, goals are going to be nigh impossible to deduce. Would such an ends-from-means deduction even work if the actor was not using the optimal strategy? Humans only play optimally in games on the level of tic-tac-toe (the more rational ones maybe in somewhat more complex situations, but not by much), and as for machines that could use optimal strategy, we’ve just excluded them from even having such ‘goals’.
If each game is played to the end (no resignations, at least in the sample set) then presumably you could make good initial guesses about the victory condition by looking at common factors in the final positions. A bit like Zendo. It wouldn’t solve the problem, but it doesn’t rely on optimal play, and would narrow the solution space quite a bit.
E.g. in the Connect Four example, all final moves create a sequence of four or more in a row. Armed with that hypothesis, you look at the game tree and note that no non-final moves do. So you know (with reasonably high confidence) that making four in a row ends the game. How to figure out whether it wins the game or loses it is left as an exercise for the reader.
(mental note, try playing C4 with the win condition reversed and see if it makes for an interesting game.)
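A minimal sketch of the “common factors in the final positions” idea, assuming we have logged positions labelled terminal or non-terminal, plus a hand-written set of candidate features (hypothetical predicates such as “some player has four in a row”):

```python
def surviving_hypotheses(positions, hypotheses):
    """Keep hypotheses that hold at every game-ending position and at no other.

    positions:  iterable of (board, is_terminal) pairs from logged games  (assumed)
    hypotheses: dict of name -> predicate(board) returning True/False     (assumed)
    """
    survivors = dict(hypotheses)
    for board, is_terminal in positions:
        # a surviving hypothesis must agree with the terminal / non-terminal label
        survivors = {name: h for name, h in survivors.items()
                     if h(board) == is_terminal}
    return survivors
```

Whatever survives the filter is a candidate ending condition; whether meeting it wins or loses the game still takes further evidence, as noted above.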
There are always heuristics. For example, seeing that the goal of making three in a row fits the game tree well suggests considering goals of the form “make n in a row”, or at least “make diagonal and orthogonal versions of some shape”.
Human games (of the explicit recreational kind) tend to have stopping rules isomorphic with the game’s victory conditions. We would typically refer to those victory conditions as the objective of the game, and the goal of the participants. Given a complete decision tree for a game, even a messy stochastic one like Canasta, it seems possible to deduce the conditions necessary for the game to end.
An algorithm that doesn’t stop (such as the blue-minimising robot) can’t have anything analogous to the victory condition of a game. In that sense, its goals can’t be analysed in the same way as those of a Connect Four-playing agent.
So if the blue-minimising robot were to stop after 3 months (with the stop condition measured by a timer), could we say that the robot’s goal is to stay “alive” for 3 months? I cannot see a necessary link between deducing goals and stopping conditions.
A “victory condition” is another matter: from a decision tree, can you deduce who loses? (For Connect Four, perhaps it is the player who gets four in a row first who loses.)
By “victory condition”, I mean a condition which, when met, determines the winning, losing and drawing status of all players in the game. A stopping rule is necessary for a victory condition (it’s the point at which it is finally appraised), but it doesn’t create a victory condition, any more than imposing a fixed stopping time on any activity creates winners and losers in that activity.
Can we know the victory condition from just watching the game?
Just to underscore a broader point: recreational games have various characteristics which don’t generalise to all situations modelled game-theoretically. Most importantly, they’re designed to be fun for humans to play, to have consistent and explicit rules, to finish in a finite amount of time (RISK notwithstanding), to follow some sort of narrative and to have means of unambiguously identifying winners.
Anecdotally, if you’re familiar with recreational games, it’s fairly straightforward to identify victory conditions in games just by watching them being played, because their conventions mean those conditions are drawn from a considerably reduced number of possibilities. There are, however, lots of edge- and corner-cases where this probably isn’t possible without taking a large sample of observations.
Well, even if we know the conditions that end the game, we still don’t know whether the player’s goal is to end the game (poker) or to avoid ending it for as long as possible (Jenga). We can try to deduce it empirically (if it’s possible to end the game effortlessly on the first turn, then the goal is presumably to keep it going), but I’m not sure that applies to all games.
If ending the game quickly or slowly is part of the objective, in what way is it not included in the victory conditions?
I mean it might not be visible from a game log (for complex games). We will see the combination of pieces when the game ends (the ending condition), but that may not be enough.
I don’t think we’re talking about the same things here.
A decision tree is an optimal path through all possible decisions in a game, not just the history of any given game.
“Victory conditions” in the context I’m using are the conditions that need to be met in order for the game to end, not simply the state of play at the point when any given game ends.
I suspect that “has goals” is ultimately a model, rather than a fact. To the extent that an agent’s behavior maximizes a particular function, that agent can be usefully modeled as an optimizer. To the extent that an agent’s behavior exhibits signs of poor strategy, such as vulnerability to dutch books, that agent may be better modeled as an algorithm-executer.
This suggests that “agentiness” is strongly tied to whether we are smart enough to win against it.
This principle is related to (a component of) the thing referred to as ‘objectified’. That is, if a person is aware that another person can model it as an algorithm-executor then it may consider itself objectified.
What I’ve heard is that, for an intelligent entity, it’s easier to predict what will happen from its goals than from the details of what it does.
For example, with the Connect Four game: if you notice that they always seem to get four in a row, and you never do when you play against them, then you know their goal before you ever figure out what their strategy is.
Although you might have just identified an instrumental subgoal.
Compare with only ever seeing one move made in such a game, but being able to inspect in detail the reasons that played a role in deciding what move to make, looking for explanations for that move. It seems that even one move might suffice, which goes to show that it’s unnecessary for behavior itself to somehow encode the agent’s goals, since we can also take into account the reasons for the behavior being what it is.
If you had lots of end states, and lots of non-end states, and we want to assume the game ends when someone’s won, and that a player only moves into an end state if he’s won (neither of these last two are necessarily true even in nice pretty games), then you could treat it like a classification problem. In that case, you could throw your favourite classifier learning algorithm at it. I can’t think of any publications on someone machine learning a winning condition, but that doesn’t mean it’s not out there.
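For concreteness, here is one way that classification framing could look, using scikit-learn as a generic off-the-shelf choice (my illustration, not anything from a publication; the board encoding is an assumption):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_win_condition(winning_states, other_states):
    """Fit a classifier separating observed winning end states from all other states.

    Both arguments are assumed to be lists of fixed-size numeric board encodings,
    e.g. a 6x7 Connect Four grid flattened to 42 cells with values in {0, 1, 2}.
    """
    X = np.array(winning_states + other_states, dtype=float)
    y = np.array([1] * len(winning_states) + [0] * len(other_states))
    clf = DecisionTreeClassifier(max_depth=6)  # shallow tree keeps the learned rule inspectable
    clf.fit(X, y)
    return clf
```

A shallow decision tree is a deliberate choice here: its splits can be read off directly, so a learned condition like “four in a row” stays inspectable rather than being buried in a black box.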
Dr. David Silver used temporal difference learning to learn some important spatial patterns for Go play, using self-play. Self-play is basically like watching yourself play lots of games against another copy of yourself, so I can imagine similar ideas being applied to watching someone else play. If you’re interested in that, I suggest http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-170.pdf
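For reference, the core temporal-difference idea (the generic TD(0) update, not the specific method in the linked paper) fits in a few lines: after each logged game, nudge each state’s value estimate toward the value of the state that followed it, and nudge the final state’s estimate toward the game’s outcome.

```python
from collections import defaultdict

def td0_update(values, game_states, final_reward, alpha=0.1):
    """values: dict of state -> estimated value; game_states: one logged game (assumed encoding)."""
    for s, s_next in zip(game_states, game_states[1:]):
        values[s] += alpha * (values[s_next] - values[s])    # move toward the successor's value
    last = game_states[-1]
    values[last] += alpha * (final_reward - values[last])    # move the last state toward the outcome
    return values

values = defaultdict(float)  # e.g. keys could be board positions encoded as tuples
```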
On a sadly less published (and therefore mostly unreliable) but slightly more related note, we did have a project once in which we were trying to teach bots to play a Mortal Kombat style game only by observing logs of human play. We didn’t tell one of the bots the goal, we just told it when someone had won, and who had won. It seemed to get along ok.
One of my 30 or so Friendliness-themed thought experiments is called “Implicit goals of ArgMax” or something like that. In general I think this style of reasoning is very important for accurately thinking about universal AI drives. Specifically it is important to analyze highly precise AI architectures like Goedel machines where there’s little wiggle room for a deus ex machina.