Game Theory
Game theory is the study of the ways in which strategic interactions among rational players produce outcomes with respect to the preferences (or utilities) of those players, none of which might have been intended by any of them. The meaning of this statement will not be clear to the non-expert until each of the italicized words and phrases has been explained and featured in some examples. Doing this will be the main business of this article. First, however, we provide some historical and philosophical context in order to motivate the reader for all of this technical work ahead.
- 1. Philosophical and Historical Motivation
- 2. Basic Elements and Assumptions of Game Theory
- 2.1 Utility
- 2.2 Games and Information
- 2.3 Trees and Matrices
- 2.4 The Prisoner's Dilemma as an Example of Strategic-Form vs. Extensive-Form Representation
- 2.5 Solution Concepts and Equilibria
- 2.6 Modular Rationality and Subgame Perfection
- 2.7 On Interpreting Payoffs: Morality and Efficiency in Games
- 2.8 Trembling Hands
- 3. Uncertainty, Risk and Sequential Equilibria
- 4. Repeated Games and Coordination
- 5. Commitment
- 6. Evolutionary Game Theory
- 7. Game Theory and Behavioral Evidence
- Bibliography
- Other Internet Resources
- Related Entries
1. Philosophical and Historical Motivation
The mathematical theory of games was invented by John von Neumann and Oskar Morgenstern (1944). For reasons to be discussed later, limitations in their mathematical framework initially made the theory applicable only under special and limited conditions. This situation has gradually changed, in ways we will examine as we go along, over the past six decades, as the framework was deepened and generalized. Refinements are still being made, and we will review a few outstanding philosophical problems that lie along the advancing front edge of these developments towards the end of the article. However, since at least the late 1970s it has been possible to say with confidence that game theory is the most important and useful tool in the analyst's kit whenever she confronts situations in which what counts as one agent's best action (for her) depends on expectations about what one or more other agents will do, and what counts as their best actions (for them) similarly depend on expectations about her.
Despite the fact that game theory has been rendered mathematically and logically systematic only recently, however, game-theoretic insights can be found among philosophers and political commentators going back to ancient times. For example, in two of Plato's texts, the Laches and the Symposium, Socrates recalls an episode from the Battle of Delium that involved the following situation. Consider a soldier at the front, waiting with his comrades to repulse an enemy attack. It may occur to him that if the defense is likely to be successful, then it isn't very probable that his own personal contribution will be essential. But if he stays, he runs the risk of being killed or wounded—apparently for no point. On the other hand, if the enemy is going to win the battle, then his chances of death or injury are higher still, and now quite clearly to no point, since the line will be overwhelmed anyway. Based on this reasoning, it would appear that the soldier is better off running away regardless of who is going to win the battle. Of course, if all of the soldiers reason this way—as they all apparently should, since they're all in identical situations—then this will certainly bring about the outcome in which the battle is lost. Of course, this point, since it has occurred to us as analysts, can occur to the soldiers too. Does this give them a reason for staying at their posts? Just the contrary: the greater the soldiers' fear that the battle will be lost, the greater their incentive to get themselves out of harm's way. And the greater the soldiers' belief that the battle will be won, without the need of any particular individual's contributions, the less reason they have to stay and fight. If each soldier anticipates this sort of reasoning on the part of the others, all will quickly reason themselves into a panic, and their horrified commander will have a rout on his hands before the enemy has even fired a shot.
Long before game theory had come along to show people how to think about this sort of problem systematically, it had occurred to some actual military leaders and influenced their strategies. Thus the Spanish conqueror Cortez, when landing in Mexico with a small force who had good reason to fear their capacity to repel attack from the far more numerous Aztecs, removed the risk that his troops might think their way into a retreat by burning the ships on which they had landed. With retreat having thus been rendered physically impossible, the Spanish soldiers had no better course of action but to stand and fight—and, furthermore, to fight with as much determination as they could muster. Better still, from Cortez's point of view, his action had a discouraging effect on the motivation of the Aztecs. He took care to burn his ships very visibly, so that the Aztecs would be sure to see what he had done. They then reasoned as follows: Any commander who could be so confident as to willfully destroy his own option to be prudent if the battle went badly for him must have good reasons for such extreme optimism. It cannot be wise to attack an opponent who has a good reason (whatever, exactly, it might be) for being sure that he can't lose. The Aztecs therefore retreated into the surrounding hills, and Cortez had his victory bloodlessly.
These situations as recalled by Plato and as vividly acted upon by Cortez have a common and interesting underlying logic. Notice that the soldiers are not motivated to retreat just, or even mainly, by their rational assessment of the dangers of battle and by their self-interest. Rather, they discover a sound reason to run away by realizing that what it makes sense for them to do depends on what it will make sense for others to do, and that all of the others can notice this too. Even a quite brave soldier may prefer to run rather than heroically, but pointlessly, die trying to stem the oncoming tide all by himself. Thus we could imagine, without contradiction, a circumstance in which an army, all of whose members are brave, flees at top speed before the enemy makes a move. If the soldiers really are brave, then this surely isn't the outcome any of them wanted; each would have preferred that all stand and fight. What we have here, then, is a case in which the interaction of many individually rational decision-making processes—one process per soldier—produces an outcome intended by no one. (Most armies try to avoid this problem just as Cortez did. Since they can't usually make retreat physically impossible, they make it economically impossible: they shoot deserters. Then standing and fighting is each soldier's individually rational course of action after all, because the cost of running is sure to be at least as high as the cost of staying.)
Another classic source that invites this sequence of reasoning is found in Shakespeare's Henry V. During the Battle of Agincourt Henry decided to slaughter his French prisoners, in full view of the enemy and to the surprise of his subordinates, who describe the action as being out of moral character. The reasons Henry gives allude to parametric considerations: he is afraid that the prisoners may free themselves and threaten his position. However, a game theorist might have furnished him with supplementary strategic (and similarly prudential, though perhaps not moral) justification. His own troops observe that the prisoners have been killed, and observe that the enemy has observed this. Therefore, they know what fate will await them at the enemy's hand if they don't win. Metaphorically, but very effectively, their boats have been burnt. The slaughter of the prisoners plausibly sent a signal to the soldiers of both sides, thereby changing their incentives in ways that favoured English prospects for victory.
These examples might seem to be relevant only for those who find themselves in sordid situations of cut-throat competition. Perhaps, one might think, it is important for generals, politicians, businesspeople and others whose jobs involve manipulation of others, but the philosopher should only deplore its horrid morality. Such a conclusion would be highly premature, however. The study of the logic that governs the interrelationships amongst incentives, strategic interactions and outcomes has been fundamental in modern political philosophy, since centuries before anyone had an explicit name for this sort of logic.
Hobbes's Leviathan is often regarded as the founding work in modern political philosophy, the text that began the continuing round of analyses of the function and justification of the state and its restrictions on individual liberties. The core of Hobbes's reasoning can be given quite straightforwardly as follows. The best situation for all people is one in which each is free to do as she pleases. Often, such free people will wish to cooperate with one another in order to carry out projects that would be impossible for an individual acting alone. But if there are any immoral or amoral agents around, they will notice that their interests are best served by getting the benefits from cooperation and not returning them. Suppose, for example, that you agree to help me build my house in return for my promise to help you build yours. After my house is finished, I can make your labour free to me simply by reneging on my promise. I then realize, however, that if this leaves you with no house, you will have an incentive to take mine. This will put me in constant fear of you, and force me to spend valuable time and resources guarding myself against you. I can best minimize these costs by striking first and killing you at the first opportunity. Of course, you can anticipate all of this reasoning by me, and so have good reason to try to beat me to the punch. Since I can anticipate this reasoning by you, my original fear of you was not paranoid; nor was yours of me. In fact, neither of us actually needs to be immoral to get this chain of mutual reasoning going; we need only think that there is some possibility that the other might try to cheat on bargains. Once a small wedge of doubt enters any one mind, the incentive induced by fear of the consequences of being preempted—hit before hitting first—quickly becomes overwhelming on both sides. If either of us has any resources of our own that the other might want, this murderous logic will take hold long before we are so silly as to imagine that we could ever actually get so far as making deals to help one another build houses in the first place. Left to their own devices, rational agents will never derive the benefits of cooperation, and will instead live from the outset in a state of ‘war of all against all’, in Hobbes's words. In these circumstances, all human life, as he vividly and famously put it, will be "solitary, poor, nasty, brutish and short."
Hobbes's proposed solution to this problem was tyranny. The people can hire an agent—a government—whose job is to punish anyone who breaks any promise. So long as the threatened punishment is sufficiently dire—Hobbes thought decapitation generally appropriate—then the cost of reneging on promises will exceed the cost of keeping them. The logic here is identical to that used by an army when it threatens to shoot deserters. If all people know that these incentives hold for most others, then cooperation will not only be possible, but will be the expected norm, and the war of all against all becomes a general peace.
Hobbes pushes the logic of this argument to a very strong conclusion, arguing that it implies not only a government with the right and the power to enforce cooperation, but an ‘undivided’ government in which the arbitrary will of a single ruler must impose absolute obligation on all. Few contemporary political theorists think that the particular steps by which Hobbes reasons his way to this conclusion are both sound and valid. Working through these issues here, however, would carry us away from our topic into complex details of contractarian political philosophy. What is important in the present context is that these details, as they are in fact pursued in the contemporary debates, all involve sophisticated interpretation of the issues using the resources of modern game theory. Furthermore, Hobbes's most basic point, that the fundamental justification for the coercive authority and practices of governments is peoples' own need to protect themselves from what game theorists call ‘social dilemmas’, is accepted by many, if not most, political theorists. Notice that Hobbes has not argued that tyranny is a desirable thing in itself. The structure of his argument is that the logic of strategic interaction leaves only two general political outcomes possible: tyranny and anarchy. Rational agents then choose tyranny as the lesser of two evils.
The reasoning of Cortez, of Henry V and of Hobbes's political agents has a common logic, one derived from their situations. In each case, the aspect of the environment that is most important to the agents' achievement of their preferred outcomes is the set of expectations and possible reactions to their strategies by other agents. The distinction between acting parametrically on a passive world and acting non-parametrically on a world that tries to act in anticipation of these actions is fundamental. If you wish to kick a rock down a hill, you need only concern yourself with the rock's mass relative to the force of your blow, the extent to which it is bonded with its supporting surface, the slope of the ground on the other side of the rock, and the expected impact of the collision on your foot. The values of all of these variables are independent of your plans and intentions, since the rock has no interests of its own and takes no actions to attempt to assist or thwart you. By contrast, if you wish to kick a person down the hill, then unless that person is unconscious, bound or otherwise incapacitated, you will likely not succeed unless you can disguise your plans until it's too late for him to take either evasive or forestalling action. The logical issues associated with the second sort of situation are typically much more complicated, as a simple hypothetical example will illustrate.
Suppose first that you wish to cross a river that is spanned by three bridges. (Assume that swimming, wading or boating across are impossible.) The first bridge is known to be safe and free of obstacles; if you try to cross there, you will succeed. The second bridge lies beneath a cliff from which large rocks sometimes fall. The third is inhabited by deadly cobras. Now suppose you wish to rank-order the three bridges with respect to their preferability as crossing-points. Your task here is quite straightforward. The first bridge is obviously best, since it is safest. To rank-order the other two bridges, you require information about their relative levels of danger. If you can study the frequency of rock-falls and the movements of the cobras for awhile, you might be able to calculate that the probability of your being crushed by a rock at the second bridge is 10% and of being struck by a cobra at the third bridge is 20%. Your reasoning here is strictly parametric because neither the rocks nor the cobras are trying to influence your actions, by, for example, concealing their typical patterns of behaviour because they know you are studying them. It is quite obvious what you should do here: cross at the safe bridge. Now let us complicate the situation a bit. Suppose that the bridge with the rocks was immediately before you, while the safe bridge was a day's difficult hike upstream. Your decision-making situation here is slightly more complicated, but it is still strictly parametric. You would have to decide whether the cost of the long hike was worth exchanging for the penalty of a 10% chance of being hit by a rock. However, this is all you must decide, and your probability of a successful crossing is entirely up to you; the environment is not interested in your plans.
However, if we now complicate the situation in the direction of non-parametricity, it becomes much more puzzling. Suppose that you are a fugitive of some sort, and waiting on the other side of the river with a gun is your pursuer. She will catch and shoot you, let us suppose, only if she waits at the bridge you try to cross; otherwise, you will escape. As you reason through your choice of bridge, it occurs to you that she is over there trying to anticipate your reasoning. It will seem that, surely, choosing the safe bridge straight away would be a mistake, since that is just where she will expect you, and your chances of death rise to certainty. So perhaps you should risk the rocks, since these odds are much better. But wait … if you can reach this conclusion, your pursuer, who is just as rational and well-informed as you are, can anticipate that you will reach it, and will be waiting for you if you evade the rocks. So perhaps you must take your chances with the cobras; that is what she must least expect. But, then, no … if she expects that you will expect that she will least expect this, then she will most expect it. This dilemma, you realize with dread, is general: you must do what your pursuer least expects; but whatever you most expect her to least expect is automatically what she will most expect. You appear to be trapped in indecision. All that might console you a bit here is that, on the other side of the river, your pursuer is trapped in exactly the same quandary, unable to decide which bridge to wait at because as soon as she imagines committing to one, she will notice that if she can find a best reason to pick a bridge, you can anticipate that same reason and then avoid her.
We know from experience that, in situations such as this, people do not usually stand and dither in circles forever. As we'll see later, there is a rational solution—that is, a best rational action—available to both players. However, until the 1940s neither philosophers nor economists knew how to find it mathematically. As a result, economists were forced to treat non-parametric influences as if they were complications on parametric ones. This is likely to strike the reader as odd, since, as our example of the bridge-crossing problem was meant to show, non-parametric features are often fundamental features of decision-making problems. Part of the explanation for game theory's relatively late entry into the field lies in the problems with which economists had historically been concerned. Classical economists, such as Adam Smith and David Ricardo, were mainly interested in the question of how agents in very large markets—whole nations—could interact so as to bring about maximum monetary wealth for themselves. Smith's basic insight, that efficiency is best maximized by agents freely seeking mutually advantageous bargains, was mathematically verified in the twentieth century. However, the demonstration of this fact applies only in conditions of ‘perfect competition,’ that is, when firms face no costs of entry or exit into markets, when there are no economies of scale, and when no agents' actions have unintended side-effects on other agents' well-being. Economists always recognized that this set of assumptions is purely an idealization for purposes of analysis, not a possible state of affairs anyone could try (or should want to try) to attain. But until the mathematics of game theory matured near the end of the 1970s, economists had to hope that the more closely a market approximates perfect competition, the more efficient it will be. No such hope, however, can be mathematically or logically justified in general; indeed, as a strict generalization the assumption can be shown to be false.
This article is not about the foundations of economics, but it is important for understanding the origins and scope of game theory to know that perfectly competitive markets have built into them a feature that renders them susceptible to parametric analysis. Because agents face no entry costs to markets, they will open shop in any given market until competition drives all profits to zero. This implies that if costs and demand are fixed, then agents have no options about how much to produce if they are trying to maximize the differences between their costs and their revenues. These production levels can be determined separately for each agent, so none need pay attention to what the others are doing; each agent treats her counterparts as passive features of the environment. The other kind of situation to which classical economic analysis can be applied without recourse to game theory is that of monopoly. Here, quite obviously, non-parametric considerations drop out, since there is only one agent under study. However, both perfect and monopolistic competition are very special and unusual market arrangements. Prior to the advent of game theory, therefore, economists were severely limited in the class of circumstances to which they could neatly apply their models.
Philosophers share with economists a professional interest in the conditions and techniques for the maximization of human welfare. In addition, philosophers have a special concern with the logical justification of actions, and often actions must be justified by reference to their expected outcomes. Without game theory, both of these problems resist analysis wherever non-parametric aspects are relevant. We will demonstrate this shortly by reference to the most famous (though not the most typical) game, the so-called Prisoner's Dilemma, and to other, more typical, games. In doing this, we will need to introduce, define and illustrate the basic elements and techniques of game theory. To this job we therefore now turn.
2. Basic Elements and Assumptions of Game Theory
2.1 Utility
An agent is, by definition, an entity with preferences. Game theorists, like economists and philosophers studying rational decision-making, describe these by means of an abstract concept called utility. This refers to the amount of ‘welfare’ an agent derives from an object or an event. By ‘welfare’ we refer to some normative index of relative well-being, justified by reference to some background framework. For example, we might evaluate the relative welfare of countries (which we might model as agents for some purposes) by reference to their per capita incomes, and we might evaluate the relative welfare of an animal, in the context of predicting and explaining its behavioral dispositions, by reference to its expected fitness. In the case of people, it is most typical in economics and applications of game theory to evaluate their relative welfare by reference to their own implicit or explicit judgments of it. Thus a person who, say, adores the taste of pickles but dislikes onions would be said to associate higher utility with states of the world in which, all else being equal, she consumes more pickles and fewer onions than with states in which she consumes more onions and fewer pickles. Examples of this kind suggest that ‘utility’ denotes a measure of subjective psychological fulfillment, and this is indeed how the concept was generally (though not always) interpreted prior to the 1930s. During that decade, however, economists and philosophers under the influence of behaviourism objected to the theoretical use of such unobservable entities as ‘psychological fulfillment quotients.’ The economist Paul Samuelson (1938) therefore set out to define utility in such a way that it becomes a purely technical concept. That is, when we say that an agent acts so as to maximize her utility, we mean by ‘utility’ simply whatever it is that the agent's behavior suggests her to consistently desire. If this looks circular to you, it should: theorists who follow Samuelson intend the statement ‘agents act so as to maximize their utility’ as a tautology. Like other tautologies occurring in the foundations of scientific theories, it is useful not in itself, but because it helps to fix our contexts of inquiry.Though we might no longer be moved by scruples derived from psychological behaviorism, many theorists continue to follow Samuelson's way of understanding utility because they think it important that game theory apply to any kind of agent—a person, a bear, a bee, a firm or a country—and not just to agents with human minds. When such theorists say that agents act so as to maximize their utility, they want this to be part of the definition of what it is to be an agent, not an empirical claim about possible inner states and motivations. Samuelson's conception of utility, defined by way of Revealed Preference Theory (RPT) introduced in his classic paper (Samuelson (1938)) satisfies this demand.
Some other theorists understand the point of game theory differently. They view game theory as providing an explanatory account of strategic reasoning. For this idea to be applicable, we must suppose that agents at least sometimes do what they do in non-parametric settings because game-theoretic logic recommends certain actions as the rational ones. Still other theorists interpret game theory normatively, as advising agents on what to do in strategic contexts in order to maximize their utility. Fortunately for our purposes, all of these ways of thinking about the possible uses of game theory are compatible with the tautological interpretation of utility maximization. The philosophical differences are not idle from the perspective of the working game theorist, however. As we will see in a later section, those who hope to use game theory to explain strategic reasoning, as opposed to merely strategic behavior, face some special philosophical and practical problems.
Since game theory involves formal reasoning, we must have a device for thinking of utility maximization in mathematical terms. Such a device is called a utility function. The utility-map for an agent is called a ‘function’ because it maps ordered preferences onto the real numbers. Suppose that agent x prefers bundle a to bundle b and bundle b to bundle c. We then map these onto a list of numbers, where the function maps the highest-ranked bundle onto the largest number in the list, the second-highest-ranked bundle onto the next-largest number in the list, and so on, thus:
bundle aThe only property mapped by this function is order. The magnitudes of the numbers are irrelevant; that is, it must not be inferred that x gets 3 times as much utility from bundle a as she gets from bundle c. Thus we could represent exactly the same utility function as that above by3
bundle b
2
bundle c
1
bundle a7,326
bundle b
12.6
bundle c
−1,000,000
The numbers featuring in an ordinal utility function are thus not measuring any quantity of anything. A utility-function in which magnitudes do matter is called ‘cardinal’. Whenever someone refers to a utility function without specifying which kind is meant, you should assume that it's ordinal. These are the sorts we'll need for the first set of games we'll examine. Later, when we come to seeing how to solve games that involve randomization—our river-crossing game from Part 1 above, for example—we'll need to build cardinal utility functions. The technique for doing this was given by von Neumann & Morgenstern (1947), and was an essential aspect of their invention of game theory. For the moment, however, we will need only ordinal functions.
2.2 Games and Information
All situations in which at least one agent can only act to maximize his utility through anticipating (either consciously, or just implicitly in his behavior) the responses to his actions by one or more other agents is called a game. Agents involved in games are referred to as players. If all agents have optimal actions regardless of what the others do, as in purely parametric situations or conditions of monopoly or perfect competition (see Section 1 above) we can model this without appeal to game theory; otherwise, we need it.We assume that players are economically rational. That is, a player can (i) assess outcomes; (ii) calculate paths to outcomes; and (iii) choose actions that yield their most-preferred outcomes, given the actions of the other players. This rationality might in some cases be internally computed by the agent. In other cases, it might simply be embodied in behavioral dispositions built by natural, cultural or economic selection. In particular, in calling an action ‘chosen’ we imply no necessary deliberation, conscious or otherwise. We mean merely that the action was taken when an alternative action was available, in some sense of ‘available’ normally established by the context of the particular analysis.
Each player in a game faces a choice among two or more possible strategies. A strategy is a predetermined ‘programme of play’ that tells her what actions to take in response to every possible strategy other players might use. The significance of the italicized phrase here will become clear when we take up some sample games below.
A crucial aspect of the specification of a game involves the information that players have when they choose strategies. The simplest games (from the perspective of logical structure) are those in which agents have perfect information, meaning that at every point where each agent's strategy tells her to take an action, she knows everything that has happened in the game up to that point. A board-game of sequential moves in which in which both players watch all the action (and know the rules in common), such as chess, is an instance of such a game. By contrast, the example of the bridge-crossing game from Section 1 above illustrates a game of imperfect information, since the fugitive must choose a bridge to cross without knowing the bridge at which the pursuer has chosen to wait, and the pursuer similarly makes her decision in ignorance of the actions of her quarry. Since game theory is about rational action given the strategically significant actions of others, it should not surprise you to be told that what agents in games know, or fail to know, about each others' actions makes a considerable difference to the logic of our analyses, as we will see.
2.3 Trees and Matrices
The difference between games of perfect and of imperfect information is closely related to (though certainly not identical with!) a distinction between ways of representing games that is based on order of play. Let us begin by distinguishing between sequential-move and simultaneous-move games in terms of information. It is natural, as a first approximation, to think of sequential-move games as being ones in which players choose their strategies one after the other, and of simultaneous-move games as ones in which players choose their strategies at the same time. This isn't quite right, however, because what is of strategic importance is not the temporal order of events per se, but whether and when players know about other players' actions relative to having to choose their own. For example, if two competing businesses are both planning marketing campaigns, one might commit to its strategy months before the other does; but if neither knows what the other has committed to or will commit to when they make their decisions, this is a simultaneous-move game. Chess, by contrast, is normally played as a sequential-move game: you see what your opponent has done before choosing your own next action. (Chess can be turned into a simultaneous-move game if the players each call moves while isolated from the common board; but this is a very different game from conventional chess.)It was said above that the distinction between sequential-move and simultaneous-move games is not identical to the distinction between perfect-information and imperfect-information games. Explaining why this is so is a good way of establishing full understanding of both sets of concepts. As simultaneous-move games were characterized in the previous paragraph, it must be true that all simultaneous-move games are games of imperfect information. However, some games may contain mixes of sequential and simultaneous moves. For example, two firms might commit to their marketing strategies independently and in secrecy from one another, but thereafter engage in pricing competition in full view of one another. If the optimal marketing strategies were partially or wholly dependent on what was expected to happen in the subsequent pricing game, then the two stages would need to be analyzed as a single game, in which a stage of sequential play followed a stage of simultaneous play. Whole games that involve mixed stages of this sort are games of imperfect information, however temporally staged they might be. Games of perfect information (as the name implies) denote cases where no moves are simultaneous (and where no player ever forgets what has gone before).
It was said above that games of perfect information are the (logically) simplest sorts of games. This is so because in such games (as long as the games are finite, that is, terminate after a known number of actions) players and analysts can use a straightforward procedure for predicting outcomes. A rational player in such a game chooses her first action by considering each series of responses and counter-responses that will result from each action open to her. She then asks herself which of the available final outcomes brings her the highest utility, and chooses the action that starts the chain leading to this outcome. This process is called backward induction (because the reasoning works backwards from eventual outcomes to present decision problems).
We will have much more to say about backward induction and its properties in a later section (when we come to discuss equilibrium and equilibrium selection). For now, we have described it just in order to use it to introduce one of the two types of mathematical objects used to represent games: game-trees. A game-tree is an example of what mathematicians call a directed graph. That is, it is a set of connected nodes in which the overall graph has a direction. We can draw trees from the top of the page to the bottom, or from left to right. In the first case, nodes at the top of the page are interpreted as coming earlier in the sequence of actions. In the case of a tree drawn from left to right, leftward nodes are prior in the sequence to rightward ones. An unlabelled tree has a structure of the following sort:
The point of representing games using trees can best be grasped by visualizing the use of them in supporting backward-induction reasoning. Just imagine the player (or analyst) beginning at the end of the tree, where outcomes are displayed, and then working backwards from these, looking for sets of strategies that describe paths leading to them. Since a player's utility function indicates which outcomes she prefers to which, we also know which paths she will prefer. Of course, not all paths will be possible because the other player has a role in selecting paths too, and won't take actions that lead to less preferred outcomes for him. We will present some examples of this interactive path-selection, and detailed techniques for reasoning through them, after we have described a situation we can use a tree to depict.![]()
Figure 1
Trees are used to represent sequential games, because they show the order in which actions are taken by the players. However, games are sometimes represented on matrices rather than trees. This is the second type of mathematical object used to represent games. Matrices, unlike trees, simply show the outcomes, represented in terms of the players' utility functions, for every possible combination of strategies the players might use. For example, it makes sense to display the river-crossing game from Section 1 on a matrix, since in that game both the fugitive and the hunter have just one move each, and each chooses their move in ignorance of what the other has decided to do. Here, then, is part of the matrix:
![]()
Figure 2
The fugitive's three possible strategies—cross at the safe bridge, risk the rocks and risk the cobras—form the rows of the matrix. Similarly, the hunter's three possible strategies—waiting at the safe bridge, waiting at the rocky bridge and waiting at the cobra bridge—form the columns of the matrix. Each cell of the matrix shows—or, rather would show if our matrix was complete—an outcome defined in terms of the players' payoffs. A player's payoff is simply the number assigned by her ordinal utility function to the state of affairs corresponding to the outcome in question. For each outcome, Row's payoff is always listed first, followed by Column's. Thus, for example, the upper left-hand corner above shows that when the fugitive crosses at the safe bridge and the hunter is waiting there, the fugitive gets a payoff of 0 and the hunter gets a payoff of 1. We interpret these by reference to their utility functions, which in this game are very simple. If the fugitive gets safely across the river he receives a payoff of 1; if he doesn't he gets 0. If the fugitive doesn't make it, either because he's shot by the hunter or hit by a rock or struck by a cobra, then the hunter gets a payoff of 1 and the fugitive gets a payoff of 0.
We'll briefly explain the parts of the matrix that have been filled in, and then say why we can't yet complete the rest. Whenever the hunter waits at the bridge chosen by the fugitive, the fugitive is shot. These outcomes all deliver the payoff vector (0, 1). You can find them descending diagonally across the matrix above from the upper left-hand corner. Whenever the fugitive chooses the safe bridge but the hunter waits at another, the fugitive gets safely across, yielding the payoff vector (1, 0). These two outcomes are shown in the second two cells of the top row. All of the other cells are marked, for now, with question marks. Why? The problem here is that if the fugitive crosses at either the rocky bridge or the cobra bridge, he introduces parametric factors into the game. In these cases, he takes on some risk of getting killed, and so producing the payoff vector (0, 1), that is independent of anything the hunter does. We don't yet have enough concepts introduced to be able to show how to represent these outcomes in terms of utility functions—but by the time we're finished we will, and this will provide the key to solving our puzzle from Section 1.
Matrix games are referred to as ‘normal-form’ or ‘strategic-form’ games, and games as trees are referred to as ‘extensive-form’ games. The two sorts of games are not equivalent, because extensive-form games contain information—about sequences of play and players' levels of information about the game structure—that strategic-form games do not. In general, a strategic-form game could represent any one of several extensive-form games, so a strategic-form game is best thought of as being a set of extensive-form games. When order of play is irrelevant to a game's outcome, then you should study its strategic form, since it's the whole set you want to know about. Where order of play is relevant, the extensive form must be specified or your conclusions will be unreliable.
2.4 The Prisoner's Dilemma as an Example of Strategic-Form vs. Extensive-Form Representation
The distinctions described above are difficult to fully grasp if all one has to go on are abstract descriptions. They're best illustrated by means of an example. For this purpose, we'll use the most famous game: the Prisoner's Dilemma. It in fact gives the logic of the problem faced by Cortez's and Henry V's soldiers (see Section 1 above), and by Hobbes's agents before they empower the tyrant. However, for reasons which will become clear a bit later, you should not take the PD as a typical game; it isn't. We use it as an extended example here only because it's particularly helpful for illustrating the relationship between strategic-form and extensive-form games (and later, for illustrating the relationships between one-shot and repeated games; see Section 4 below).The name of the Prisoner's Dilemma game is derived from the following situation typically used to exemplify it. Suppose that the police have arrested two people whom they know have committed an armed robbery together. Unfortunately, they lack enough admissible evidence to get a jury to convict. They do, however, have enough evidence to send each prisoner away for two years for theft of the getaway car. The chief inspector now makes the following offer to each prisoner: If you will confess to the robbery, implicating your partner, and she does not also confess, then you'll go free and she'll get ten years. If you both confess, you'll each get 5 years. If neither of you confess, then you'll each get two years for the auto theft.
Our first step in modeling your situation as a game is to represent it in terms of utility functions. Both you and your partner's utility functions are identical:
Go freeThe numbers in the function above are now used to express your and your partner's payoffs in the various outcomes possible in your situation. We will refer to you as ‘Player I’ and to your partner as ‘Player II’. Now we can represent your entire situation on a matrix; this is the strategic form of your game:4
2 years
3
5 years
2
10 years
0
Each cell of the matrix gives the payoffs to both players for each combination of actions. Player I's payoff appears as the first number of each pair, Player II's as the second. So, if both of you confess you each get a payoff of 2 (5 years in prison each). This appears in the upper-left cell. If neither of you confess, you each get a payoff of 3 (2 years in prison each). This appears as the lower-right cell. If you confess and your partner doesn't you get a payoff of 4 (going free) and she gets a payoff of 0 (ten years in prison). This appears in the upper-right cell. The reverse situation, in which she confesses and you refuse, appears in the lower-left cell.![]()
Figure 3
You evaluate your two possible actions here by comparing your payoffs in each column, since this shows you which of your actions is preferable for each possible action by your partner. So, observe: If your partner confesses than you get a payoff of 2 by confessing and a payoff of 0 by refusing. If your partner refuses, you get a payoff of 4 by confessing and a payoff of 3 by refusing. Therefore, you're better off confessing regardless of what she does. Your partner, meanwhile, evaluates her actions by comparing her payoffs down each row, and she comes to exactly the same conclusion that you do. Wherever one action for a player is superior to her other actions for each possible action by the opponent, we say that the first action strictly dominates the second one. In the PD, then, confessing strictly dominates refusing for both players. Both players know this about each other, thus entirely eliminating any temptation to depart from the strictly dominated path. Thus both players will confess, and both will go to prison for 5 years.
The players, and analysts, can predict this outcome using a mechanical procedure, known as iterated elimination of strictly dominated strategies. You, as Player 1, can see by examining the matrix that your payoffs in each cell of the top row are higher than your payoffs in each corresponding cell of the bottom row. Therefore, it can never be rational for you to play your bottom-row strategy, viz., refusing to confess, regardless of what your opponent does. Since your bottom-row strategy will never be played, we can simply delete the bottom row from the matrix. Now it is obvious that Player II will not refuse to confess, since his payoff from confessing in the two cells that remain is higher than his payoff from refusing. So, once again, we can delete the one-cell column on the right from the game. We now have only one cell remaining, that corresponding to the outcome brought about by mutual confession. Since the reasoning that led us to delete all other possible outcomes depended at each step only on the premise that both players are economically rational — that is, prefer higher payoffs to lower ones — there is very strong grounds for viewing joint confession as the solution to the game, the outcome on which its play must converge. You should note that the order in which strictly dominated rows and columns are deleted doesn't matter. Had we begun by deleting the right-hand column and then deleted the bottom row, we would have arrived at the same solution.
It's been said a couple of times that the PD is not a typical game in many respects. One of these respects is that all its rows and columns are either strictly dominated or strictly dominant. In any strategic-form game where this is true, iterated elimination of strictly dominated strategies is guaranteed to yield a unique solution. Later, however, we will see that for many games this condition does not apply, and then our analytic task is less straightforward.
You will probably have noticed something disturbing about the outcome of the PD. Had you both refused to confess, you'd have arrived at the lower-right outcome in which you each go to prison for only 2 years, thereby both earning higher utility than you receive when you confess. This is the most important fact about the PD, and its significance for game theory is quite general. We'll therefore return to it below when we discuss equilibrium concepts in game theory. For now, however, let us stay with our use of this particular game to illustrate the difference between strategic and extensive forms.
When people introduce the PD into popular discussions, you will sometimes hear them say that the police inspector must lock his prisoners into separate rooms so that they can't communicate with one another. The reasoning behind this idea seems obvious: if you could communicate, you'd surely see that you're both better off refusing, and could make an agreement to do so, no? This, one presumes, would remove your conviction that you must confess because you'll otherwise be sold up the river by your partner. In fact, however, this intuition is misleading and its conclusion is false.
When we represent the PD as a strategic-form game, we implicitly assume that the prisoners can't attempt collusive agreement since they choose their actions simultaneously. In this case, agreement before the fact can't help. If you are convinced that your partner will stick to the bargain then you can seize the opportunity to go scot-free by confessing. Of course, you realize that the same temptation will occur to her; but in that case you again want to make sure you confess, as this is your only means of avoiding your worst outcome. Your agreement comes to naught because you have no way of enforcing it; it constitutes what game theorists call ‘cheap talk’.
But now suppose that you do not move simultaneously. That is, suppose that one of you can choose after observing the other's action. This is the sort of situation that people who think non-communication important must have in mind. Now you can see that your partner has remained steadfast when it comes to your choice, and you need not be concerned about being suckered. However, this doesn't change anything, a point that is best made by re-representing the game in extensive form. This gives us our opportunity to introduce game-trees and the method of analysis appropriate to them.
First, however, here are definitions of some concepts that will be helpful in analyzing game-trees:
Node: A point at which a player takes an action.These quick definitions may not mean very much to you until you follow them being put to use in our analyses of trees below. It will probably be best if you scroll back and forth between them and the examples as we work through them. By the time you understand each example, you'll find the concepts and their definitions quite natural and intuitive.Initial node: The point at which the first action in the game occurs.
Terminal node: Any node which, if reached, ends the game. Each terminal node corresponds to an outcome.
Subgame: Any set of nodes and branches descending uniquely from one node.
Payoff: an ordinal utility number assigned to a player at an outcome.
Outcome: an assignment of a set of payoffs, one to each player in the game.
Strategy: a program instructing a player which action to take at every node in the tree where she could possibly be called on to make a choice.
To make this exercise maximally instructive, let's suppose that you and your partner have studied the matrix above and, seeing that you're both better off in the outcome represented by the lower-right cell, have formed an agreement to cooperate. You are to commit to refusal first, at which point she will reciprocate. We will refer to a strategy of keeping the agreement as ‘cooperation’, and will denote it in the tree below with ‘C’. We will refer to a strategy of breaking the agreement as ‘defection’, and will denote it on the tree below with ‘D’. As before, you are I and your partner is II. Each node is numbered 1, 2, 3, … , from top to bottom, for ease of reference in discussion. Here, then, is the tree:
Look first at each of the terminal nodes (those along the bottom). These represent possible outcomes. Each is identified with a assignment of payoffs, just as in the strategic-form game, with I's payoff appearing first in each set and II's appearing second. Each of the structures descending from the nodes 1, 2 and 3 respectively is a sub-game. We begin our backward-induction analysis—using a technique called Zermelo's algorithm—with the sub-games that arise last in the sequence of play. If the subgame descending from node 3 is played, then Player II will face a choice between a payoff of 4 and a payoff of 3. (Consult the second number, representing her payoff, in each set at a terminal node descending from node 3.) II earns her higher payoff by playing D. We may therefore replace the entire subgame with an assignment of the payoff (0,4) directly to node 3, since this is the outcome that will be realized if the game reaches that node. Now consider the subgame descending from node 2. Here, II faces a choice between a payoff of 2 and one of 0. She obtains her higher one, 2, by playing D. We may therefore assign the payoff (2,2) directly to node 2. Now we move to the subgame descending from node 1. (This subgame is, of course, identical to the whole game; all games are subgames of themselves.) You (Player I) now face a choice between outcomes (2,2) and (0,4). Consulting the first numbers in each of these sets, you see that you get your higher payoff—2—by playing D. D is, of course, the option of confessing. So you confess, and then your partner also confesses, yielding the same outcome as in the strategic-form representation.![]()
Figure 4
What has happened here is that you realize that if you play C (refuse to confess) at node 1, then your partner will be able to maximize her utility by suckering you and playing D. (On the tree, this happens at node 3.) This leaves you with a payoff of 0 (ten years in prison), which you can avoid only by playing D to begin with. You therefore defect from the agreement.
We have thus seen that in the case of the Prisoner's Dilemma, the simultaneous and sequential versions yield the same outcome. This will often not be true, however. In particular, only finite extensive-form (sequential) games of perfect information can be solved using Zermelo's algorithm.
As noted earlier in this section, sometimes we must represent simultaneous moves within games that are otherwise sequential. (As we said above, in all such cases the game as a whole will be one of imperfect information, so we won't be able to solve it using Zermelo's algorithm.) We represent such games using the device of information sets. Consider the following tree:
The oval drawn around nodes b and c indicates that they lie within a common information set. This means that at these nodes players cannot infer back up the path from whence they came; II does not know, in choosing her strategy, whether she is at b or c. (For this reason, what properly bear numbers in extensive-form games are information sets, conceived as ‘action points’, rather than nodes themselves; this is why the nodes inside the oval are labelled with letters rather than numbers.) Put another way, II, when choosing, does not know what I has done at node a. But you will recall from earlier in this section that this is just what defines two moves as simultaneous. We can thus see that the method of representing games as trees is entirely general. If no node after the initial node is alone in an information set on its tree, so that the game has only one subgame (itself), then the whole game is one of simultaneous play. If at least one node shares its information set with another, while others are alone, the game involves both simultaneous and sequential play, and so is still a game of imperfect information. Only if all information sets are inhabited by just one node do we have a game of perfect information.![]()
Figure 5
2.5 Solution Concepts and Equilibria
In the Prisoner's Dilemma, the outcome we've represented as (2,2), indicating mutual defection, was said to be the ‘solution’ to the game. Following the general practice in economics, game theorists refer to the solutions of games as equilibria. Philosophically minded readers will want to pose a conceptual question right here: What is ‘equilibrated’ about some game outcomes such that we are motivated to call them ‘solutions’? When we say that a physical system is in equilibrium, we mean that it is in a stable state, one in which all the causal forces internal to the system balance each other out and so leave it ‘at rest’ until and unless it is perturbed by the intervention of some exogenous (that is, ‘external’) force. This what economists have traditionally meant in talking about ‘equilibria’; they read economic systems as being networks of causal relations, just like physical systems, and the equilibria of such systems are then their endogenously stable states. As we will see after discussing evolutionary game theory in a later section, it is possible to maintain this understanding of equilibria in the case of game theory. However, as we noted in Section 2.1, some people interpret game theory as being an explanatory theory of strategic reasoning. For them, a solution to a game must be an outcome that a rational agent would predict using the mechanisms of rational computation alone. Such theorists face some puzzles about solution concepts that aren't so important for the behaviorist. We will be visiting such puzzles and their possible solutions throughout the rest of this article.It's useful to start the discussion here from the case of the Prisoner's Dilemma because it's unusually simple from the perspective of these puzzles. What we referred to as its ‘solution’ is the unique Nash equilibrium of the game. (The ‘Nash’ here refers to John Nash, the Nobel Prize-winning mathematician who in Nash (1950) did most to extend and generalize von Neumann & Morgenstern's pioneering work.) Nash equilibrium (henceforth ‘NE’) applies (or fails to apply, as the case may be) to whole sets of strategies, one for each player in a game. A set of strategies is a NE just in case no player could improve her payoff, given the strategies of all other players in the game, by changing her strategy. Notice how closely this idea is related to the idea of strict dominance: no strategy could be a NE strategy if it is strictly dominated. Therefore, if iterative elimination of strictly dominated strategies takes us to a unique outcome, we know we have found the game's unique NE. Now, almost all theorists agree that avoidance of strictly dominated strategies is a minimum requirement of rationality. This implies that if a game has an outcome that is a unique NE, as in the case of joint confession in the PD, that must be its unique solution. This is one of the most important respects in which the PD is an ‘easy’ (and atypical) game.
We can specify one class of games in which NE is always not only necessary but sufficient as a solution concept. These are finite perfect-information games that are also zero-sum. A zero-sum game (in the case of a game involving just two players) is one in which one player can only be made better off by making the other player worse off. (Tic-tac-toe is a simple example of such a game: any move that brings me closer to winning brings you closer to losing, and vice-versa.) We can determine whether a game is zero-sum by examining players' utility functions: in zero-sum games these will be mirror-images of each other, with one player's highly ranked outcomes being low-ranked for the other and vice-versa. In such a game, if I am playing a strategy such that, given your strategy, I can't do any better, and if you are also playing such a strategy, then, since any change of strategy by me would have to make you worse off and vice-versa, it follows that our game can have no solution compatible with our mutual rationality other than its unique NE. We can put this another way: in a zero-sum game, my playing a strategy that maximizes my minimum payoff if you play the best you can, and your simultaneously doing the same thing, is just equivalent to our both playing our best strategies, so this pair of so-called ‘maximin’ procedures is guaranteed to find the unique solution to the game, which is its unique NE. (In tic-tac-toe, this is a draw. You can't do any better than drawing, and neither can I, if both of us are trying to win and trying not to lose.)
However, most games do not have this property. It won't be possible, in this one article, to enumerate all of the ways in which games can be problematic from the perspective of their possible solutions. (For one thing, it is highly unlikely that theorists have yet discovered all of the possible problems!) However, we can try to generalize the issues a bit.
First, there is the problem that in most non-zero-sum games, there is more than one NE, but not all NE look equally plausible as the solutions upon which strategically rational players would hit. Consider the strategic-form game below (taken from Kreps (1990), p. 403):
![]()
Figure 6
This game has two NE: s1-t1 and s2-t2. (Note that no rows or columns are strictly dominated here. But if Player I is playing s1 then Player II can do no better than t1, and vice-versa; and similarly for the s2-t2 pair.) If NE is our only solution concept, then we shall be forced to say that either of these outcomes is equally persuasive as a solution. However, if game theory is regarded as an explanatory and/or normative theory of strategic reasoning, this seems to be leaving something out: surely rational players with perfect information would converge on s1-t1? (Note that this is not like the situation in the PD, where the socially superior situation is unachievable because it is not a NE. In the case of the game above, both players have every reason to try to converge on the NE in which they are better off.)
This illustrates the fact that NE is a relatively (logically) weak solution concept, often failing to predict intuitively sensible solutions because, if applied alone, it refuses to allow players to use principles of equilibrium selection that, if not demanded by rationality, are at least not irrational. Consider another example from Kreps (1990), p. 397:
![]()
Figure 7
Here, no strategy strictly dominates another. However, Player I's top row, s1, weakly dominates s2, since I does at least as well using s1 as s2 for any reply by Player II, and on one reply by II (t2), I does better. So should not the players (and the analyst) delete the weakly dominated row s2? When they do so, column t1 is then strictly dominated, and the NE s1-t2 is selected as the unique solution. However, as Kreps goes on to show using this example, the idea that weakly dominated strategies should be deleted just like strict ones has odd consequences. Suppose we change the payoffs of the game just a bit, as follows:
![]()
Figure 8
s2 is still weakly dominated as before; but of our two NE, s2-t1 is now the most attractive for both players; so why should the analyst eliminate its possibility? (Note that this game, again, does not replicate the logic of the PD. There, it makes sense to eliminate the most attractive outcome, joint refusal to confess, because both players have incentives to unilaterally deviate from it, so it is not an NE. This is not true of s2-t1 in the present game. You should be starting to clearly see why we called the PD game ‘atypical’.) The argument for eliminating weakly dominated strategies is that Player 1 may be nervous, fearing that Player II is not completely sure to be rational (or that Player II fears that Player I isn't completely rational, or that Player II fears that Player I fears that Player II isn't completely rational, and so on ad infinitum) and so might play t2 with some positive probability. If the possibility of departures from rationality is taken seriously, then we have an argument for eliminating weakly dominated strategies: Player I thereby insures herself against her worst outcome, s2-t2. Of course, she pays a cost for this insurance, reducing her expected payoff from 10 to 5. On the other hand, we might imagine that the players could communicate before playing the game and agree play correlated strategies so as to coordinate on s2-t1, thereby removing some, most or all of the uncertainty that encourages elimination of the weakly dominated row s1, and eliminating s1-t2 as a viable NE instead!
Any proposed principle for solving games that may have the effect of eliminating one or more NE from consideration is referred to as a refinement of NE. In the case just discussed, elimination of weakly dominated strategies is one possible refinement, since it refines away the NE s2-t1, and correlation is another, since it refines away the other NE, s1-t2, instead. So which refinement is more appropriate as a solution concept? People who think of game theory as an explanatory and/or normative theory of strategic rationality have generated a substantial literature in which the merits and drawbacks of a large number of refinements are debated. In principle, there seems to be no limit on the number of refinements that could be considered, since there may also be no limits on the set of philosophical intuitions about what principles a rational agent might or might not see fit to follow or to fear or hope that other players are following.
Behaviorists take a dim view of much of this activity. They see the job of game theory as being to predict outcomes given some distribution of strategic dispositions, and some distribution of expectations about the strategic dispositions of others, that have been shaped by institutional processes and / or evolutionary selection. (See Section 7 for further discussion.) On this view, which NE are viable in a game is determined by the underlying dynamics that equipped players with dispositions prior to the commencement of a game. The strategic natures of players are thereby treated as a set of exogenous inputs to the game, just as utility functions are. Behaviorists are thus less inclined to seek general refinements of the equilibrium concept itself, at least insofar as these involve the modeling of more sophisticated expressions of rationality over and above merely consistent maximization of utility. Behaviorists are often inclined to doubt that the goal of seeking a general theory of rationality makes sense as a project. Institutions and evolutionary processes build many environments, and what counts as rational procedure in one environment may not be favoured in another. Economic rationality requires only that agents have consistent preferences, that is, that they not prefer a to b and b to c and c to a. A great many arrangements of strategic dispositions are compatible with this minimal requirement, and evolutionary or institutional processes might generate games in any of them. On this view, NE is a robust equilibrium concept because if players evolve their strategic dispositions in settings that are competitive, those who don't do what's optimal given the strategies of others in that specific environment will be outcompeted, and so selection will either eliminate them or encourage the learning of new dispositions. There is no more ‘refined’ concept of rationality of which this can be argued to be true in general; and so, according to behaviorists, refinements of NE based on refinements of rationality are likely to be of merely occasional interest.
This does not imply that behaviorists abjure all ways of restricting sets of NE to plausible subsets. In particular, they tend to be sympathetic to approaches that shift emphasis from rationality itself onto considerations of the informational dynamics of games. We should perhaps not be surprised that NE analysis alone often fails to tell us much of interest about strategic-form games (e.g., Figure 6 above), in which informational structure is suppressed. Equilibrium selection issues are often more fruitfully addressed in the context of extensive-form games.
2.6 Modular Rationality and Subgame Perfection
In order to deepen our understanding of extensive-form games, we need an example with more interesting structure than the PD offers.Consider the game described by this tree:
This game is not intended to fit any preconceived situation; it is simply a mathematical object in search of an application. (L and R here just denote ‘left’ and ‘right’ respectively.)![]()
Figure 9
Now consider the strategic form of this game:
![]()
Figure 10
(If you are confused by this, remember that a strategy must tell a player what to do at every information set where that player has an action. Since each player chooses between two actions at each of two information sets here, each player has four strategies in total. The first letter in each strategy designation tells each player what to do if he or she reaches their first information set, the second what to do if their second information set is reached. I.e., LR for Player II tells II to play L if information set 5 is reached and R if information set 6 is reached.) If you examine this matrix, you will discover that (LL, RL) is among the NE. This is a bit puzzling, since if Player I reaches her second information set (7) in the extensive-form game, I would hardly wish to play L there; she earns a higher payoff by playing R at node 7. Mere NE analysis doesn't notice this because NE is insensitive to what happens off the path of play. Player I, in choosing L at node 4, ensures that node 7 will not be reached; this is what is meant by saying that it is ‘off the path of play’. In analyzing extensive-form games, however, we should care what happens off the path of play, because consideration of this is crucial to what happens on the path. For example, it is the fact that Player I would play R if node 7 were reached that would cause Player II to play L if node 6 were reached, and this is why Player I won't choose R at node 4. We are throwing away information relevant to game solutions if we ignore off-path outcomes, as mere NE analysis does. Notice that this reason for doubting that NE is a wholly satisfactory equilibrium concept in itself has nothing to do with intuitions about rationality, as in the case of the refinement concepts discussed in Section 2.5.
Now apply Zermelo's algorithm to the extensive form of our current example. Begin, again, with the last subgame, that descending from node 7. This is Player I's move, and she would choose R because she prefers her payoff of 5 to the payoff of 4 she gets by playing L. Therefore, we assign the payoff (5, -1) to node 7. Thus at node 6 II faces a choice between (-1, 0) and (5, -1). He chooses L. At node 5 II chooses R. At node 4 I is thus choosing between (0, 5) and (-1, 0), and so plays L. Note that, as in the PD, an outcome appears at a terminal node—(4, 5) from node 7—that is Pareto superior to the NE. Again, however, the dynamics of the game prevent it from being reached.
The fact that Zermelo's algorithm picks out the strategy vector (LR, RL) as the unique solution to the game shows that it's yielding something other than just an NE. In fact, it is generating the game's subgame perfect equilibrium (SPE). It gives an outcome that yields a NE not just in the whole game but in every subgame as well. This is a persuasive solution concept because, again unlike the refinements of Section 2.5, it does not demand ‘more’ rationality of agents, but less. (It does, however, assume that players not only know everything strategically relevant to their situation but also use all of that information; we must be careful not to confuse rationality with computational power.) The agents, at every node, simply choose the path that brings them the highest payoff in the subgame emanating from that node; and, then, in solving the game, they foresee that they will all do that. Agents who proceed in this way are said to be modular rational, that is, short-run rational at each step. They do not imagine themselves, by some fancy processes of hyper-rationality, acting against their local preferences for the sake of some wider goal. Note that, as in the PD, this can lead to outcomes which might be regretted from the social point of view. In our current example, Player I would be better off, and Player II no worse off, at the left-hand node emanating from node 7 than at the SPE outcome. But Player I's very modular rationality, and Player II's awareness of this, blocks the socially efficient outcome. If our players wish to bring about the more equitable outcome (4,5) here, they must do so by redesigning their institutions so as to change the structures of the games they play. Merely wishing that they could be hyper-rational in some way does not seem altogether coherent as an approach.
2.7 On Interpreting Payoffs: Morality and Efficiency in Games
Many readers might suppose that the conclusion of the previous section has been asserted on the basis of no adequate defense. Surely, the players might be able to just see that outcome (4,5) is socially and morally superior; and since we know they can also see the path of actions that leads to it, who is the game theorist to announce that, within the game they're playing, it's unattainable? In fact, to suggest that hyper-rationality is a will o’ the wisp is philosophically tendentious, though it is indeed what behaviorists about game theory believe. The reader who seeks a thorough justification for this belief is referred to Binmore (1994, 1998). However, before we just leave matters at a stand-off (here), we must be careful not to confuse what is controversial with the consequences of a simple technical mistake. Consider the Prisoner's Dilemma again. We have seen that in the unique NE of the PD, both players get less utility than they could have through mutual cooperation. This may strike you (as it has struck many commentators) as perverse. Surely, you may think, it simply results from a combination of selfishness and paranoia on the part of the players. To begin with they have no regard for the social good, and then they shoot themselves in the feet by being too untrustworthy to respect agreements.This way of thinking leads to serious misunderstandings of game theory, and so must be dispelled. Let us first introduce some terminology for talking about outcomes. Welfare economists typically measure social good in terms of Pareto efficiency. A distribution of utility β is said to be Pareto dominant over another distribution δ just in case from state δ there is a possible redistribution of utility to β such that at least one player is better off in β than in δ and no player is worse off. Failure to move to a Pareto-dominant redistribution is inefficient because the existence of β as a logical possibility shows that in δ some utility is being wasted. Now, the outcome (3,3) that represents mutual cooperation in our model of the PD is clearly Pareto dominant over mutual defection; at (3,3) both players are better off than at (2,2). So it is true that PDs lead to inefficient outcomes. This was true of our example in Section 2.6 as well.
However, inefficiency should not be associated with immorality. A utility function for a player is supposed to represent everything that player cares about, which may be anything at all. As we have described the situation of our prisoners they do indeed care only about their own relative prison sentences, but there is nothing essential in this. What makes a game an instance of the PD is strictly and only its payoff structure. Thus we could have two Mother Theresa types here, both of whom care little for themselves and wish only to feed starving children. But suppose the original Mother Theresa wishes to feed the children of Calcutta while Mother Juanita wishes to feed the children of Bogota. And suppose that the international aid agency will maximize its donation if the two saints nominate the same city, will give the second-highest amount if they nominate each others' cities, and the lowest amount if they each nominate their own city. Our saints are in a PD here, though hardly selfish or unconcerned with the social good.
To return to our prisoners, suppose that, contrary to our assumptions, they do value each other's well-being as well as their own. In that case, this must be reflected in their utility functions, and hence in their payoffs. If their payoff structures are changed, they will no longer be in a PD. But all this shows is that not every possible situation is a PD; it does not show that the threat of inefficient outcomes is a special artifact of selfishness. It is the logic of the prisoners' situation, not their psychology, that traps them in the inefficient outcome, and if that really is their situation then they are stuck in it (barring further complications to be discussed below). Agents who wish to avoid inefficient outcomes are best advised to prevent certain games from arising; the defender of the possibility of hyper-rationality is really proposing that they try to dig themselves out of such games by turning themselves into different kinds of agents.
In general, then, a game is partly defined by the payoffs assigned to the players. If a proposed solution involves tacitly changing these payoffs, then this ‘solution’ is in fact a disguised way of changing the subject.
2.8 Trembling Hands
Our last point above opens the way to a philosophical puzzle, one of several that still preoccupy those concerned with the logical foundations of game theory. It can be raised with respect to any number of examples, but we will borrow an elegant one from C. Bicchieri (1993), who also provides the most extensive treatment of the problem found in the literature. Consider the following game:![]()
Figure 11
The NE outcome here is at the single leftmost node descending from node 8. To see this, backward induct again. At node 10, I would play L for a payoff of 3, giving II a payoff of 1. II can do better than this by playing L at node 9, giving I a payoff of 0. I can do better than this by playing L at node 8; so that is what I does, and the game terminates without II getting to move. But, now, notice the reasoning required to support this prediction. I plays L at node 8 because she knows that II is rational, and so would, at node 9, play L because II knows that I is rational and so would, at node 10, play L. But now we have the following paradox: I must suppose that II, at node 9, would predict I's rational play at node 10 despite having arrived at a node (9) that could only be reached if I is not rational! If I is not rational then II is not justified in predicting that I will not play R at node 10, in which case it is not clear that II shouldn't play R at 9; and if II plays R at 9, then I is guaranteed of a better payoff then she gets if she plays L at node 8. Both players must use backward induction to solve the game; backward induction requires that I know that II knows that I is rational; but II can solve the game only by using a backward induction argument that takes as a premise the irrationality of I. This is the paradox of backward induction.
A standard way around this paradox in the literature is to invoke the so-called ‘trembling hand’ due to Selten (1975). The idea here is that a decision and its consequent act may ‘come apart’ with some nonzero probability, however small. That is, a player might intend to take an action but then slip up in the execution and send the game down some other path instead. If there is even a remote possibility that a player may make a mistake—that her ‘hand may tremble’—then no contradiction is introduced by a player's using a backward induction argument that requires the hypothetical assumption that another player has taken a path that a rational player could not choose. In our example, II could reason about what to do at node 9 conditional on the assumption that I rationally chose L at node 8 but then slipped.
There is a substantial technical literature on this backward-induction paradox, of which Bicchieri (1993) is the most comprehensive source. (Bicchieri, it should be noted, does not endorse an appeal to trembling hands as the appropriate solution. Discussing her particular proposal here would, however, take us too far afield into technicalities. The interested reader should study her book.) The puzzle has been introduced here just in order to point out that refinements of the type discussed in Section 2.6 can be encouraged by more than mere intuitions about the concept of rationality. For if hands may tremble then merely economically rational players will be motivated to worry about the probabilities with which apparent departures from rational play will be observed. For example, if my opponent's hand may tremble, then this gives me good reason to avoid the weakly dominated strategy s2 in the third example from Section 2.5. After all, my opponent might promise to play t1 in that game, and I may believe his promise; but if his hand then trembles and a play of t2 results, I get my worst payoff. If I'm risk-averse, then in such situations it would seem that I should stick to weakly dominant strategies.
The paradox of backward induction, like the puzzles raised by equilibrium refinement, is mainly a problem for those who view game theory as contributing to a normative theory of rationality (specifically, as contributing to that larger theory the theory of strategic rationality). The behaviorist can give a different sort of account of apparently irrational play and the prudence it encourages. This involves appeal to the empirical fact that actual agents, including people, must learn the equilibrium strategies of games they play, at least whenever the games are at all complicated. Research shows that even a game as simple as the Prisoner's Dilemma requires learning by people (Ledyard 1995, Sally 1995, Camerer 2003, p. 265). What it means to say that people must learn equilibrium strategies is that we must be a bit more sophisticated than was indicated earlier in constructing utility functions from behavior in application of Revealed Preference Theory. Instead of constructing utility functions on the basis of single episodes, we must do so on the basis of observed runs of behavior once it has stabilized, signifying maturity of learning for the subjects in question and the game in question. Once again, the Prisoner's Dilemma makes a good example. People encounter few one-shot Prisoner's Dilemmas in everyday life, but they encounter many repeated PD's with non-strangers. As a result, when set into what is intended to be a one-shot PD in the experimental laboratory, people tend to initially play as if the game were a single round of a repeated PD. The repeated PD has many Nash equilibria that involve cooperation rather than defection. Thus experimental subjects tend to cooperate at first in these circumstances, but learn after some number of rounds to defect. The experimenter cannot infer that she has successfully induced a one-shot PD with her experimental setup until she sees this behavior stabilize. (As noted in Section 2.7 above, if it does not so stabilize, she must infer that she has failed to induce a one-shot PD and that her subjects are playing some other game.)
The paradox of backward induction now dissolves. Unless players have experienced play at equilibrium with one another in the past, even if they are all rational, and all believe this about one another, we should predict that they will attach some positive probability to the conjecture that interaction partners have not yet learned all equilibria. This then explains why rational agents, unless they enjoy risk, may play as if they believe in trembling hands.
Learning of equilibria by rational agents may take various forms for different agents and for games of differing levels of complexity and risk. Incorporating it into game-theoretic models of interactions thus introduces an extensive new set of technicalities. For the most fully developed general theory, the reader is referred to Fudenberg and Levine (1998).
3. Uncertainty, Risk and Sequential Equilibria
The games we've modeled to this point have all involved players choosing from amongst pure strategies, in which each seeks a single optimal course of action at each node that constitutes a best reply to the actions of others. Often, however, a player's utility is optimized through use of a mixed strategy, in which she flips a weighted coin amongst several possible actions. (We will see later that there is an alternative interpretation of mixing, not involving randomization at a particular information set; but we will start here from the coin-flipping interpretation and then build on it in Section 3.1.) Mixing is necessary whenever no pure strategy maximizes the player's utility against all opponent strategies. Our river-crossing game from Section 1 exemplifies this. As we saw, the puzzle in that game consists in the fact that if the fugitive's reasoning selects a particular bridge as optimal, his pursuer must be assumed to be able to duplicate that reasoning. Thus the fugitive can escape only if his pursuer cannot reliably predict which bridge he'll use. Symmetry of logical reasoning power on the part of the two players ensures that the fugitive can surprise the pursuer only if it is possible for him to surprise himself.Suppose that we ignore rocks and cobras for a moment, and imagine that the bridges are equally safe. Suppose also that the fugitive has no special knowledge about his pursuer that might lead him to venture a specially conjectured probability distribution over the pursuer's available strategies. In this case, the fugitive's best course is to roll a three-sided die, in which each side represents a different bridge (or, more conventionally, a six-sided die in which each bridge is represented by two sides). He must then pre-commit himself to using whichever bridge is selected by this randomizing device. This fixes the odds of his survival regardless of what the pursuer does; but since the pursuer has no reason to prefer any available pure or mixed strategy, and since in any case we are presuming her epistemic situation to be symmetrical to that of the fugitive, we may suppose that she will roll a three-sided die of her own. The fugitive now has a 2/3 probability of escaping and the pursuer a 1/3 probability of catching him. The fugitive cannot improve on these odds if the pursuer is rational, so the two randomizing strategies are in Nash equilibrium.
Now let us re-introduce the parametric factors, that is, the falling rocks at bridge #2 and the cobras at bridge #3. Again, suppose that the fugitive is sure to get safely across bridge #1, has a 90% chance of crossing bridge #2, and an 80% chance of crossing bridge #3. We can solve this new game if we make certain assumptions about the two players' utility functions. Suppose that Player 1, the fugitive, cares only about living or dying (preferring life to death) while the pursuer simply wishes to be able to report that the fugitive is dead, preferring this to having to report that he got away. (In other words, neither player cares about how the fugitive lives or dies.) In this case, the fugitive simply takes his original randomizing formula and weights it according to the different levels of parametric danger at the three bridges. Each bridge should be thought of as a lottery over the fugitive's possible outcomes, in which each lottery has a different expected payoff in terms of the items in his utility function.
Consider matters from the pursuer's point of view. She will be using her NE strategy when she chooses the mix of probabilities over the three bridges that makes the fugitive indifferent among his possible pure strategies. The bridge with rocks is 1.1 times more dangerous for him than the safe bridge. Therefore, he will be indifferent between the two when the pursuer is 1.1 times more likely to be waiting at the safe bridge than the rocky bridge. The cobra bridge is 1.2 times more dangerous for the fugitive than the safe bridge. Therefore, he will be indifferent between these two bridges when the pursuer's probability of waiting at the safe bridge is 1.2 times higher than the probability that she is at the cobra bridge. Suppose we use s1, s2 and s3 to represent the fugitive's parametric survival rates at each bridge. Then the pursuer minimizes the net survival rate across any pair of bridges by adjusting the probabilities p1 and p2 that she will wait at them so that
s1 (1 − p1) = s2 (1 − p2)
Since p1 + p2 = 1, we can rewrite this as
s1 × p2 = s2 × p1
so
p1/s1 = p2/s2.
Thus the pursuer finds her NE strategy by solving the following simultaneous equations:
1 (1 − p1) = 0.9 (1 − p2) = (1 − p3) p1 + p2 + p3 = 1.
Then
p1 = 49/121 p2 = 41/121 p3 = 31/121
Now let f1, f2, f3 represent the probabilities with which the fugitive chooses each respective bridge. Then the fugitive finds his NE strategy by solving
s1 × f1 = s2 × f2 = s3 × f3
so
1 × f1 = 0.9 × f2 = 0.8 × f3
simultaneously with
f1 + f2 + f3 = 1.
Then
f1 = 36/121 f2 = 40/121 f3 = 45/121
These two sets of NE probabilities tell each player how to weight his or her die before throwing it. Note the — perhaps surprising — result that the fugitive uses riskier bridges with higher probability. This is the only way of making the pursuer indifferent over which bridge she stakes out, which in turn is what maximizes the fugitive's probability of survival.
We were able to solve this game straightforwardly because we set the utility functions in such a way as to make it zero-sum, or strictly competitive. That is, every gain in expected utility by one player represents a precisely symmetrical loss by the other. However, this condition may often not hold. Suppose now that the utility functions are more complicated. The pursuer most prefers an outcome in which she shoots the fugitive and so claims credit for his apprehension to one in which he dies of rockfall or snakebite; and she prefers this second outcome to his escape. The fugitive prefers a quick death by gunshot to the pain of being crushed or the terror of an encounter with a cobra. Most of all, of course, he prefers to escape. We cannot solve this game, as before, simply on the basis of knowing the players' ordinal utility functions, since the intensities of their respective preferences will now be relevant to their strategies.
Prior to the work of von Neumann & Morgenstern (1947), situations of this sort were inherently baffling to analysts. This is because utility does not denote a hidden psychological variable such as pleasure. As we discussed in Section 2.1, utility is merely a measure of relative behavioural dispositions given certain consistency assumptions about relations between preferences and choices. It therefore makes no sense to imagine comparing our players' cardinal—that is, intensity-sensitive—preferences with one another's, since there is no independent, interpersonally constant yardstick we could use. How, then, can we model games in which cardinal information is relevant? After all, modeling games requires that all players' utilities be taken simultaneously into account, as we've seen.
A crucial aspect of von Neumann & Morgenstern's (1947) work was the solution to this problem. Here, we will provide a brief outline of their ingenious technique for building cardinal utility functions out of ordinal ones. It is emphasized that what follows is merely an outline, so as to make cardinal utility non-mysterious to you as a student who is interested in knowing about the philosophical foundations of game theory, and about the range of problems to which it can be applied. Providing a manual you could follow in building your own cardinal utility functions would require many pages. Fortunately, such manuals are available in many textbooks. In any case, if you are a philosophy student you may not wish to attempt this until you've taken a course in probability theory.
Suppose we have an agent whose ordinal utility function is known. Indeed, suppose that it's our river-crossing fugitive. Let's assign him the following ordinal utility function:
EscapeNow, we know that his preference for escape over any form of death is likely to be stronger than his preference for, say, shooting over snakebite. This should be reflected in his choice behaviour in the following way. In a situation such as the river-crossing game, he should be willing to run greater risks to increase the relative probability of escape over shooting than he is to increase the relative probability of shooting over snakebite. This bit of logic is the crucial insight behind von Neumann & Morgenstern's (1947) solution to the cardinalization problem.4
Death by shooting
3
Death by rockfall
2
Death by snakebite
1
Begin by asking our agent to pick, from the available set of outcomes, a best one and a worst one. ‘Best’ and ‘worst’ are defined in terms of rational choice: a rational agent always chooses so as to maximize the probability of the best outcome—call this W—and to minimize the probability of the worst outcome—call this L. Now consider prizes intermediate between W and L. We find, for a set of outcomes containing such prizes, a lottery over them such that our agent is indifferent between that lottery and a lottery including only W and L. In our example, this would be a lottery having shooting and rockfall as its possible outcomes. Call this lottery T . We define a utility function q = u(T) such that if q is the expected prize in T , the agent is indifferent between winning T and winning a lottery in which W occurs with probability u(T) and L occurs with probability 1 − u(T).
We now construct a compound lottery T* over the outcome set {W, L} such that the agent is indifferent between T and T*. A compound lottery is one in which the prize in the lottery is another lottery. This makes sense because, after all, it is still W and L that are at stake for our agent in both cases; so we can then analyze T* into a simple lottery over W and L. Call this lottery r. It follows from transitivity that T is equivalent to r. (Note that this presupposes that our agent does not gain utility from the complexity of her gambles.) The rational agent will now choose the action that maximizes the probability of winning W. The mapping from the set of outcomes to u(r) is a von Neumann-Morgenstern utility function (VNMuf).
What exactly have we done here? We've simply given our agent choices over lotteries, instead of over prizes directly, and observed how much extra risk he's willing to run to increase the chances of winning escape over snakebite relative to getting shot or clobbered with a rock. A VNMuf yields a cardinal, rather than an ordinal, measure of utility. Our choice of endpoint-values, W and L, is arbitrary, as before; but once these are fixed the values of the intermediate points are determined. Therefore, the VNMuf does measure the relative preference intensities of a single agent. However, since our assignment of utility values to W and L is arbitrary, we can't use VNMufs to compare the cardinal preferences of one agent with those of another. Furthermore, since we are using a risk-metric as our measuring instrument, the construction of the new utility function depends on assuming that our agent's attitude to risk itself stays constant from one comparison of lotteries to another. This seems reasonable for a single agent in a single game-situation. However, two agents in one game, or one agent under different sorts of circumstances, may display very different attitudes to risk. Perhaps in the river-crossing game the pursuer, whose life is not at stake, will enjoy gambling with her glory while our fugitive is cautious. In general, a risk-averse agent prefers a guaranteed prize to its equivalent expected value in a lottery. A risk-loving agent has the reverse preference. A risk-neutral agent is indifferent between these options. In analyzing the river-crossing game, however, we don't have to be able to compare the pursuer's cardinal utilities with the fugitive's. Both agents, after all, can find their NE strategies if they can estimate the probabilities each will assign to the actions of the other. This means that each must know both VNMufs; but neither need try to comparatively value the outcomes over which they're gambling.
We can now fill in the rest of the matrix for the bridge-crossing game that we started to draw in Section 2. If all that the fugitive cares about is life and death, but not the manner of death, and if all the hunter cares about is preventing the fugitive from escaping, then we can now interpret both utility functions cardinally. This permits us to assign expected utilities, expressed by multiplying the original payoffs by the relevant probabilities, as outcomes in the matrix. Suppose that the hunter waits at the cobra bridge with probability x and at the rocky bridge with probability y. Since her probabilities across the three bridges must sum to 1, this implies that she must wait at the safe bridge with probability 1 − (x + y). Then, continuing to assign the fugitive a payoff of 0 if he dies and 1 if he escapes, and the hunter the reverse payoffs, our complete matrix is as follows:
![]()
Figure 12
We can now read the following facts about the game directly from the matrix. No rows or columns strictly or weakly dominate any others. Therefore, the game's NE must be in mixed strategies.
3.1 Beliefs
How should we interpret the processes being modeled by computations of NE strategy mixes in games like the river-crossing one? One possible kind of interpretation is an evolutionary one. If the hunter and the fugitive have regularly played games that structurally resemble this river-crossing game, then selection pressures will have encouraged habits in them that lead them both to play its NE strategies and to sincerely rationalize doing so by means of some satisfying story or other. If neither party has ever been in a situation like this, and if their biological and/or cultural ancestors haven't either, and if neither is concerned with revealing information to opponents in expected future situations of this sort (because they don't expect them to arise again),and if both parties aren't trained game theorists, then their behavior should be predicted not by a game theorist but by friends of theirs who are familiar with their personal idiosyncrasies. Behaviorists are happy to recognize that game theory isn't useful for modelling every possible empirical circumstance that comes along.
However, the philosopher who wants game theory to serve as a descriptive and/or normative theory of strategic rationality cannot rest content with this answer. He must find a satisfying line of advice for the players even when their game is alone in the universe of strategic problems. No such advice can be given that is uncontroversially satisfactory—behaviorists, after all, are often behaviorists because they aren't satisfied by any available approach here—but there is a way of handling the matter that many game theorists have found worthy of detailed pursuit. This involves the computation of equilibria in beliefs.
In fact, the behaviorist needs the concept of equilibrium in beliefs too, but for different purposes. As we've seen, the concept of NE sometimes doesn't go deep enough as an analytical instrument to tell us all that we think might be important in a game. Thus even behaviorists who aren't impressed with the project of refinements might make use of the concept of subgame-perfect equilibrium (SPE), as discussed in Section 2.6, if they think they're dealing with agents who are very well informed (say, because they're in a familiar institutional setting). But now consider the three-player imperfect-information game below known as ‘Selten's horse’ (for its inventor, Nobel Prize winner Reinhard Selten, and because of the shape of its tree; taken from Kreps (1990), p. 426):
![]()
Figure 13
One of the NE of this game is Lr2l3. This is because if Player I plays L, then Player II playing r2 has no incentive to change strategies because her only node of action, 12, is off the path of play. But this NE seems to be purely technical; it makes little sense as a solution. This reveals itself in the fact that if the game beginning at node 14 could be treated as a subgame, Lr2l3 would not be an SPE. Whenever she does get a move, Player II should play l2. But if Player II is playing l2 then Player I should switch to R. In that case Player III should switch to r3, sending Player II back to r2. And here's a new, ‘sensible’, NE: Rr2r3. I and II in effect play ‘keepaway’ from III.
This NE is ‘sensible’ in just the same way that a SPE outcome in a perfect-information game is more sensible than other non-SPE NE. However, we can't select it by applying Zermelo's algorithm. Because nodes 13 and 14 fall inside a common information set, Selten's Horse has only one subgame (namely, the whole game). We need a ‘cousin’ concept to SPE that we can apply in cases of imperfect information, and we need a new solution procedure to replace Zermelo's algorithm for such games.
Notice what Player III in Selten's Horse is wondering about as he selects his strategy. "Given that I get a move," he asks himself, "was my action node reached from node 11 or from node 12?" What, in other words, are the conditional probabilities that III is at node 13 or 14 given that he has a move? Now, if conditional probabilities are what III wonders about, then what Players I and II must make conjectures about when they select their strategies are III's beliefs about these conditional probabilities. In that case, I must conjecture about II's beliefs about III's beliefs, and III's beliefs about II's beliefs and so on. The relevant beliefs here are not merely strategic, as before, since they are not just about what players will do given a set of payoffs and game structures, but about what they think makes sense given some understanding or other of conditional probability.
What beliefs about conditional probability is it reasonable for players to expect from each other? The normative theorist might insist on whatever the best mathematicians have discovered about the subject. Clearly, however, if this is applied then a theory of games that incorporated it would not be descriptively true of most people. The behaviorist will insist on imposing only behavioral habits that a process of natural selection might build into its products. Perhaps some actual or possible creatures might observe habits that respect Bayes's rule, which is the minimal true generalization about conditional probability that an agent could know if it knows any such generalizations at all. Adding more sophisticated knowledge about conditional probability amounts to refining the concept of equilibrium-in-belief, just as some game theorists like to refine NE. You can imagine what behaviorists think of that project!
Here, we will restrict our attention to the least refined equilibrium-in-belief concept, that obtained when we require players to reason in accordance with Bayes's rule. Bayes's rule tells us how to compute the probability of an event F given information E (written ‘pr(F/E)’):
pr(F/E) = [pr(E/F) × pr(F)] / pr(E)
We will henceforth assume that players do not hold beliefs inconsistent with this equality.
We may now define a sequential equilibrium. A SE has two parts: (1) a strategy profile § for each player, as before, and (2) a system of beliefs μ for each player. μ assigns to each information set h a probability distribution over the nodes x in h, with the interpretation that these are the beliefs of player i(h) about where in his information set he is, given that information set h has been reached. Then a sequential equilibrium is a profile of strategies § and a system of beliefs μ consistent with Bayes's rule such that starting from every information set h in the tree player i(h) plays optimally from then on, given that what he believes to have transpired previously is given by μ(h) and what will transpire at subsequent moves is given by §.
We now demonstrate the concept by application to Selten's Horse. Consider again the uninteresting NE Lr2l3. Suppose that Player III assigns pr(1) to her belief that if she gets a move she is at node 13. Then Player II, given a consistent μ(II), must believe that III will play l3, in which case her only SE strategy is l2. So although Lr2l3 is a NE, it is not a SE. This is of course what we want.
The use of the consistency requirement in this example is somewhat trivial, so consider now a second case (also taken from Kreps (1990), p. 429):
![]()
Figure 14
Suppose that I plays L, II plays l2 and III plays l3. Suppose also that μ(II) assigns pr(.3) to node 16. In that case, l2 is not a SE strategy for II, since l2 returns an expected payoff of .3(4) + .7(2) = 2.6, while r2 brings an expected payoff of 3.1. Notice that if we fiddle the strategy profile for player III while leaving everything else fixed, l2 could become a SE strategy for II. If §(III) yielded a play of l3 with pr(.5) and r3 with pr(.5), then if II plays r2 his expected payoff would now be 2.2, so Ll2l3 would be a SE. Now imagine setting μ(III) back as it was, but change μ(II) so that II thinks the conditional probability of being at node 16 is greater than .5; in that case, l2 is again not a SE strategy.
The idea of SE is hopefully now clear. We can apply it to the river-crossing game in a way that avoids the necessity for the hunter to flip any coins of we modify the game a bit. Suppose now that II can change bridges twice during the fugitive's passage, and will catch him just in case she meets him as he leaves the bridge. Then the hunter's SE strategy is to divide her time at the three bridges in accordance with the proportion given by the equation in the third paragraph of Section 3 above.
It must be noted that since Bayes's rule cannot be applied to events with probability 0, its application to SE requires that players assign non-zero probabilities to all actions available in trees. This requirement is captured by supposing that all strategy profiles be strictly mixed, that is, that every action at every information set be taken with positive probability. You will see that this is just equivalent to supposing that all hands sometimes tremble. A SE is said to be trembling-hand perfect if all strategies played at equilibrium are best replies to strategies that are strictly mixed. You should also not be surprised to be told that no weakly dominated strategy can be trembling-hand perfect, since the possibility of trembling hands gives players the most persuasive reason for avoiding such strategies.
4. Repeated Games and Coordination
So far we've restricted our attention to one-shot games, that is, games in which players' strategic concerns extend no further than the terminal nodes of their single interaction. However, games are often played with future games in mind, and this can significantly alter their outcomes and equilibrium strategies. Our topic in this section is repeated games, that is, games in which sets of players expect to face each other in similar situations on multiple occasions. We approach these first through the limited context of repeated prisoner's dilemmas.We've seen that in the one-shot PD the only NE is mutual defection. This may no longer hold, however, if the players expect to meet each other again in future PDs. Imagine that four firms, all making widgets, agree to maintain high prices by jointly restricting supply. (That is, they form a cartel.) This will only work if each firm maintains its agreed production quota. Typically, each firm can maximize its profit by departing from its quota while the others observe theirs, since it then sells more units at the higher market price brought about by the almost-intact cartel. In the one-shot case, all firms would share this incentive to defect and the cartel would immediately collapse. However, the firms expect to face each other in competition for a long period. In this case, each firm knows that if it breaks the cartel agreement, the others can punish it by underpricing it for a period long enough to more than eliminate its short-term gain. Of course, the punishing firms will take short-term losses too during their period of underpricing. But these losses may be worth taking if they serve to reestablish the cartel and bring about maximum long-term prices.
One simple, and famous (but not, contrary to widespread myth, necessarily optimal) strategy for preserving cooperation in repeated PDs is called tit-for-tat. This strategy tells each player to behave as follows:
- Always cooperate in the first round.
- Thereafter, take whatever action your opponent took in the previous round.
There are two complications. First, the players must be uncertain as to when their interaction ends. Suppose the players know when the last round comes. In that round, it will be rational for players to defect, since no punishment will be possible. Now consider the second-last round. In this round, players also face no punishment for defection, since they know they will defect in the last round anyway. So they defect in the second-last round. But this means they face no threat of punishment in the third-last round, and defect there too. We can simply iterate this backwards through the game tree until we reach the first round. Since cooperation is not rational in that round, tit-for-tat is no longer a rational strategy, and we get the same outcome—mutual defection—as in the one-shot PD. Therefore, cooperation is only possible in repeated PDs where the expected number of repetitions is indeterminate. (Of course, this does apply to many real-life games.)
But now we introduce a second complication. Suppose that players' ability to distinguish defection from cooperation is imperfect. Consider our case of the widget cartel. Suppose the players observe a fall in the market price of widgets. Perhaps this is because a cartel member cheated. Or perhaps it has resulted from an exogenous drop in demand. If tit-for-tat players mistake the second case for the first, they will defect, thereby setting off a chain-reaction of mutual defections from which they can never recover, since every player will reply to the first encountered defection with defection, thereby begetting further defections, and so on.
If players know that such miscommunication is possible, they must resort to more sophisticated strategies. In particular, they must be prepared to sometimes risk following defections with cooperation in order to test their inferences. However, they mustn't be too forgiving, lest other players find it rationally optimal to exploit them through deliberate defections. In general, sophisticated strategies have a problem. Because they are more difficult for other players to infer, their use increases the probability of miscommunication. But miscommunication is what causes repeated-game cooperative equilibria to unravel in the first place! The moral of this is that PDs, even repeated ones, are very difficult to escape from. Rational players do best trying to avoid situations that are PDs, rather than relying on cunning stratagems for trying to get out of them.
Real, complex, social and political dramas are seldom straightforward instantiations of simple games such as PDs. Hardin (1995) offers an analysis of two recent, very real (and very tragic) political cases, the Yugoslavian civil war of 1991-95, and the 1994 Rwandan genocide, as PDs that were nested inside coordination games. A coordination game occurs whenever the utility of two or more players is maximized by their doing the same thing, and where such correspondence is more important to them than what, in particular, they both do. A standard example arises with rules of the road: ‘All drive on the left’ and ‘All drive on the right’ are both outcomes that are NEs, and neither is more efficient than the other. In games of ‘pure’ coordination, it doesn't even help to use more selective equilibrium criteria. For example, suppose that we require our players to reason in accordance with Bayes's rule (see Section 3 above). In these circumstances, any strategy that is a best reply to any vector of mixed strategies available in NE is said to be rationalizable. That is, a player can find a set of systems of beliefs for the other players such that any history of the game along an equilibrium path is consistent with that set of systems. Pure coordination games are characterized by non-unique vectors of rationalizable strategies. In such situations, players may try to predict equilibria by searching for focal points, that is, features of some strategies that they believe will be salient to other players, and that they believe other players will believe to be salient to them. (For example, if two people want to meet on a given day in a big city but can't contact each other to arrange a specific time and place, both might sensibly go to the city's most prominent downtown plaza at noon.) Unfortunately, in many of the social and political games played by people (and some other animals), the biologically shallow properties by which people sort themselves into racial and ethnic groups serve highly efficiently as such features. Hardin's analysis of recent genocides relies on this fact.
According to Hardin, neither the Yugoslavian nor the Rwandan disasters were PDs to begin with. That is, in neither situation, on either side, did most people begin by preferring the destruction of the other to mutual cooperation. However, the deadly logic of coordination, deliberately abetted by self-serving politicians, dynamically created PDs. Some individual Serbs (Hutus) were encouraged to perceive their individual interests as best served through identification with Serbian (Hutu) group-interests. That is, they found that some of their circumstances, such as those involving competition for jobs, had the form of coordination games. They thus acted so as to create situations in which this was true for other


3