Artif Life Robotics (2002) 6:66-72
© ISAROB 2002

Masao Kubo · Yoshiyuki Sasakabe

Criticality of cooperative society
Received and accepted: September 10, 2002

Abstract Generally speaking, there are two types of interaction: synchronous interaction (N members interact simultaneously) and asynchronous interaction (each member interacts with N neighboring agents individually). The purpose of this paper is to consider the dynamics of the evolution of a cooperative society with asynchronous interaction by focusing on the spatial size of the interaction. Asynchronous interaction is dealt with by the cumulative payoff of individual 2-iterated prisoner's dilemma (2-IPD) games. The results show that an adequate spatial size for interaction promotes a highly cooperative society, but that it gets more difficult for agents with synchronous interactions to achieve a cooperative society as the interaction increases in size.

Key words N-IPD · Distributed genetic algorithm · Iterated prisoner's dilemma · Replicator dynamics

M. Kubo · Y. Sasakabe
Department of Computer Science, National Defense Academy, Yokosuka 239-8686, Japan
Tel. +81-46-841-3810; Fax +81-46-844-5911
e-mail: [email protected]

This work was presented, in part, at the Sixth International Symposium on Artificial Life and Robotics, Tokyo, Japan, January 15-17, 2001

1 Introduction

Recently, the design and problem-solving of open systems, for example, the internet and other ubiquitous systems, have become very important and urgent issues. In these domains, there is no central authority, and no agent knows the aims of all the other agents. Selfish agents, which try their best to maximize their own aims, interact with each other. The new social dynamics created by the collection of these local interactions regulate the agents' behavior. Agents try to adapt and learn about the environment to become highly successful. By repetition of this micro-macro loop, agents sometimes behave cooperatively, and sometimes compete with each other. The characteristics of the nonlinear dynamics, e.g., stability and equity, have to be made clear in order to control and understand the system. One of the most interesting characteristics of such a system is cooperation. We know empirically that a selfish-agent system can sometimes develop into a cooperative society in which most of the agents score adequately. However, it might instead become an egoistic society where a few agents get a high reward and the others obtain no reward. We attempt to understand what is necessary for these social phenomena. The structure of society, e.g., the distribution of agents and the hierarchy, will radically change the dynamics. To study the basic traits of a social system, we should not confound the key structures. Here, we concentrate on localization, which is one of the keys because it is deeply related to the information flow of a society. This assumes a society which consists of spatially located agents who interact with their neighbors, but whose regions of interaction overlap with each other. Thereby, the effects of local interactions propagate to distant agents. We are interested in how the characteristics of the evolution of a cooperative society change according to the size of the overlapping area of interaction.

The iterated prisoner's dilemma (IPD) is used in research on economics, game theory, artificial intelligence, etc. [3]. 2-IPD (2-person IPD) is one of its simplest forms, and is often adopted to simulate 1-to-1 interactions. In this game, both agents choose one of two actions, cooperation (C) or defection (D), and they obtain a payoff according to their selection [3,4]. The payoff matrix is shown in Table 1. This game is called the prisoner's dilemma (PD); repeating the PD gives the IPD. The payoff matrix and the setting of the IPD have many variants because the IPD has been modified to fit many topics and problems [4]. Also, genetic algorithms and some other adaptation methods are used to research IPD strategies when the agent has to have some learning and adaptation abilities.

Here, we deal with the evolution of a society which consists of selfish agents, and there is related IPD research which also deals with this topic. Generally, an IPD model that can handle interactions among N agents is adequate to analyze a society [5-8]. In particular, Seo et al. [7,8] concluded that it becomes difficult for agents to achieve a cooperative society when the size of the interactions grows. They used the N-IPD model. N-IPD (N-person IPD) assumes that N agents play the IPD simultaneously. The payoff matrix is shown in Table 2. The payoff is determined according to the number of agents who select cooperation (C). Therefore, it becomes monotonically more difficult to achieve a highly cooperative state in which many agents choose C. However, we know that congress members often form a number of duos to deal with the enormous complexity of a round-robin conference. They engage in gathering small numbers of votes to win a future majority decision. This example suggests that this kind of interaction is also important, and that it is proper for an agent to interact with its neighbors individually, with the sum of the payoffs regarded as the result of an interaction among N persons. However, it has not been clear how to research the dynamics of the evolution of a cooperative society with this type of interaction. We call this interaction asynchronous interaction, while the ordinary N-IPD interaction is called synchronous interaction.

Here, we discuss the dynamics of the evolution of a cooperative society with asynchronous interactions by focusing on the spatial size of the interactions. When the number of interactions N is changed, how does this change the characteristics of the cooperative society? The experiments were conducted as follows. The agents are located on a 2-dimensional grid world, and they interact with their N neighborhood agents by asynchronous interactions. The asynchronous interactions are realized by the cumulative payoff of the individual 2-IPD games. The agent can refine its 2-IPD strategy by a genetic algorithm (GA) which uses local information, because agents in such a society usually have learning and adaptation abilities [9-11].

In the next section, the society is introduced. In Sect. 3, we show the result that an adequate spatial size of interaction promotes a highly cooperative society, but that it becomes difficult for agents with synchronous interaction to achieve a cooperative society as the size grows.
2 The society

In this section, we first define the strategies of the agents. Strategies are represented as if-then rules. The payoff matrix, the measurement of the cooperation level, and the neighborhood structure are then described. Finally, a definition of a player is presented, and the ability of adaptiveness is discussed.

2.1 Encoding strategy for the 2-IPD game

The player of the IPD is defined as in Uno [10]. Players can select cooperate (C) or defect (D) per bout, and then repeat the game again. The players choose their next move taking into account a series of previous games. Let us consider ag_i(t) as the action of the i-th agent at time t:

ag_i(t) ∈ {0 (Cooperate), 1 (Defect)}   (1)

Then the strategies are represented as a rule set. For the 2-IPD game remembering n previous steps, there are 2^(2n) combinations of possible histories. Therefore, at least 2^(2n) bits are needed to represent a strategy. In addition, agents need 2n more bits for the early stage of the iterations, in which the history is too short to use the rule. Thus, we represent the strategy as 2^(2n) + 2n bits. In this paper, agents interact with their neighborhoods, and each pair plays the game 20 times. The size of the history is two (n = 2). Let us call st_ik the strategies of the i-th agent, where k denotes one of the possible histories. For example, if the history of a game is ((C, D),(C, C)), then agent i chooses "D" according to the position where k = 8 (0100 = 8).

2.2 Payoff matrix

We model the N-player game as m-times 2-IPD, so the profit of an agent is the sum over all of its 2-IPD games. The payoff matrix of the 2-IPD is shown in Table 1. The prisoner's dilemma is subject to the following two conditions: (1) T > R > P > S; (2) 2R > T + S. In our experiments, (R,S,T,P) = (3,0,5,1). Let us denote the profit of an agent i who plays a game with agent j at time t as

profit(ag_i(t), ag_j(t))   (2)

The payoff of m iterations with the same agent is

Σ_{k=t-m}^{t} profit(ag_i(k), ag_j(k))   (3)

and the total payoff gained by the games with the N_H neighborhood is

Σ_j Σ_{k=t-m}^{t} profit(ag_i(k), ag_j(k))   (4)

Table 1. Payoff matrix of the iterated prisoner's dilemma (IPD) game (T > R > P > S, 2R > T + S)

                            Own strategy
The other's strategy        S1 (cooperate)   S2 (defect)
S1 (cooperate)              R, R             S, T
S2 (defect)                 T, S             P, P

Next, we define the cooperation rate of the population. The cooperation rate is the ratio of cooperative actions to all actions [7]. Assume that agent i has played a game h times at time t; then the cooperation rate avc_i(t) is

avc_i(t) = (h − Σ_{l=1}^{h} ag_i(t − l)) / h   (5)
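As an illustration, the payoff accumulation of Eqs. 2-5 for a single 2-IPD pairing can be sketched as follows. This is a minimal sketch, not the authors' implementation; all function and variable names are ours.

```python
# Sketch of the 2-IPD payoff accumulation (Eqs. 2-5) with
# (R, S, T, P) = (3, 0, 5, 1); illustrative names, not from the paper.
PAYOFF = {  # (own action, opponent action) -> own profit; 0 = C, 1 = D
    (0, 0): 3,  # R: mutual cooperation
    (0, 1): 0,  # S: sucker's payoff
    (1, 0): 5,  # T: temptation to defect
    (1, 1): 1,  # P: mutual defection
}

def iterated_payoff(own_actions, opp_actions):
    """Sum of profit(ag_i(k), ag_j(k)) over the m iterations (Eq. 3)."""
    return sum(PAYOFF[(a, b)] for a, b in zip(own_actions, opp_actions))

def cooperation_rate(actions):
    """avc_i(t) = (h - number of defections) / h (Eq. 5); 0 = C, 1 = D."""
    h = len(actions)
    return (h - sum(actions)) / h

own = [0, 0, 1, 0]   # C, C, D, C
opp = [0, 1, 0, 0]   # C, D, C, C
print(iterated_payoff(own, opp))   # 3 + 0 + 5 + 3 = 11
print(cooperation_rate(own))       # 3/4 = 0.75
```

Because actions are encoded as 0/1 (Eq. 1), the cooperation rate reduces to one minus the mean of the action sequence, which is what Eq. 5 expresses.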
Fig. 1. Periodic boundary condition (W = 20). The figure shows the case in which the agent interacts with its 4 neighbors

Fig. 2. Neighborhood structure

Here, we set h = N_H · m. The cooperation rate of the entire population is written as

avc_a(t) = (Σ_{i=1}^{N} avc_i(t)) / N   (6)
2.3 Localization and player's definition

We now define the neighborhood structure (Fig. 1). There are a number of studies that deal with neighborhood structure in the IPD [9,10]. In this article, the agents settle at each grid point of a plane. The neighborhood structure is defined by considering symmetry, because the speed of spreading information should be uniform. Let us assume that the width of the plane is W and the position of agent i is (x_i, y_i). We then define the projection δ of agent i as

δ: W × W → Ag,   Ag = {Ag_i},   |Ag| = N = W²   (7)

where Ag is the set of players, and its size N is W². The player Ag_i consists of strategy st_i, memory memory_i which stores the history of n plays, profit profit_i(t), average cooperation rate avc_i(t), position x_i, y_i, function Goperation_i that changes its strategy, and the neighborhood structure among the set:

Ag_i = st_i ∪ memory_i ∪ profit_i(t) ∪ avc_i(t) ∪ x_i ∪ y_i ∪ Ng_i ∪ Goperation_i,
st_i = {st_ik},   memory_i = {memory_i1, ..., memory_in},   |memory_i| = n   (8)

The neighborhood structure comprises the relative positions from one agent. Ng is defined as

Ng = {Ng_s},   |Ng| = N_H,   ψ: {1, ..., N_H} → Ng,
∀ s1,s2: Ng_{s1,x} ≠ Ng_{s2,x} or Ng_{s1,y} ≠ Ng_{s2,y}

where ψ is a mapping from H to Ng. Ng_s is an element of Ng, and comprises a pair of relative distances (Ng_sx, Ng_sy), except Ng_sx = Ng_sy = 0. Suppose that the opponents of agent i are determined by Ng_i. The opponents Ag_op are represented as

Ag_op = δ(mod_W(Ng_isx + x_i), mod_W(Ng_isy + y_i))   (9)

where mod_W returns the remainder after division by W. Equation 9 is feasible at the edge of the plane; the left-most agents interact with the right-most agents, the agents at the top interact with the agents at the bottom, and vice versa. Figure 2 shows some neighborhood structures up to N_H = 16. For larger sizes, neighborhood structures are given as

Ng = {Ng_u},   |Ng| = N_H = (2·NW + 1)² − 1,   (0,0) ∉ Ng   (10)

Ng_ux ∈ {−NW, ..., 0, ..., NW}   (11)

Ng_uy ∈ {−NW, ..., 0, ..., NW}   (12)
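The square neighborhoods of Eqs. 10-12 and the toroidal opponent lookup of Eq. 9 can be sketched as follows. This is our own illustrative sketch; the function names are not from the paper.

```python
# Sketch of the symmetric neighborhood (Eqs. 10-12) and the toroidal
# opponent lookup via mod W (Eq. 9); illustrative names only.
def neighborhood(NW):
    """All relative offsets (Ng_ux, Ng_uy) with |coordinate| <= NW, except (0, 0)."""
    return [(dx, dy)
            for dx in range(-NW, NW + 1)
            for dy in range(-NW, NW + 1)
            if (dx, dy) != (0, 0)]

def opponents(xi, yi, NW, W):
    """Opponent positions of the agent at (xi, yi); the grid wraps like a torus."""
    return [((dx + xi) % W, (dy + yi) % W) for dx, dy in neighborhood(NW)]

# N_H = (2*NW + 1)**2 - 1 reproduces the neighborhood sizes used in the paper:
for NW in (2, 3, 4):
    print(NW, len(neighborhood(NW)))   # 2 24 / 3 48 / 4 80

# An agent at the corner of a W = 20 grid interacts across the edges:
print(sorted(set(opponents(0, 0, 2, 20))))
```

Note that Python's `%` operator returns a non-negative result for a positive divisor, so negative offsets at the plane's edge wrap around correctly, which is exactly the behavior Eq. 9 requires.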
In this article, H = {24,48,80}, using NW = 2,3,4.

2.4 Learning strategies of the players

Learning ability is one of the most important issues for agents who wish to adapt to a dynamically changing society. We need to consider the speed of producing new strategies, because agents change their strategies based on the behavior of other agents; old strategies can spread through the population if good new strategies are not produced. We therefore make effective use of a genetic algorithm to produce new strategies. We use 1-point crossover and mutation, and we regard the mutation rate (m.r.) as the speed of learning and adaptation. We first carried out an experiment using 2-point crossover, but the results were unremarkable and are not shown here. When we changed the mutation rate, however, we did obtain a remarkable result: the society could change on a large scale. We therefore regard the mutation rate as an important parameter of evolution. The mutation rate was meant to speed up strategy changes, but the higher the speed, the harder it became to obtain a payoff; at very high rates, no players could get a good payoff. The same holds for the members of a society when the mutation rate corresponds to the speed at which it tries to adapt.
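The strategy-update step described above can be sketched as 1-point crossover plus per-bit mutation on a bit-string strategy. This is a minimal sketch under our reading of Sects. 2.1 and 2.4 (strategy length 2^(2n) + 2n = 20 bits for n = 2); the names are ours, not the authors'.

```python
# Sketch of the GA update of Sect. 2.4: 1-point crossover plus
# per-bit mutation on a 20-bit strategy string; illustrative names only.
import random

def one_point_crossover(parent_a, parent_b):
    """Cut both bit strings at one random point and join head and tail."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(strategy, rate):
    """Flip each bit independently with probability rate (the m.r.)."""
    return [bit ^ 1 if random.random() < rate else bit for bit in strategy]

random.seed(0)
n = 2
length = 2 ** (2 * n) + 2 * n          # 16 rule bits + 4 early-stage bits = 20
a = [random.randint(0, 1) for _ in range(length)]
b = [random.randint(0, 1) for _ in range(length)]
child = mutate(one_point_crossover(a, b), rate=0.005)
print(len(child))                      # 20
```

At m.r. = 0.005 a 20-bit strategy mutates in roughly one bit per ten offspring, which matches the paper's view of the mutation rate as a gentle "adaptation speed" knob.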
Fig. 3. H = 4, m.r. = 0.005

Fig. 4. H = 8, m.r. = 0.005
3 Simulation

3.1 Simulation results

We set the grid width W (see Sect. 2.3 and Fig. 1) to 20 (the number of players N = 400). The parameter H and the mutation rate of all players are set to be equal. After the initial strategies of all the players had been randomly defined, each run was executed as the following sequence; we call the sequence a turn. First, every player competes with its neighbors (the number of opponents is H). Next, all players update their strategy. This sequence was repeated 30000 times. We executed these calculations 100 times each for the statistical analysis. Forty-nine types of society (H = {2,4,8,16,24,48,80} × mutation rate = {0.1,0.05,0.01,0.005,0.001,0.0005,0.0001}) were examined.

Fig. 5. H = 16, m.r. = 0.005

Fig. 6. H = 24, m.r. = 0.005

Figures 3-8 show the results. Each figure illustrates one of the 100 trials of a society with H = {4,8,16,24,48,80} and mutation rate (m.r.) = 0.005. The x-axis indicates the turn, and the y-axis shows the cooperation rate avc_a(t). When H = {4,8,16,24} and H = {48,80} are compared, there are some differences. In the societies with H = {4,8,16,24}, the highest value of the average cooperation rate avc_a(t) is around 0.9. As H gets larger, spike-like decreases in the average cooperation rate occur frequently. These figures show that these societies can recover quickly even if cooperation is lost. On the other hand, the societies with H = {48,80} sometimes could not recover their high degree of cooperation, for example at 20000 turns in Fig. 7. As can be seen, during
the term, avc_a(t) remains almost 0. This means that their cooperation is completely lost. The results are summarized below.

1. The larger H is, the less stable the society becomes.
2. The mutation rate drastically changes the dynamics of the evolutionary properties of these societies.
3. The variance of avc_a(t) differs from trial to trial because of the initial conditions.

We now require a new index which is able to compare the dynamics of these societies quantitatively. First, we calculate the time-average of the average cooperation rate avc_a(t) and its variance in each trial. We expect that the time-average of avc_a(t) will reveal the typical rate, and that the variance of avc_a(t) will allow us to compare the size of the spike-like breakdowns of avc_a(t). These two measures were calculated for each trial, and the results of 100 trials were averaged because we must cancel the influence of the initial conditions.

Figure 9 shows the results. For each mutation rate = {0.0001,0.0005,0.001,0.005,0.01,0.05,0.1}, a line connects the societies in order from the smallest value of H. The x-axis is the average of avc_a(t), and the y-axis indicates the variance of avc_a(t). As can be seen, except for a mutation rate of 0.1, the societies first acquire very high and stable cooperation as the number of interactions H increases. However, if H is too large, this cooperation vanishes quickly, with a large variance. We call this case 1. Conversely, the societies with a mutation rate of 0.1 (see Fig. 9) behave in a complicated way. When societies have a narrow interaction region, H = {2,4}, they cannot become a cooperative society, but if this region is enlarged sufficiently, the cooperation rate increases. We call this case 2 because it is completely different from case 1. Moreover, the societies with H = 2, except for those with a mutation rate of {0.1,0.05}, are plotted far from the societies with H = {4,8}. For example, the society with H = 2 and mutation rate = 0.01 lies at (x,y) = (0.52,0.08), while the societies with H = {4,8,16} converge around (x,y) = (0.82,0.04). We believe that this is caused by the asymmetry of the interaction region for H = 2. This is a very interesting result, because it may suggest that the information flow among societies is important. However, we do not discuss it here because of limited space. We concentrate on the cases with symmetric interaction regions.

Fig. 7. H = 48, m.r. = 0.005
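The two trial-level measures just described, the time-average of avc_a(t) and its variance over a trial, can be computed as in the following sketch (toy data; the names are ours):

```python
# Sketch: time-average and variance of the cooperation-rate series
# avc_a(t) over one trial, as plotted in Figs. 9 and 10; names are ours.
def time_average(series):
    return sum(series) / len(series)

def variance(series):
    mean = time_average(series)
    return sum((v - mean) ** 2 for v in series) / len(series)

# A toy series with one spike-like breakdown of cooperation:
avca = [0.9] * 8 + [0.1, 0.9]
print(round(time_average(avca), 2))   # 0.82
print(round(variance(avca), 4))       # 0.0576
```

A single deep spike barely moves the time-average but dominates the variance, which is why the variance axis of Figs. 9 and 10 separates societies that recover from breakdowns from those that do not.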
Fig. 8. H = 80, m.r. = 0.005

3.2 Summary

We have considered the two specific cases identified above. First, we consider case 1. Figure 10 is a close-up view of case 1. In this
Fig. 9. The effects of neighborhood size; m.r. = {0.0001,0.0005,0.001,0.005,0.01,0.05,0.1}. The x-axis is the average of the cooperation ratio and the y-axis is its variance
Fig. 10. The effects of neighborhood size; m.r. = {0.0001,0.0005,0.001,0.005,0.01}. The x-axis is the average of the cooperation ratio and the y-axis is its variance
Table 2. Payoff matrix of the N-IPD game (D_x > C_x for 0 ≤ x ≤ N; D_{x+1} > D_x and C_{x+1} > C_x for 0 ≤ x < N; C_x > (D_x + C_{x-1})/2 for 0 < x ≤ N)

No. of cooperators    0     1     ...   x     ...   N
Cooperate             C_0   C_1   ...   C_x   ...   C_N
Defect                D_0   D_1   ...   D_x   ...   D_N

Fig. 11. H = 80, m.r. = 0.1
case, there are many parameter values that maintain high cooperation and a stable society under H = {4,8,16,24}. When interactions proceed in a larger region, the cooperation rate does not rise, and the fluctuation gets bigger because of the rise in variance. Therefore, we conclude that when the members of a society interact asynchronously, the result is different from that of synchronously interacting societies. Moreover, the above results suggest that the interaction range should be decided to fit the members' evolutionary/adaptive properties, or that these properties should be set to be adequate for their interaction region, in order to produce a highly cooperative society. We think that this previously ignored characteristic is important for the design of multiagent systems for open environments. Also, the two phases in the evolution of the societies, before and after the turning point, are categorized by the variance of the average cooperation rate. As a future exercise, it may be possible to design systems whose agents autonomously control their evolutionary properties in order to produce a highly cooperative society.

On the other hand, for case 2, Fig. 11 shows the result of one trial with H = 80 and mutation rate = 0.1. Here, we can see that avc_a(t) switches between 0.2 and 0.7 repeatedly. This society becomes highly cooperative in a short time, whereas societies with narrow interaction ranges cannot obtain such high cooperation at any time. This suggests that agents with versatile evolutionary properties, e.g., adaptation to a dynamic environment, can cooperate with each other by enlarging their interaction region. This is also a new observation.

The societies in this article used only locally homogeneous interactions, and so they are different from people interacting in the real world. Also, the asynchronous interaction type was not combined with the synchronous type. In the future, we hope to simulate this situation.
4 Conclusion
To consider the dynamics of the evolution of a cooperative society, we first categorized the interaction among the members of a society into two types, synchronous and asynchronous; in the latter, each agent interacts with its N neighboring agents individually. Asynchronous interaction was dealt with by the cumulative payoff of individual 2-IPD games. The results showed that an adequate spatial size for interactions promotes a highly cooperative society, but that it becomes difficult for agents with synchronous interaction to achieve a cooperative society as the size of the interaction grows.
References

1. Shiozawa Y (2000) Significance of artificial markets for economics (in Japanese). Journal of the Japanese Society for Artificial Intelligence 5(6):951-957
2. Iwanaga S, Namatame A (2001) Asymmetric coordination of heterogeneous agents. IEICE Trans Inf Syst E84-D(8):937-944
3. Axelrod R (1984) The evolution of cooperation. Basic Books, New York
4. Axelrod R (1997) The complexity of cooperation. Princeton University Press, Princeton
5. Ikeda T, Iba T (2001) The strategy evolution in N-person iterated prisoner's dilemma games. IPSJ ICS 127:194-198
6. Ishida T, Yokoi H, Kakazu Y (1998) Competitive social system with self-organized norms of behavior. Intell Eng Syst Artif Neural Networks 8:573-578
7. Seo Y-G, Cho S-B, Yao X (1999) Emergence of cooperative coalition in N-IPD game with localization of interaction and learning. Proceedings of the 1999 Congress on Evolutionary Computation, vol 2. IEEE Press, Washington DC, p 877-884
8. Seo Y-G, Cho S-B, Yao X (2000) The impact of payoff function and local interaction on the N-player iterated prisoner's dilemma game. Knowledge and Information Systems 2(4):461-478
9. Uno K, Namatame A (1999) Evolutionary behaviors emerged through strategic interactions in the large. Proceedings of the Genetic and Evolutionary Computation Conference, Orlando, p 1414-1421
10. Uno K, Namatame A (1999) An evolutionary design of the networks of mutual reliability. Proceedings of the Congress on Evolutionary Computation. Morgan Kaufmann, p 1717-1723
11. Harsanyi J, Selten R (1988) A general theory of equilibrium selection in games. MIT Press, Cambridge