## Thursday, March 24, 2011

### Mixed superrationality does not beat pure in prisoner's dilemma

The prisoner's dilemma is probably one of the most famous toy games of game theorists. It amounts to two criminals that being caught by the police are interrogated individually are offered the following deal: If both remain silent ("cooperate" with each other) both go to prison for $S$ ('short') years for small crimes that the police can prove. But if one prisoner admits the big crime ("defects") he goes free and the other spends $L$ ('long') years in prison. But if both admit the crime they both face a $M$ ('middle') year sentence. To be a dilemma the sentences should obey $0<S<M<L$ and by picking an appropriate normalisation of the unit of time, we can set $S=1$.

The standard (economist) analysis of the game goes as follows: I assume that the other prisoner has already made his decision. Then, no matter what he decided I am better off by defecting: If he cooperates, my choice is between going free and $S$ years while if he is defecting I can choose between $M$ and $L$. So I defect and he comes to the same conclusion, so we end up spending $M$ years in prison. Both defecting is in fact a Nash equilibrium.

That's not too exciting, as we could do better by both cooperating and serving only $S$ years, which is Pareto optimal but unstable because there is the temptation for each player to defect and then go free. So much for the classic analysis of this game (not iterated) which is a model for many decision problems where one has to decide between a personal advantage or the global optimum.

I first learned about this game many many years ago when still attending high school from a Douglas Hofstaedter column in the Scientific American. He makes the following observation: When defecting, I am counting on the fact that the other prisoner is not as clever as me. It only pays if the situation is asymmetric. But since the other prisoner is faced with the same problem, he will come up with the same solution so the asymmetric case of one player cooperating and the other defecting will not occur. Thus the only real possibilities are both cooperating (yielding $S$ years) and both defecting (yielding $M$ years) of which the obvious better choice is to cooperate. Hofstadter calls this argument "superrational". It is the realization that in the analysis of the Nash equilibrium the idea that my decision is independent of the other prisoner's decision might be wrong.

Then Hofstadter points out another version of this game: You receive a letter from a very rich person stating that she is studying human intelligence and she figured that you are one of the top ten intelligent people in the world. She offers you (and also the other nine top-brainers) the following game: On the bottom of the letter is a coupon. You can either ignore the letter (in which case nothing more will happen) or you write your name on the coupon and send it back. If out of the ten possible coupons she receives exactly one she gives the person who returned the coupon 100 Million dollars. If any other number of coupons arrive until the end of this year nobody will receive any money. And as a warning: You are watched over by a number of private investigators. If they notice you trying to find out who the other nine people are the whole thing is called off and again nobody will get any money. So don't even think about it.

This does not look very promising: Obviously, if you don't send in the coupon you won't get any money. So you have to send the coupon but so will the other nine and again you will receive nil. Too bad.

Well, unless you widen your strategy space and besides 'pure', deterministic strategies you also allow for 'mixed', i.e. probabilistic strategies. You could for example come up with the following strategy: You roll dice and then send the coupon only with probability $p$. Let's see which $p$ optimizes your expectation assuming the other nine player follow the same strategy: You only get the money if you send the letter (probability $p$) and all nine other don't (probablity $(1-p)^9$) so the expectation is $E=p(1-p)^9$. Setting to zero the $p$ derivative of $E$ gives $0=(1-p)^9-9p(1-p)^8=(1-p)^8(1-p-9p)$ thus $p=1/10$. So you could prepare ten envelopes but only one with the coupon and mail a random one of these to optimize your expectation.

But with this idea of taking into account also mixed strategies we can go back to the prisoner's dilemma and see what happens when both players defect with probability $p$ (this is the new part of the story I came up with this morning under the shower. Of course, I do not claim any originality here). Then the expected number of years I spend in prison is $p^2M+Lp(1-p)+(1-p)^2$. Quick check for $p$ being 0 or 1 I get back the two deterministic values. So can I do better? Obviously, this is a quadratic function of $p$ going through $(0,1)$ and $(1,M)$. So it has is minimum in the interior of the range $p\in[0,1]$ if the slope at $p=0$ is negative (remember $M>S=1$). But the slope is $2(M-L+1)p+L-2$ which is positive as long as $L>2$. But this is really the interesting parameter range for the game since for $L<2$ it is better for both players to always switch between cooperate-defect and defect-cooperate since the average sentence in the asymmetric case is shorter than the one year sentence of both cooperating. So, unless that is the case, always cooperating is still the better symmetric strategy of superrational players than the probabilistic ones.