Phil Birnbaum, April 6, 2011
birnbaum@sympatico.ca, www.philbirnbaum.com
A couple of weeks ago, I played in the "Pinburgh" pinball
tournament. It had a great format,
which I liked a lot, very different from most other events. Instead of having your scores ranked against
all other competitors, you play a series of matches, and you wind up with a W-L
record. That's a great way of doing it
... you get a much better idea of how you rank against everyone else. (There were only 90 games; as a baseball
fan, I kind of wish there had been 162, for a better intuitive idea of how good
you were. Every baseball fan knows what
99-63 means, or 61-101.)
Actually, it's not quite that pure, because the rules are stacked so you
wind up playing more games against people at your level. The tournament was designed by a man named
Bowen Kerins, who created the pinball ranking system (similar to the chess
system), and, as far as I can tell, is probably the most expert pinball
sabermetrician in existence.
Let me give you a condensed summary of how it works (the full rules are here).
The first day, you play five sessions.
Each session, you are matched against three opponents. Each of you plays a game of pinball. At the end of the game, your score is
compared to each of the other three, and you get a W or an L against each. So if you beat all your opponents, you go 3-0. If you're the second highest, you go
2-1. And so on. It's the same as a "3-2-1-0"
scoring system. (In normal match
pinball play, and the playoffs, they use 4-2-1-0. But then you can't have a W-L record.)
You repeat this for two other games.
At the end of the session, you've faced each opponent three times. So you wind up having gone somewhere between
9-0 and 0-9.
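The session scoring above can be sketched in a few lines of Python. This is only an illustration: the function name `session_record` and the sample scores are made up, and ties are assumed not to occur.

```python
def session_record(games):
    """Return a (wins, losses) pair per player for one four-player session.

    games: three lists of four scores, one list per game (hypothetical input shape).
    Assumes no two players tie in a game.
    """
    wins = [0, 0, 0, 0]
    for scores in games:
        for player, mine in enumerate(scores):
            # One W for every opponent outscored in this game.
            wins[player] += sum(1 for other in scores if other < mine)
    total_per_player = 3 * len(games)  # three opponents per game
    return [(w, total_per_player - w) for w in wins]


# Made-up scores for players 0-3 over three games.
sample = [[100, 50, 75, 20], [10, 40, 30, 20], [5, 6, 7, 8]]
print(session_record(sample))  # [(3, 6), (5, 4), (6, 3), (4, 5)]
```

Counting opponents outscored is the same thing as awarding 3-2-1-0 by finishing position, which is why the two descriptions are interchangeable.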
As I said, there are five sessions.
In subsequent sessions, you are matched with people closer and closer to
you in the standings. The second
session is against players from your half of the standings. The third is against players from your quarter
of the standings. By the fifth session,
you are playing three people directly adjacent to you in the standings.
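Here is a rough Python sketch of that kind of narrowing matchmaking. It is not the tournament's actual pairing procedure: the function `make_groups`, the idea of shuffling within equal "bands" of the standings, and the shortcut of letting a group straddle a band boundary when a band isn't a multiple of four are all assumptions made for illustration.

```python
import random


def make_groups(standings, bands, rng):
    """Split players into groups of four drawn from nearby standings positions.

    standings: player ids ordered best to worst.
    bands: number of equal slices of the standings to shuffle within
           (1 = whole field, 2 = halves, 4 = quarters, ...).
    rng: a random.Random instance.
    Simplification: a group may straddle two bands at a boundary.
    """
    n = len(standings)
    ordered = []
    for b in range(bands):
        band = standings[b * n // bands:(b + 1) * n // bands]  # slice copy
        rng.shuffle(band)
        ordered.extend(band)
    return [ordered[i:i + 4] for i in range(0, n, 4)]


rng = random.Random(2011)
field = list(range(180))            # 0 = current leader, 179 = last place
halves = make_groups(field, 2, rng)     # like the second session
adjacent = make_groups(field, 45, rng)  # like the fifth: four neighbors per group
print(halves[0], adjacent[0])
```

With 180 players and 45 bands, every band is exactly four adjacent players, which matches the description of the final session.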
After the five sessions the first day, you're somewhere between 0-45 and
45-0. As it happened, the
top player was 35-10, and the bottom player was 10-34. (Some players had fewer than 45 games
because the number of players wasn't a multiple of four, which meant some
sessions had 8 games instead of 9.)
At that point, there was a cut, like in golf. Only the top third of the 173 competitors continued on to be able
to play for the top prize. Those were
division "A". The middle
third was group B, and the bottom third was group C. (Some players were not allowed to be in B or C, because they were
historically too good; those stayed in A even if they wouldn't have qualified
otherwise.)
The second day proceeded like the first day, but players competed
against only those in their division.
In addition, players in group C were reset to 0-0 after Day 1.
That means that at the end of Day 2, players in A and B had 90 games
counted in the standings, and players in C had 45 games. The top 17 in each group went on to the
playoff rounds for their division.
The format, I think, worked very well.
It ensured that every one of the lesser B and C players had a legitimate
shot at winning their division in Day 2, and kept the scores fairly tight by
giving the worse players slightly lesser competition.
------
Anyway, I decided to try to simulate the tournament. Part of the reason was that I wanted to have
a better idea of how I did. I had gone
21-24 the first day, barely making B, and then 27-18 the second day, for a
final record of 48-42. That was good
enough to get me into the B playoffs, where I flamed out in the first round.
But, really, how good was that?
It's hard to say. It's above
.500, of course, but I played lesser opponents than the A people. And the guy who finished first in C ... was
he actually better than me, or worse?
He was probably better; since I barely made B, he was probably only a
game or two behind, but then he proved his talent by going 31-14 against
C-level opponents.
So, here's what I did. I created
180 random competitors, and gave each one a talent rating from a normal
distribution. Then I created a
simulation of pinball scoring (which I won't explain here) that takes the
talent into account. I created an
initial seeding that was imperfectly correlated to talent, and decided that the
top 15 seeds were the ones not allowed to drop to B or C.
I tweaked the simulation until certain results -- the distribution of
session scores, and the distribution of final W-L records -- were close to the
actual tournament outcomes. (Details in
the appendix.)
Then I started pulling random numbers and ran the tournament.
My rules were slightly different from the real ones in a few ways. First, I had 180 competitors instead of
173. Second, I included players who
were restricted to A division, but I didn't include any players who were
restricted to *A or B* division (although there were some of those in real
life). Third, my groupings were
slightly different than the real ones, because I didn't feel like programming
groups of three instead of four.
Fourth, I assumed the pool of talent was normally distributed, which is
probably not true. Finally, I broke all
ties with a single tiebreaker game (the real tournament did that only for
important ties, and used seeding for the rest).
Still, I don't think these discrepancies should make a huge
difference.
------
So, here are some of the results, after running a few thousand simulated
tournaments.
First, as you might expect, there's a fair amount of luck in only 45
games, especially when those 45 games aren't independent (one bad game leads to
three losses, not just one). Suppose
you're the 10th best player in the tournament (in terms of actual talent -- in
my simulation, I get to be God and know everyone's actual talent, instead of
having to try to estimate it from previous results). In that case, you should expect to easily make A division,
right? For one thing, you might be one
of the 15 seeds who gets to stay in A no matter what. For another thing, there are 60 people who make A, and you're
better than at least 49 of them.
So what are your chances of making A?
Only about 83 percent. You have
a 15 percent chance of winding up in B, and a 2 percent chance of being
relegated to C.
That's a lot of choking. On
average, one of the top 27 players will wind up in C, just by having bad luck
the first day. (It's impossible to tell
whether that happened in real life, but there was one player who went 20-24,
and would have gone to C had he not been restricted to A.)
What's the opposite of choking ... clutching? There's even more of that.
Of the bottom 18 players, you'd expect one of them to wind up in A, just
by luck. The C-to-A migration is bigger
than the A-to-C migration, because some players aren't allowed to play in C.
Here's a breakdown of what your chances are of making the various
divisions, based on your relative talent.
Percentages may not add to 100% due to rounding:
Rank 001: 100% A
Rank 002: 99% A, 1% B
Rank 005: 95% A, 5% B, 1% C
Rank 010: 83% A, 15% B, 2% C
Rank 020: 67% A, 27% B, 6% C
Rank 060: 40% A, 41% B, 19% C
Rank 090: 27% A, 43% B, 30% C
Rank 120: 17% A, 40% B, 43% C
Rank 171: 5% A, 20% B, 75% C
Rank 180: 2% A, 7% B, 90% C
------
Of course, even if a mediocre player gets lucky and makes A, he's
(she's) probably not going to be lucky enough to finish near the top of A to
make the playoffs. Everyone has at
least a 2% chance to make A, even the worst player (although that might not be
true in real life, if the players aren't really normally distributed; the worst
player might be someone who's never played before but was still willing to pay
the $100 entry fee). But not everyone
has a decent chance to make the top 17 in A, and thus the playoffs.
To have at least a 1 in 200 chance of making the A playoffs, you have to
be no worse than the 124th best player out of 180. I rounded to the nearest percent, so I can't tell you just how
slim the 180th player's chances are, but "extremely" is probably a
good approximation.
However, the bottom players' chances of making the C playoffs are still
pretty decent. Even the worst player
has a 2% chance, and that 124th player has a 13% chance. The 124th player also has a 1% chance of
making the A playoffs (actually, probably closer to half a percent -- I
rounded), and a 6% chance of making the B playoffs. Since everyone who makes the playoffs wins a prize (at least
their entry fee back), player 124 has an overall 1 in 5 chance of taking home
some money.
Here are some more playoff odds:
Rank 001: 86% overall -- 86% A
Rank 002: 78% overall -- 77% A, 1% B
Rank 005: 64% overall -- 59% A, 4% B, 1% C
Rank 010: 58% overall -- 45% A, 11% B, 2% C
Rank 020: 51% overall -- 28% A, 18% B, 5% C
Rank 060: 33% overall -- 6% A, 16% B, 11% C
Rank 090: 26% overall -- 2% A, 11% B, 13% C
Rank 120: 20% overall -- 1% A, 6% B, 13% C
Rank 171: 7% overall -- 0% A, 1% B, 6% C
Rank 180: 2% overall -- 2% C
The average chance of making the playoffs has to be 57/180, or about
32%. The tournament format seems pretty
good about giving everyone a decent shot: even if you're dead average, 90th
best, you still have a 26 percent shot at winning something.
------
Now, these findings are interesting in theory, but they don't help
individual cases much: because, after all, nobody really knows their actual
talent rank. Instead of converting
talent to performance, it would be nice to convert performance to talent. That way, we can estimate how good we really
are.
For instance: suppose you finished first in C division. How well does that compare to, say, the 10th
place finisher in B division? If those
two competitors were to play a match against each other, who should be favored
to win?
As it turns out, the average first place finisher in C was the 48th most
talented player overall (and presumably wound up in C by having awful luck in
the first 45 games). The average 10th
place finisher in B was the 66th best player overall. So, the C1 guy is probably a little better than the B10 guy.
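The "performance to talent" conversion amounts to pooling many simulated tournaments and averaging the true talent rank of whoever landed in each finishing slot. A minimal Python sketch, assuming the simulation emits (finishing slot, talent rank) pairs; the function name and the toy input are hypothetical and are not simulation output:

```python
from collections import defaultdict


def average_talent_by_finish(results):
    """Average true talent rank for each finishing slot.

    results: iterable of (finish_label, talent_rank) pairs pooled over
             many simulated tournaments, e.g. ("C01", 52).
    """
    totals = defaultdict(lambda: [0, 0])  # label -> [sum of ranks, count]
    for finish, talent_rank in results:
        totals[finish][0] += talent_rank
        totals[finish][1] += 1
    return {finish: s / n for finish, (s, n) in totals.items()}


# Toy values only, to show the shape of the input and output.
toy = [("C01", 30), ("C01", 50), ("B10", 60), ("B10", 80)]
print(average_talent_by_finish(toy))  # {'C01': 40.0, 'B10': 70.0}
```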
What about the overall standings leader, the guy who finished number one
in A? His average talent, surprisingly:
17. That is: on average, he's only the
17th best player at the tournament.
That's not as low as it looks, actually. It's mostly a bunch of guys with single digit rankings, with an
occasional larger number who got really lucky.
Here are those two, along with a few other results:
A01: 17th
A02: 21st
A03: 23rd
A10: 34th
A20: 44th
A40: 63rd
B01: 38th
B02: 45th
B03: 59th
B10: 65th
B30: 89th
B60: 141st
C01: 48th
C02: 67th
C03: 77th
C30: 129th
C60: 173rd
Don't take these too seriously: you improve the estimates if you use
actual W-L records, rather than rankings.
Here are those numbers.
A division:
78+ wins: rank 1.5 out of 180
74 to 75 wins: rank 2
71 to 73 wins: rank 3
70-25: 4
65-25: 8
64-26: 9
63-27: 10
62-28: 11
61-29: 13
60-28: 15
59-29: 18
58-32: 20
57-33: 22
56-34: 25
55-35: 28
54-34: 32
50-40: 49
48-42: 59
45-45: 74
40-50: 96
35-45: 118
30-50: 146
B Division:
65-25: 14
64-26: 23
63-27: 23
62-26: 27
58-32: 36
55-35: 47
54-34: 50
50-40: 68
47-43: 83
45-45: 94
40-50: 119
35-45: 140
30-50: 155
C Division:
40- 5: 39
35-10: 60
34-11: 66
33-12: 71
30-15: 87
28-17: 98
25-20: 115
20-25: 139
15-30: 157
10-35: 167
The C rankings are low even for great records; for instance, going 40-5,
which is incredibly good, suggests you're still only 39th best out of 180. That's because if you're in C division, you
had a poor record on the first day, and the simulation effectively combines
that with the 40-5 when estimating how talented you really are.
-----
Okay, now let's look at who gets into the top four -- that is, the final
round of the playoffs. The way the
playoffs work is this: first place in the standings gets a bye. The other 16 break into four sessions of
four players each. They play three
games, each game scored 4-2-1-0. The
top 7 of 16 point-getters join the bye guy in the semi-finals. Those players break into two semi-final
sessions, and the top 4 of 8 make the finals.
Those four finalists play one session of three games, and are ranked by
points in that session.
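For comparison with the W-L scoring used in qualifying, here is a small Python sketch of 4-2-1-0 scoring over a three-game playoff session. The function name and the sample scores are invented for illustration, and ties are assumed not to occur.

```python
PLAYOFF_POINTS = (4, 2, 1, 0)  # points for 1st, 2nd, 3rd, 4th in a game


def playoff_session_points(games):
    """Total 4-2-1-0 points per player over the games of one session.

    games: lists of four scores, one list per game (hypothetical input shape).
    """
    totals = [0, 0, 0, 0]
    for scores in games:
        # Player indices ordered from highest score to lowest.
        order = sorted(range(4), key=lambda p: scores[p], reverse=True)
        for place, player in enumerate(order):
            totals[player] += PLAYOFF_POINTS[place]
    return totals


sample = [[100, 50, 75, 20], [10, 40, 30, 20], [5, 6, 7, 8]]
print(playoff_session_points(sample))  # [4, 6, 6, 5]
```

Because first place is worth twice second, the totals no longer translate into a W-L record.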
Here are the chances of being one of the four players who get to the
finals, based on talent rank (which, again, is unknown in real life):
Player ranked 001: A 48%
Player ranked 002: A 35%
Player ranked 003: A 37%, B 1%
Player ranked 004: A 23%, B 1%
Player ranked 005: A 20%, B 2%
Player ranked 010: A 12%, B 5%, C 1%
Player ranked 020: A 6%, B 7%, C 2%
Player ranked 030: A 3%, B 5%, C 3%
Player ranked 040: A 2%, B 5%, C 3%
Player ranked 050: A 1%, B 4%, C 4%
The 50th ranked player has only a 15% chance of winding up in C division
at all. But if he does, he has a 25%
chance of making the finals (4% of all tournaments is 25% of 15% of the
tournaments).
Player ranked 060: A 1%, B 3%, C 4%
Player ranked 070: A 0%, B 3%, C 3%
By rank 066, the probability of making the A division finals rounds to zero, which means it's less than 0.5% (1 in 200). By rank 70, players have a better chance of making it to the C finals than the B finals.
Player ranked 080: A 0%, B 2%, C 4%
Player ranked 090: A 0%, B 2%, C 3%
Player ranked 100: A 0%, B 1%, C 3%
Player ranked 110: A 0%, B 1%, C 3%
Player ranked 120: A 0%, B 1%, C 2%
Player ranked 130: A 0%, B 0%, C 2%
The 130th ranked player is about the limit for having a non-zero chance
(after rounding) of making the B division finals.
Player ranked 140: C 2%
Player ranked 150: C 1%
Player ranked 160: C 1%
After the 168th player, the chance of making C finals drops below 1/200.
-------------
That's the finals. What about
the grand prize, finishing first overall?
As it turns out, and as the chart below will show, you have to be in the
top third of competitors to have an appreciable chance to win. 71 percent of the time, the ultimate winner
is one of the ten best players. 98
percent of the time, the winner is in the top 60.
However, a long shot does occasionally come through. In 8,062 random tournaments, the lowest
ranked winner was number 133 of 180.
That only happened once. 129th
also happened once, 121 happened once, and 125 happened two times.
Here are the full results, broken down into a denominator of 1,000
tournaments to make the numbers easier to understand.
Player ranked 001 won 199 times out of 1,000
Player ranked 002 won 119 times
Player ranked 003 won 92 times
Player ranked 004 won 70 times
Player ranked 005 won 58 times
Player ranked 006 won 42 times
Player ranked 007 won 38 times
Player ranked 008 won 37 times
Player ranked 009 won 29 times
Player ranked 010 won 25 times
Players ranked   1- 10 won 709 times (combined)
Players ranked  11- 20 won 157 times
Players ranked  21- 30 won  65 times
Players ranked  31- 40 won  32 times
Players ranked  41- 50 won  16 times
Players ranked  51- 60 won   8 times
Players ranked  61- 70 won   5 times
Players ranked  71- 80 won   4 times
Players ranked  81- 90 won   1 time
Players ranked  91-100 won   1 time
Players ranked 101-130 won   1 time
Players ranked 131-180 won   0 times
I hope those add up to about 1,000.
-------------
And finally: money winnings. As
it turns out, if you're in the middle of the pack, you should expect to get $50
of your $100 back in prizes. Here are
the prizes won by players of various rankings (out of 180, God's-eye view of
talent rank):
Player ranked 001 won $ 940
Player ranked 002 won $ 656
Player ranked 003 won $ 540
Player ranked 004 won $ 455
Player ranked 005 won $ 406
Player ranked 006 won $ 348
Player ranked 007 won $ 324
Player ranked 008 won $ 323
Player ranked 009 won $ 298
Player ranked 010 won $ 287
Player ranked 015 won $ 241
Player ranked 020 won $ 207
Player ranked 030 won $ 151
Player ranked 040 won $ 121
Player ranked 050 won $ 108
Player ranked 060 won $ 90
Player ranked 070 won $ 77
Player ranked 080 won $ 66
Player ranked 090 won $ 61
Player ranked 100 won $ 53
Player ranked 110 won $ 43
Player ranked 120 won $ 39
Player ranked 130 won $ 32
Player ranked 140 won $ 27
Player ranked 150 won $ 21
Player ranked 160 won $ 15
Player ranked 170 won $ 10
Player ranked 180 won $ 2
Roughly speaking, if you're in the top 1/3, expect to win your money
back or more. If you're in the middle
third, expect a little over half your money back. And if you're in the bottom third, in the long run you'll win back
1/4 of your registration fee.
---------
A lot of these results depend on your ranking within the pool of 180
entrants. As I said, that's pretty much
impossible to know for sure. You can
get a very rough estimate by combining some of these results, but there'll be a
fairly large confidence interval around it.
I'll use me as an example. I
finished in B, with a 48-42 record.
That was tied for 10th through 15th in the standings. In the playoffs, I was tied for 13/14 out of
17. Let's call it 14.
According to the simulation, 14th in B averaged 70th in overall
talent. But that's 14th in B out of 180
competitors. I finished 14th in B out of
only 173 competitors. That's easier, so
let's lower my ranking from 70th to 73rd.
If you get another estimate by looking at W-L record, 48-42 in B was
worth 78th in overall talent.
So I'm probably somewhere between 73rd and 78th. Let's call it 75th.
Looking up players who were 75th suggests that:
-- I have a 33% chance of making it to A next year; 45% of B; and 24% of
being relegated to C.
-- I have a 4% chance of making the A playoffs; 14% chance of making the
B playoffs; and 12% chance of making the C playoffs.
-- So, the consolation is: if I *am* relegated to C, I have a 50-50 shot
of getting to the playoffs (12% out of 24%).
-- I should expect to win $78.
-- Finally, my chances of winning the grand prize are only about .45 in
1,000, or 0.045%. That's because the
71-80 group won 4.5 in 1,000, and I'm 1/10 of that group. Effectively, I'm about a 2000:1 long shot.
However: I may not actually be 75th.
I could conceivably be much higher, and had bad luck at the tournament,
or I might be much lower, and had good luck.
This is where you need more information.
But let's suppose I don't have any other information, because this was
my first tournament. What might have
happened is that I'm better than 70th, but had bad luck.
The standard deviation of wins in this tournament due to luck, in 90
games, is about 6. So, instead of
48-42, there's a 2.5 percent chance I'm actually 2 or more SD above that, which
would put me at 60-30. If that were the
case, I'd certainly have finished in A instead of B. The better competition would have brought me down somewhat: so
let's call it 55-35.
If I were 55-35 in A, then, suddenly, I'm 28th overall, instead of
75th. That means I have a 6 in 1,000
chance of winning the grand prize, instead of a 0.5 in 1,000 chance.
Therefore, the 95% confidence interval for estimating my skill from this
one tournament is very wide; it's centered on 75, but could be as high as 28
(and probably as low as 120ish).
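As a rough check on the "about 6" figure for the luck SD, here is a short Python sketch. It assumes equally matched players (so wins in a single game are equally likely to be 0, 1, 2, or 3) and ten independent three-game sessions; that is a simplification, and not necessarily how the figure was arrived at.

```python
import math

GAMES_PER_SESSION = 3
SESSIONS = 10  # five per day, two days

# Variance of wins in one four-player game when every finishing
# position is equally likely: wins are 0, 1, 2 or 3, with mean 1.5.
var_per_game = sum((w - 1.5) ** 2 for w in range(4)) / 4
sd_per_session = math.sqrt(GAMES_PER_SESSION * var_per_game)
sd_tournament = sd_per_session * math.sqrt(SESSIONS)

# For contrast: 90 independent coin-flip games.
sd_if_independent = math.sqrt(90 * 0.5 * 0.5)

print(round(sd_per_session, 3), round(sd_tournament, 2), round(sd_if_independent, 2))
```

Under these assumptions the tournament SD comes out a little over 6 wins, noticeably more than the roughly 4.7 you would get if the 90 results were independent, because one bad game costs three losses at once.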
The overall moral is that I'm still going to base my expectation on an
estimate that I'm 75th best out of 180.
But, that could be way off. For
best accuracy, I should play a whole bunch of different tournaments, so that I
can estimate my ranking more precisely.
However, as it turns out, I've played in two other tournaments, and
finished roughly the same in those as I did here. So I'll stick to the estimates above, for now.
Still, I'm hoping that I actually just had bad luck in all three
tournaments, and that I'm actually much better than the records show. That's really my only decent hope for
winning next year.
------
Appendix
This is a description of how realistic the simulation is, and how it gives
roughly the same results as the actual Pinburgh 2011 tournament.
In real life, there were 1,620 player-sessions. There were 10 sessions for 173 players, for
a total of 1,730, but 110 of them were 8 games instead of 9, so I ignored
those.
If all players were equal, how many 9-0 sessions would you expect? Well, for every four-player session, there’s
a 1 in 16 chance that one of the four will go 9-0. That’s because, in the first game, one of the players must go 3-0, and the chance that player goes 3-0 twice more
is 1 in 4 squared, which is 1 in 16.
So, 1 in 16 per four-player session works out to 1 in 64 per player-session. 1,620 divided by 64 equals 25.3. So you’d expect 25.3 cases of a player going
9-0 – and, by the same logic, 25.3 cases of a player going 0-9.
What about 8-1? Well, there are
three ways a player can win eight games: 3/3/2, 3/2/3, and 2/3/3. So you’d expect three times as many 8-1 as
9-0. That works out to about 76 out of
1,620.
I could repeat the calculation for all scores, but it was easier just to
run a simulation. Out of 1,620 times,
you’d expect:
9-0: 25 times
8-1: 76 times
7-2: 153 times
6-3: 252 times
5-4: 305 times
4-5: 305 times
3-6: 252 times
2-7: 153 times
1-8: 76 times
0-9: 25 times
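The equal-talent case can also be worked out exactly rather than simulated. A minimal Python sketch, assuming each game is equally likely to give a player 0, 1, 2, or 3 wins, independently across the three games (the function name is made up):

```python
from collections import Counter
from itertools import product


def equal_talent_distribution(player_sessions=1620):
    """Expected number of player-sessions ending with each win total (0-9)."""
    # Enumerate all 4 * 4 * 4 = 64 equally likely outcomes of three games.
    ways = Counter(sum(games) for games in product(range(4), repeat=3))
    return {wins: player_sessions * ways[wins] / 64 for wins in range(9, -1, -1)}


for wins, expected in equal_talent_distribution().items():
    print(f"{wins}-{9 - wins}: {expected:.1f}")
```

The exact values land within a count or two of the simulated ones listed above.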
Now, that’s the theoretical statistical distribution when all the
players are equal. In real life, of
course, some are much better than others.
And so you’d expect more extreme results, like 9s, 8s, and 7s, and fewer 5-4s
and 4-5s.
That happened. Here are the real
results, then the simulated ones:
9-0: 25 real, 25 simulated
8-1: 87 real, 76 simulated
7-2: 170 real, 153 simulated
6-3: 239 real, 252 simulated
5-4: 277 real, 305 simulated
4-5: 294 real, 305 simulated
3-6: 261 real, 252 simulated
2-7: 152 real, 152 simulated
1-8: 88 real, 75 simulated
0-9: 26 real, 25 simulated
As expected, the real results are more extreme than the simulated
results, with a few exceptions that are probably because of luck.
BTW, you can find a full record of the real results here.
-----
Let me quickly figure out the standard deviation of talent, using a
method shown by Tom Tango a few years ago.
The standard deviation of the “real” is 2.000. The SD of the simulated was 1.936. By the equation
SD^2(talent) = SD^2(actual) – SD^2(theoretical)
… we get that the SD of talent equals almost exactly 0.5 wins per session. That means that if the average player would
go 4.5-4.5 in a typical session, the extremely talented players, 2 SDs from the
mean, would go 5.5-3.5, for a winning percentage of .611, or 55-35 over a
90-game tournament.
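The same arithmetic can be reproduced from the real session counts listed earlier. This Python sketch is only a check on the numbers; the variable and function names are made up, and the theoretical SD assumes equally matched players.

```python
import math

# Real session results from the table above: wins -> number of player-sessions.
real_counts = {9: 25, 8: 87, 7: 170, 6: 239, 5: 277,
               4: 294, 3: 261, 2: 152, 1: 88, 0: 26}


def sd_of_wins(counts):
    """Population standard deviation of wins per session."""
    n = sum(counts.values())
    mean = sum(w * c for w, c in counts.items()) / n
    var = sum(c * (w - mean) ** 2 for w, c in counts.items()) / n
    return math.sqrt(var)


sd_actual = sd_of_wins(real_counts)
sd_theoretical = math.sqrt(3 * 1.25)  # three games, variance 1.25 per game
sd_talent = math.sqrt(sd_actual ** 2 - sd_theoretical ** 2)

print(round(sd_actual, 3), round(sd_theoretical, 3), round(sd_talent, 2))
```

The three values come out close to 2.0, 1.936, and 0.5, in line with the figures quoted above.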
-----
OK, so, as expected, the real results are different from the simulation,
because the simulation had everyone with the same talent. So next, I tried making the players
differ in talent. I made the
distribution normal, trying different standard deviations and rerunning the
simulation, until I found one that seemed to fit the real life data the best.
But I’d never be able to get a perfect fit. Why? Because the real
results aren’t “right” – they’re not properly extreme everywhere. Specifically, the 9-0 and 0-9 numbers are
significantly smaller than they should be.
In fact, they’re almost exactly at the ‘every player is equal’ mark,
when they should be higher. That’s
probably just because of luck, considering the 8-1 and 1-8 numbers look OK.
So we’re not going to get a perfect fit. Here’s the fit I finally settled on:
9-0: 25 real, 32 simulated
8-1: 87 real, 84 simulated
7-2: 170 real, 155 simulated
6-3: 239 real, 245 simulated
5-4: 277 real, 289 simulated
4-5: 294 real, 290 simulated
3-6: 261 real, 249 simulated
2-7: 152 real, 158 simulated
1-8: 88 real, 84 simulated
0-9: 26 real, 29 simulated
It seems like a reasonable fit, especially considering that we have
strong reason to suspect the real-life data to be a bit off at the extremes.
-----
Now, let’s check to make sure the overall standings seemed to come out
OK. I looked at a bunch of results, to
compare real to simulated. (Full
results online here.)
Top record in A:
Real, 62-28. Simulated, 65-25.
Top record in B:
Real, 54-36. Simulated, 57-33.
Top record in C:
Real, 31-14. Simulated, 33-16.
Minimum playoff record in A: Real, 53-37. Simulated, 53-37.
Minimum playoff record in B: Real, 47-43. Simulated, 48-42.
Minimum playoff record in C: Real, 25-20. Simulated, 25-20.
Pretty good, except that the top records are always higher in the
simulation than in real life. I think
that’s because real life didn’t have enough extreme 9-0 and 0-9 records. I think if you added in a few more 9-0s and
0-9s, the real life winners would have had a couple more wins.
Or, it could just be that there are more good players than a normal
distribution would predict. I suspect
that might be the case, at least partially, considering that the best players
in the world are much more likely to come to Pittsburgh for this event.
However, despite the small discrepancies, I think the simulation does a
reasonable job of coming close to the real-life numbers, and is close enough
that we can trust most of the results.
------
One last thing to explain: how the games were simulated.
Here’s how it worked. For this
simulation, every player was given a talent level, which was the sum of (a) 50,
and (b) 10 independent uniform variables between 0 and 4. That means the overall talent level was
approximately normally distributed with mean of 70 and variance that … I didn’t
actually calculate.
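For what it's worth, the variance follows directly from the definition: a uniform variable on 0 to 4 has variance 4^2/12, and ten independent ones add up. A short Python sketch (the name `draw_talent` is invented for illustration):

```python
import random


def draw_talent(rng):
    """One talent level: 50 plus ten independent uniforms on [0, 4]."""
    return 50 + sum(rng.uniform(0, 4) for _ in range(10))


talent_mean = 50 + 10 * 2            # each uniform averages 2
talent_var = 10 * (4 ** 2) / 12      # each uniform has variance 16/12
talent_sd = talent_var ** 0.5

rng = random.Random(2011)
print(talent_mean, round(talent_var, 2), round(talent_sd, 2))
print(round(draw_talent(rng), 1))
```

So the talent levels have a variance of about 13.3, for an SD of roughly 3.65 around the mean of 70.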
Then, to simulate a pinball game, I simulated repeated “shots” for each
player until he lost three balls. There
were three kinds of shots: good shots, OK shots, and lose-the-ball shots. Good shots scored between 9 and 10 points (9
plus two random uniform variables between 0 and 1). OK shots scored between 0 and 1 points (two random uniform
variables between 0 and 1). Lose-the-ball
shots scored 0, plus loss of ball.
For a player with talent X, the probabilities for each of the three
shots were:
Good shot: X%
OK shot: 90% of non-good shots (that is, 0.9 * (100 – X)%)
Lose-the-ball shot: 10% of non-good shots (that is, 0.1 * (100-X)%).
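The shot model above can be sketched in Python roughly as follows. This is a reconstruction, not the original program. One detail is a guess and is labeled as such in the code: "two random uniform variables" is taken to mean their product, which keeps an OK shot between 0 and 1 and a good shot between 9 and 10.

```python
import random


def simulate_game(talent, rng):
    """Score of one simulated game for a player with talent X (in percent).

    Assumes talent is below 100; otherwise the ball could never be lost.
    """
    p_good = talent / 100.0
    p_lose = 0.1 * (1.0 - p_good)  # 10% of non-good shots
    score = 0.0
    balls_lost = 0
    while balls_lost < 3:
        roll = rng.random()
        if roll < p_good:
            # Good shot. ASSUMPTION: bonus is the product of two uniforms.
            score += 9.0 + rng.random() * rng.random()
        elif roll < p_good + p_lose:
            balls_lost += 1  # lose-the-ball shot scores nothing
        else:
            # OK shot. ASSUMPTION: again the product of two uniforms.
            score += rng.random() * rng.random()
    return score


rng = random.Random(2011)
print(round(simulate_game(70, rng), 1))
```

A more talented player both scores on more shots and loses the ball less often, so small talent differences compound over a game.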
It’s a pretty crude simulation of a pinball game, but it seemed to work
OK. It actually doesn’t matter what the
internal details are of simulating a game, so long as players of a certain
skill beat opponents of a certain skill with the right probability. And the final results of the simulation
suggest they did. If the good players
beat the worse players too often, there would have been too many 9-0 and 8-1
sessions. If the good players beat the
worse players not often enough, there would have been too many 5-4 and 4-5
sessions.
And, of course, the scores don’t map to real scores. Real-life scores aren’t linear. In real life, the first 20 good shots might
net you one million points, while the second 20 good shots might net you ten
million points. However, since all that
matters is who beats whom, the actual scores don’t matter. In older-style tournaments (for instance), where your
standing was based on the sum of your scores, that wouldn’t be the case.