MANSW   The Mathematical Association of New South Wales, Inc.
Promoting Quality Mathematics Education for all.

Reflections on Sport and Mathematics
The value of graphical presentation of data: an example from cricket

Iain Skinner, School of Electrical Engineering, The University of New South Wales

I recently found an elegant, instructive example of how some simple mathematics, combined with the power of a small computer, allows a very intelligible graphical presentation of information that is conventionally presented in long, barely comprehensible tables of numbers. Given that finding ways to interpret numerical information is an important function of mathematics, and given that the subject of the example was cricket, I thought the example might be of wider interest.

In cricket, three numbers are frequently used as summary statistics to indicate the quality of a bowler's performance. The first, and most frequently quoted, is the average (ave), which is defined as the total number of runs scored off the bowler divided by the number of wickets the bowler has taken. It is often termed the cost of a wicket and, in general, the lower the average, the better the bowler. The second number is the bowler's strike rate (SR). It measures how often a bowler captures a wicket and is defined as the total number of deliveries bowled divided by the number of wickets taken. Again, the lower the better. The third number characterizing a bowler is the economy rate (ER), for which there are several definitions. (The definitive source of cricket records, the annual edition of Wisden Cricketers' Almanack, does not calculate one at all.) The one used here, consistent with the definition of Kimber (1993), is the number of runs conceded per hundred balls bowled. Once again, the smaller the number, the better the bowling.
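The three definitions translate directly into a few lines of code. What follows is only a minimal sketch in Python: the function name `bowling_summary` and its argument names are illustrative choices, not part of any cricket-statistics library. The example call uses J. Angel's balls and wickets from Table 1; the runs total of 685 does not appear in the source and has been back-calculated from the tabulated average, so it should be treated as an assumption.

```python
def bowling_summary(runs, balls, wickets):
    """Return (ave, SR, ER) as defined in the text above."""
    if wickets <= 0 or balls <= 0:
        raise ValueError("need at least one wicket and one ball bowled")
    ave = runs / wickets      # runs conceded per wicket taken
    sr = balls / wickets      # balls bowled per wicket taken
    er = 100 * runs / balls   # runs conceded per hundred balls bowled
    return ave, sr, er


# Balls and wickets are J. Angel's from Table 1; runs=685 is an assumed,
# back-calculated figure (about 22.10 x 31), not a number given in the source.
ave, sr, er = bowling_summary(runs=685, balls=1697, wickets=31)
print(f"ave={ave:.2f}  SR={sr:.1f}  ER={er:.1f}")
```

With that assumed runs total, the rounded results agree with the Ave, SR and ER entries tabulated for J. Angel.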

By tradition (and cricket is a very traditional game) cumulative bowling figures are gathered for a specified period of time (season, career, etc.), and then ordered and tabulated, necessarily in a small font so that as many numbers as possible fit on the fewest pages. A glance at a typical set of bowling figures (e.g. Tables 1 and 2) shows that it is not immediately clear who has performed best. If the table were ordered by average, as is often the case, the better strike rates would remain obscure. Providing a second table ordered by strike rate, while helpful, does not solve the difficulty, for comparison across tables is not easy. An improvement in presenting a comparative summary of bowling performances was explained by Kimber (1993).

Table 1. The statistics for the bowlers noted in Figure 1 (those with at least 25 wickets in the 1996-97 Australian first-class cricket season). [Source: Cricinfo]

Key  Bowler         Balls  Wkts    Ave    SR    ER
JA   J. Angel        1697    31  22.10  54.7  40.4
AB   A. Bichel       1474    30  21.67  49.1  44.1
IB   I. Bishop       1482    25  29.04  59.3  49.0
AD   A. Dale         2755    42  22.07  65.6  33.6
IH   I. Harvey       1904    35  27.57  54.4  50.7
BJ   B. Julian       2076    35  25.94  59.3  43.7
MK   M. Kasprowicz   2873    48  25.54  59.9  42.7
JM   J. Marquet      1763    25  41.12  70.5  58.3
GM   G. McGrath      1417    29  19.55  48.9  40.0
PM   P. McIntyre     2973    35  40.37  84.9  47.5
BM   B. McNamara     1490    33  22.45  45.2  49.7
CM   C. Miller       2607    32  35.72  81.5  43.8
TM   T. Moody        2148    38  24.37  56.5  43.1
MR   M. Ridgway      1832    28  34.93  65.4  53.4
DS   D. Saker        2573    32  37.81  80.4  47.0
SW   S. Warne        1892    27  29.44  70.1  42.0
SY   S. Young        2236    35  31.31  63.9  49.0

Table 2. The statistics for the bowlers noted in Figure 2 (those with at least 70 test wickets for Australia since World War I, as at 1 May 1998). [Source: Cricinfo]

Key  Bowler           Balls  Wkts    Ave    SR    ER
TA   T. Alderman     10 181   170  27.15  59.9  45.3
RB   R. Benaud       19 108   248  27.03  77.0  35.1
AC   A. Connolly      7 818   102  29.22  76.6  38.1
AD   A. Davidson     11 587   186  20.53  62.3  33.0
GD   G. Dymock        5 545    78  27.12  71.1  38.1
JG   J. Gleeson       8 853    93  36.20  95.2  38.0
JMG  J. Gregory       5 582    85  31.15  65.7  47.4
CG   C. Grimmett     14 513   216  24.21  67.2  36.0
NH   N. Hawke         6 974    91  29.41  76.6  38.4
RH   R. Hogg          7 633   123  28.47  62.1  45.9
MH   M. Hughes       12 285   212  28.38  57.9  49.0
HI   H. Ironmonger    4 695    74  17.97  63.4  28.3
IJ   I. Johnson       8 780   109  29.19  80.6  36.2
WJ   W. Johnston     11 048   160  23.91  69.1  34.6
GL   G. Lawson       11 118   180  30.56  61.8  49.5
DL   D. Lillee       18 467   355  23.92  52.0  46.0
RL   R. Lindwall     13 650   228  23.03  59.9  38.5
AAM  A. Mailey        6 119    99  33.91  61.8  54.9
AM   A. Mallett       9 990   132  29.84  75.7  39.4
TM   T. May           6 577    75  34.74  87.7  39.6
CM   C. McDermott    16 586   291  28.63  57.0  50.2
GM   G. McGrath       8 849   166  23.49  53.3  44.1
GMc  G. McKenzie     17 681   246  29.78  71.9  41.4
KM   K. Miller       10 461   170  22.97  61.5  37.3
WO   W. O'Reilly     10 024   144  22.59  69.6  32.5
BR   B. Reid          6 244   113  24.63  55.3  44.6
PR   P. Reiffel       6 403   104  26.96  61.6  43.8
RS   R. Simpson       6 881    71  42.26  96.9  43.6
JT   J. Thomson      10 535   200  28.00  52.7  53.2
MW   M. Walker       10 094   138  27.47  73.1  37.6
SW   S. Warne        19 791   313  24.77  63.2  39.2
SRW  S. Waugh         6 863    86  35.30  79.8  44.2
BY   B. Yardley       8 909   126  31.63  70.7  44.7

A little thought shows the relationship:

SR = 100 ave/ER,

so only two of the descriptive summary statistics are independent. One could argue that the two most fundamental are the strike and economy rates, which, respectively, directly measure the bowler's offensive (How often is a wicket taken?) and defensive (How hard is it to score runs?) capabilities. Consequently it is natural to compare bowlers' statistics on a two-dimensional Cartesian system, with ER and SR being the coordinates, so that, instead of appearing in a table of numbers, the details of a bowler's performance are depicted graphically in a bowling scatter diagram (see Figures 1 and 2). It follows from the above equation that any particular value of ave defines a hyperbola in the ER - SR plane (also shown in the figures).
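The relationship follows from the definitions: ave is runs per wicket, SR is balls per wicket and ER is 100 times runs per ball, so ave/ER equals balls per wicket divided by 100. The sketch below, in Python, checks the identity against a few rows of Table 1 and generates points on a curve of constant average. The function names `strike_rate_from` and `average_curve` are illustrative, the average of 25 is an arbitrary example value, and the tolerance is an assumption made because the tabulated figures are rounded.

```python
def strike_rate_from(ave, er):
    """SR implied by the identity SR = 100 * ave / ER."""
    return 100 * ave / er


# (key, ave, SR, ER) for a few rows copied from Table 1.
rows = [
    ("JA", 22.10, 54.7, 40.4),
    ("AD", 22.07, 65.6, 33.6),
    ("GM", 19.55, 48.9, 40.0),
    ("BM", 22.45, 45.2, 49.7),
]

# The tabulated figures are rounded, so allow a small tolerance (assumed).
for key, ave, sr, er in rows:
    implied = strike_rate_from(ave, er)
    print(f"{key}: tabulated SR {sr:.1f}, implied SR {implied:.1f}")
    assert abs(implied - sr) < 0.15


def average_curve(ave, er_values):
    """(ER, SR) points on the hyperbola of constant average `ave`."""
    return [(er, strike_rate_from(ave, er)) for er in er_values]


# Points on the ave = 25 curve, for ER from 30 to 60 in steps of 10.
curve_25 = average_curve(25, range(30, 61, 10))
for er, sr in curve_25:
    print(f"ER {er:>2}  SR {sr:.1f}")
```

Every point returned by `average_curve` satisfies ER x SR = 100 x ave, which is why each broken curve in the figures is a hyperbola.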

In using the scatter diagram for comparisons, one finds that bowlers with better strike rates, that is, lower values of SR, appear closer to the bottom of the graph, and those with better economy rates, that is, lower values of ER, appear closer to the left-hand side. The very best bowlers of all are those located 'closest' to the bottom left-hand corner. Notice that, in moving closer to this corner, the value of ave also decreases, but that not all points corresponding to the same average are equally close to this corner. Furthermore, once bowling figures are plotted in this way it is possible to identify not only whose performance is best, but also informative patterns in the use of bowlers.
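'Closest' to the bottom left-hand corner is deliberately informal. One way to make part of it precise, offered here as an interpretation rather than as a method taken from the article or from Kimber, is to call a bowler dominated when some other bowler is at least as good on both ER and SR and strictly better on one. The bowlers that nobody dominates are then the candidates for best. The Python sketch below applies this to the (ER, SR) pairs of Table 1; the names `dominates` and `undominated` are illustrative.

```python
# (ER, SR) pairs copied from Table 1, keyed by the bowler's initials.
TABLE_1 = {
    "JA": (40.4, 54.7), "AB": (44.1, 49.1), "IB": (49.0, 59.3),
    "AD": (33.6, 65.6), "IH": (50.7, 54.4), "BJ": (43.7, 59.3),
    "MK": (42.7, 59.9), "JM": (58.3, 70.5), "GM": (40.0, 48.9),
    "PM": (47.5, 84.9), "BM": (49.7, 45.2), "CM": (43.8, 81.5),
    "TM": (43.1, 56.5), "MR": (53.4, 65.4), "DS": (47.0, 80.4),
    "SW": (42.0, 70.1), "SY": (49.0, 63.9),
}


def dominates(a, b):
    """True if a is at least as low as b on both axes and lower on one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def undominated(bowlers):
    """Keys of bowlers that no other bowler dominates, ordered by ER."""
    keys = [
        key for key, point in bowlers.items()
        if not any(dominates(other, point) for other in bowlers.values())
    ]
    return sorted(keys, key=lambda key: bowlers[key][0])


print(undominated(TABLE_1))
```

For the Table 1 data this leaves AD, GM and BM. The test does not rank those three against one another; doing so requires a judgement about how to trade economy against strike rate.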

As an example, consider Figure 1, which shows the results of all bowlers who obtained 25 or more wickets in the 1996-97 Australian first-class cricket season. The key is in Table 1.

Figure 1. The bowling scatter diagram illustrating the performances of bowlers with at least 25 wickets in the 1996-97 Australian first-class season.

The key is in Table 1. The broken curves are the hyperbolas defining a specific average, indexed by that average.

Figure 2. The bowling scatter diagram illustrating the performances of Australian bowlers who have captured at least 70 wickets in tests since World War I.

The key is in Table 2. AC and NH are partially overlapping. The broken curves are the hyperbolas defining a specific average, indexed by that average.

In the example (Figure 1) we see that four bowlers (AD, JA, AB, BM) who would be closely grouped in a table ordered by ave (each with an average of approximately 22) are widely separated on the bowling scatter diagram. We see immediately both that AD has the best economy rate, and that BM has the best strike rate. The two-dimensional representation allows ready assessment of two different measures of bowling skill. It is clear why AD, who had by far the lowest value of ER, was chosen for Australia's one-day cricket matches in South Africa in April 1997, at the end of that season, where restriction of the scoring rate is of greater concern than taking wickets. Whether GM has the best overall performance I leave as an open question.
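The contrast between the one-dimensional and two-dimensional views of these four bowlers can be checked numerically. The following Python sketch uses their figures from Table 1; the helper names `ranked_by` and `spread` are illustrative, and the spread (largest minus smallest value) is simply one convenient measure of separation chosen for this example.

```python
# (ave, SR, ER) for the four bowlers named above, copied from Table 1.
FOUR = {
    "AD": (22.07, 65.6, 33.6),
    "JA": (22.10, 54.7, 40.4),
    "AB": (21.67, 49.1, 44.1),
    "BM": (22.45, 45.2, 49.7),
}

# Position of each statistic within the tuples above.
STATS = {"ave": 0, "SR": 1, "ER": 2}


def ranked_by(stat):
    """Bowler keys ordered from best (lowest) to worst on the statistic."""
    i = STATS[stat]
    return sorted(FOUR, key=lambda key: FOUR[key][i])


def spread(stat):
    """Largest minus smallest value of the statistic among the four."""
    i = STATS[stat]
    values = [row[i] for row in FOUR.values()]
    return max(values) - min(values)


for stat in STATS:
    print(f"{stat:>3}: order {ranked_by(stat)}, spread {spread(stat):.2f}")
```

The averages differ by less than one run per wicket, whereas SR and ER each differ by more than fifteen units, and the SR ordering is the exact reverse of the ER ordering. That reversal is what one expects of points lying near a single curve of constant average.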

As a second example, the performances of all Australian bowlers with at least 70 wickets in tests since World War I are shown in Figure 2. (Bowlers from the earlier period were excluded because of the difference in playing conditions.) The accompanying key is found in Table 2.

At least one pattern is immediate. The faster bowlers of the last twenty years (TA, RH, MH, GL, DL, CM, GM, BR, PR, JT) generally are grouped together, but away from the faster men of the previous era (AC, GMc, NH, MW). This agrees with what the game's critics would term a return to more attacking cricket, which is characterized by wickets and runs coming more quickly. This example highlights the value of the two-dimensional presentation to historians of the game, coaches and others who monitor tactics.
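The separation of the two groups can also be read off the numbers in Table 2. The sketch below, in Python, compares them using unweighted means of the career SR and ER figures. Averaging career figures without weighting by balls bowled is an assumption made for brevity, and the names `RECENT_FAST`, `EARLIER_FAST` and `group_means` are illustrative.

```python
from statistics import mean

# (SR, ER) pairs copied from Table 2 for the two groups named above.
RECENT_FAST = {
    "TA": (59.9, 45.3), "RH": (62.1, 45.9), "MH": (57.9, 49.0),
    "GL": (61.8, 49.5), "DL": (52.0, 46.0), "CM": (57.0, 50.2),
    "GM": (53.3, 44.1), "BR": (55.3, 44.6), "PR": (61.6, 43.8),
    "JT": (52.7, 53.2),
}
EARLIER_FAST = {
    "AC": (76.6, 38.1), "GMc": (71.9, 41.4),
    "NH": (76.6, 38.4), "MW": (73.1, 37.6),
}


def group_means(group):
    """Unweighted mean SR and mean ER over a group of bowlers."""
    mean_sr = mean(sr for sr, _ in group.values())
    mean_er = mean(er for _, er in group.values())
    return mean_sr, mean_er


recent_sr, recent_er = group_means(RECENT_FAST)
earlier_sr, earlier_er = group_means(EARLIER_FAST)
print(f"recent : mean SR {recent_sr:.1f}, mean ER {recent_er:.1f}")
print(f"earlier: mean SR {earlier_sr:.1f}, mean ER {earlier_er:.1f}")
```

On these figures the recent group has the lower mean SR and the higher mean ER, so both wickets and runs came more quickly. The groups do not overlap on either axis: every recent bowler listed has a lower SR and a higher ER than every bowler in the earlier group.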

It is instructive to compare Figures 1 and 2 with Kimber's corresponding graphs for English cricket. In his graphs, bowlers of a particular type (fast, off-spinners, etc.) were shown to group together much more closely than in the Australian context. For example, in Figure 2, WJ (fast) is between WO and CG (slow), and SW (slow) is close to RL and KM (fast). Kimber did not find such mixing. Does this mean that there is a stereotyping in the way English cricket is played, and an absence of innovative captaincy? I leave such a question for critics better qualified than myself, but point out that, with this graphical method of presenting the data, such questions can be examined in a more quantitative manner.

Cricket is not a one-dimensional game. Bowling, in particular, is not a one-dimensional activity. It involves both dismissing batsmen/batswomen and restricting their scoring, so a bowler's average is only half the story. A two-dimensional representation is needed to understand better what is happening. I suggest that a graphical representation, as outlined above, should always accompany a table of bowling statistics. The alert reader will note that batting, too, has two aspects, and a similar graphical presentation could be adopted. However, the very lack of scoring-rate records indicates that, at least in first-class cricket, the batting average alone is accepted as giving a fair indication of the quality of performance over a long enough period of time.

This consideration of an aspect of a widely understood sport provides a simple and powerful example of how data that contain more than one key quantity of interest can be presented more meaningfully when organized in a higher-dimensional form than a simple listing in a table. That is, one step towards a more mathematically demanding structure yields much more intelligible material.

References

Cricinfo - the Home of Cricket on the Internet. http://www.cricket.org.

Kimber, A. (1993). A graphical display for comparing bowlers in cricket. Teaching Statistics, 15 (3), pp. 84-86.
