=========================================================================
ANALYSIS OF WESTERN CANADA RETAILER LOTTERY WINS
[For the CBC National News, January 2009.]
by Jeffrey S. Rosenthal
(Dr. Rosenthal is a professor in the Department of Statistics at the
University of Toronto, and the author of "Struck by Lightning: The Curious
World of Probabilities". His web site is probability.ca.)
*** INTRODUCTION AND SUMMARY:
We consider major Western Canada lottery wins, to see whether or not
lottery retailers have won more prizes than could reasonably have arisen
by pure chance alone. We consider three different groups of prizes:
(I) $10,000+ prizes during the period Nov 1 2003 - Oct 31 2006;
(II) $10,000+ prizes during the period June 1 2007 - Sept 30 2008;
(III) $1000+ prizes during the period June 1 2007 - Sept 30 2008.
Using the fairest available assumptions, we conclude:
- The 67 retailer wins in group (I) are significantly too many to have
arisen by pure chance alone, and this conclusion is quite strong and
quite robust to changes in assumptions.
- The 30 retailer wins in group (II) are again somewhat too high to have
arisen by pure chance alone; in this case the statistical evidence is
clear although not as overwhelming or robust as for group (I).
- The 265 retailer wins of group (III) are again too high to have
arisen by pure chance alone. The statistical evidence for this is again
clear, in fact slightly stronger than for group (II), but again not as
overwhelming or robust as for group (I).
- The conclusions for individual provinces and territories indicate that
for group (I), the numbers of retailer wins for AB, SK and MB are all
too high; for group (II), MB is too high; and for group (III), MB and NN
(Northwest Territories and Nunavut combined) are too high. In fact, for
group (II), once MB is eliminated, the number of retailer wins in the
other four regions combined is not excessive.
These conclusions provide convincing statistical evidence that the
retailers are winning more major lottery prizes than can be reasonably
explained by pure chance alone.
*** KNOWN DATA (see Source list at the end):
Total regional adult population = 4,314,000. [S1, pp. 64, 74]
Total regional lottery-selling outlets = 4058. [S2]
Regional number of total major wins (excluding winners residing outside
the region of the Western Canada Lottery Corporation, WCLC), and the
number of those wins by retailers:
(I) $10,000+, Nov 1 2003 - Oct 31 2006: 1610-24=1586, 67 [S1, p. 74]
(II) $10,000+, June 1 2007 - Sept 30 2008: 794, 30
= sum of: $10,000+, April 1 2008 - Sept 30 2008: 325, 11 [S2]
plus: $10,000+, June 1 2007 - March 31 2008: 469, 19 [S2]
(III) $1000+, June 1 2007 - Sept 30 2008: 9388, 265
= sum of: $1000+, April 1 2008 - Sept 30 2008: 3940-48=3892, 96 [S2]
plus: $1000+, June 1 2007 - March 31 2008: 5527-31=5496, 169 [S2]
*** AVERAGE NUMBER OF RETAILERS PER LOTTERY-SELLING OUTLET:
This figure is somewhat less certain. However, the WCLC in [S2]
determined the figure 9391 / 780 = 12.04, based on a detailed Saskatchewan
pilot project. This figure appears to be carefully determined. It is
somewhat larger than certain other estimates, which makes it more fair
to the retailers. So, we use the figure 12.04 in all our calculations.
(Note: here and throughout, "retailers" means ALL lottery-selling retail
store owners and employees.)
*** GAMING INTENSITY RATIO:
This is the ratio of the average lottery spending by retailers to the
average lottery spending by the general adult population (including
non-participants).
This figure is also somewhat uncertain. A fifth estate survey (September
2006) estimated this ratio to be 1.5, and a detailed Corporate Research
Associates study (commissioned by the ALC in November 2006) found a
very similar figure of 1.52. Meanwhile, a Research Dimensions document
commissioned by the OLG in October 2006 reported a somewhat larger
ratio, 1.9, and that figure was then cited and used in [S1]. In our
calculations, we consider both the 1.52 figure (which appears to be the
most carefully determined), and also the 1.9 figure (since it is more
fair to the retailers, and originated with the OLG, and was used in [S1]).
*** STATISTICAL METHODOLOGY USED:
Our method proceeds by first computing:
Expected number of retailer wins
= (total number of wins)
* (total number of retail outlets)
* (average number of retailers per outlet)
* (gaming intensity ratio)
/ (total adult population)
Then, the factor by which the actual number of retailer wins exceeds
the expected number is given by: factor = actual / expected.
More importantly, the probability of observing the actual number or more
of retailer wins, given the expected number, is found from the right-hand
tail of a Poisson distribution, computed using the "R" command:
ppois(actual-1, expected, lower.tail=FALSE)
If this probability is very small (i.e. less than 0.05, or even better
less than 0.01), then that provides convincing statistical evidence
against the hypothesis that the actual number of retailer wins occurred
by pure chance alone.
We also consider the "robustness" of our result, i.e. how much larger
the expected number could be, while still leaving the above probability
less than 0.01. The more robust the result, the more certain that the
conclusion cannot be "explained away" by incorrect assumptions or other
influences.
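As an illustration, the calculation above can be sketched in Python (our
own calculations use R). The function names below are ours, chosen only
for this example. poisson_upper_tail is a plain-Python stand-in for the R
command ppois(actual-1, expected, lower.tail=FALSE), written on the
assumption that summing the Poisson probabilities directly is accurate
enough for the values considered here. The inputs are the group (I)
figures with the gaming intensity ratio of 1.52.

```python
import math


def expected_retailer_wins(total_wins, outlets, retailers_per_outlet,
                           intensity_ratio, adult_population):
    """Expected number of retailer wins under pure chance."""
    return (total_wins * outlets * retailers_per_outlet
            * intensity_ratio / adult_population)


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean).

    Intended to match R's ppois(k - 1, mean, lower.tail=FALSE).
    """
    if k <= 0:
        return 1.0
    # Probability of exactly k, computed in log space to avoid huge factorials.
    term = math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Add P(k), P(k+1), ... until further terms no longer change the sum.
    while i <= mean or term > total * 1e-17:
        total += term
        i += 1
        term *= mean / i
    return min(total, 1.0)


# Group (I), gaming intensity ratio 1.52.
expected = expected_retailer_wins(1586, 4058, 12.04, 1.52, 4314000)
factor = 67 / expected
prob = poisson_upper_tail(67, expected)
print(f"expected = {expected:.2f}, factor = {factor:.2f}, probability = {prob:.3g}")
```

The upper tail is summed directly, rather than computed as one minus the
lower tail, so that very small probabilities are not lost to rounding.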
*** RESULTS (I) -- $10,000+ WINNERS, Nov 1 2003 - Oct 31 2006:
-- Using the gaming intensity ratio of 1.52:
Expected number = 1586 * 4058 * 12.04 * 1.52 / 4314000 = 27.30
Factor by which actual exceeds expected: 67 / 27.30 = 2.45
Probability of observing 67 or more:
ppois(66, 27.30, lower.tail=FALSE) = 1.06 x 10^(-10)
or about one chance in ten billion.
-- Using the gaming intensity ratio of 1.9:
Expected number = 1586 * 4058 * 12.04 * 1.9 / 4314000 = 34.13
Factor by which actual exceeds expected: 67 / 34.13 = 1.96
Probability of observing 67 or more:
ppois(66, 34.13, lower.tail=FALSE) = 4.29 x 10^(-7)
which is less than one chance in 2.3 million.
Robustness: even with the higher gaming intensity ratio (1.9), and even
if the expected number of such retailer wins were actually 44% higher
than we computed, i.e. if it changed from 34.13 to 34.13 * 1.44 = 49.15,
the probability of observing 67 or more would still be less than 1%.
Conclusion: the 67 retailer wins of $10,000+ during this period are
significantly too many to have arisen by pure chance alone.
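The robustness figure can be explored with a short Python sketch. It
searches, by bisection, for the largest multiplier of the expected number
that still keeps the probability below 1%. The helper names and the use of
bisection are our own choices for this illustration, and
poisson_upper_tail is again an assumed stand-in for the R command
ppois(actual-1, expected, lower.tail=FALSE).

```python
import math


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), summed term by term."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while i <= mean or term > total * 1e-17:
        total += term
        i += 1
        term *= mean / i
    return min(total, 1.0)


def largest_safe_multiplier(actual, expected, threshold=0.01, steps=60):
    """Largest m such that P(X >= actual | mean = m * expected) < threshold.

    Returns None if the probability is not below the threshold even at m = 1.
    """
    if poisson_upper_tail(actual, expected) >= threshold:
        return None
    # The tail grows with the mean; at mean = actual it is roughly one half,
    # so the crossing point lies between m = 1 and m = actual / expected.
    lo, hi = 1.0, actual / expected
    for _ in range(steps):
        mid = (lo + hi) / 2
        if poisson_upper_tail(actual, mid * expected) < threshold:
            lo = mid
        else:
            hi = mid
    return lo


# Group (I), gaming intensity ratio 1.9: 67 actual wins, 34.13 expected.
m = largest_safe_multiplier(67, 34.13)
print(f"expected number could be about {100 * (m - 1):.1f}% higher")
```

Rounding the result down to a whole percentage gives the kind of figure
quoted in the robustness statements.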
*** RESULTS (II) -- $10,000+ WINNERS, June 1 2007 - Sept 30 2008:
-- Using the gaming intensity ratio of 1.52:
Expected number = 794 * 4058 * 12.04 * 1.52 / 4314000 = 13.67
Factor by which actual exceeds expected: 30 / 13.67 = 2.19
Probability of observing 30 or more:
ppois(29, 13.67, lower.tail=FALSE) = 9.1 x 10^(-5)
which is about one chance in 11,000.
-- Using the gaming intensity ratio of 1.9:
Expected number = 794 * 4058 * 12.04 * 1.9 / 4314000 = 17.09
Factor by which actual exceeds expected: 30 / 17.09 = 1.76
Probability of observing 30 or more:
ppois(29, 17.09, lower.tail=FALSE) = 0.00294
which is about one chance in 340.
Robustness: even with the higher gaming intensity ratio (1.9), and even
if the expected number of such retailer wins were actually 9% higher
than we computed, i.e. if it changed from 17.09 to 17.09 * 1.09 = 18.62,
the probability of observing 30 or more would still be less than 1%.
Conclusion: the 30 retailer wins are again too high to have arisen by
pure chance alone, and the statistical evidence is clear although not
as overwhelming or robust as for the longer time period of group (I).
*** RESULTS (III) -- $1000+ WINNERS, June 1 2007 - Sept 30 2008:
-- Using the gaming intensity ratio of 1.52:
Expected number = 9388 * 4058 * 12.04 * 1.52 / 4314000 = 161.6
Factor by which actual exceeds expected: 265 / 161.6 = 1.64
Probability of observing 265 or more:
ppois(264, 161.6, lower.tail=FALSE) = 5.95 x 10^(-14)
or less than one chance in 16 trillion.
-- Using the gaming intensity ratio of 1.9:
Expected number = 9388 * 4058 * 12.04 * 1.9 / 4314000 = 202.0
Factor by which actual exceeds expected: 265 / 202.0 = 1.31
Probability of observing 265 or more:
ppois(264, 202.0, lower.tail=FALSE) = 1.294 x 10^(-5)
or less than one chance in 77,000.
Robustness: even with the higher gaming intensity ratio (1.9), and even
if the expected number of such retailer wins were actually 13% higher
than we computed, i.e. if it changed from 202.0 to 202.0 * 1.13 = 228.28,
the probability of observing 265 or more would still be less than 1%.
Conclusion: the 265 retailer wins are again too high to have arisen by
pure chance alone. The statistical evidence for this is again clear, in
fact slightly stronger than for the concurrent $10,000+ prizes, but again
not as overwhelming or robust as for the longer 2003-2006 time period.
*** PROVINCIAL/TERRITORIAL BREAKDOWN:
For completeness, we also consider each of the provinces and territories
separately. In each case, we list the following values in order:
"name": the two-letter name of the province or territory, except
"NN" means "Northwest Territories and Nunavut (combined)";
"pop": the total regional adult population (from [S1]);
"outlets": the number of regional retail outlets (from [S2]);
"total": the total number of lottery wins in the specified region
for the specified group (from [S1], [S2]);
"actual": the actual number of those wins by retailers (from [S1], [S2]);
"expected": the expected number of retailer wins, computed as above;
"factor": the factor by which actual exceeds expected, computed as above;
"probability": the probability of observing "actual" or more wins by
retailers by pure chance alone, computed as above.
PROV/TER RESULTS (I): $10,000+, Nov 1 2003 - Oct 31 2006: [data: S1, p. 74]
name; pop; outlets; total; actual; expected; factor; probability;
-- Using the gaming intensity ratio of 1.52:
AB; 2595000; 2392; 915; 33; 15.43530; 2.137957; 6.823999e-05;
SK; 749000; 780; 275; 15; 5.241017; 2.86204; 0.0003677701;
MB; 898000; 837; 358; 16; 6.106639; 2.620099; 0.0006120801;
NN; 48000; 20; 22; 0; 0.1677573; 0; 1;
YT; 24000; 29; 16; 3; 0.3538155; 8.478996; 0.0056751;
-- Using the gaming intensity ratio of 1.9:
AB; 2595000; 2392; 915; 33; 19.29412; 1.710366; 0.002803098;
SK; 749000; 780; 275; 15; 6.551271; 2.289632; 0.003174625;
MB; 898000; 837; 358; 16; 7.633298; 2.096079; 0.005413604;
NN; 48000; 20; 22; 0; 0.2096967; 0; 1;
YT; 24000; 29; 16; 3; 0.4422693; 6.783197; 0.01038687;
Conclusion: the numbers of retailer wins for AB, SK and MB are all
too high, and that of YT is borderline, while NN is fine.
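The per-region rows can be recomputed with a sketch along the following
lines, shown here for group (I) with the gaming intensity ratio of 1.52.
The data layout and function names are our own assumptions for this
example; the input numbers are the "pop", "outlets", "total" and "actual"
columns from the table above.

```python
import math

RETAILERS_PER_OUTLET = 12.04

# (name, adult population, outlets, total wins, retailer wins) for group (I)
REGIONS_GROUP_I = [
    ("AB", 2595000, 2392, 915, 33),
    ("SK", 749000, 780, 275, 15),
    ("MB", 898000, 837, 358, 16),
    ("NN", 48000, 20, 22, 0),
    ("YT", 24000, 29, 16, 3),
]


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), summed term by term."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while i <= mean or term > total * 1e-17:
        total += term
        i += 1
        term *= mean / i
    return min(total, 1.0)


def region_rows(regions, intensity_ratio):
    """Return (name, expected, factor, probability) for each region."""
    rows = []
    for name, pop, outlets, total, actual in regions:
        expected = total * outlets * RETAILERS_PER_OUTLET * intensity_ratio / pop
        rows.append((name, expected, actual / expected,
                     poisson_upper_tail(actual, expected)))
    return rows


rows = region_rows(REGIONS_GROUP_I, 1.52)
for name, expected, factor, prob in rows:
    print(f"{name}; {expected:.5f}; {factor:.4f}; {prob:.4g}")
```

Passing 1.9 instead of 1.52, or the group (II) or (III) columns, gives the
corresponding rows for the other tables.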
PROV/TER RESULTS (II): $10,000+, June 1 2007 - Sept 30 2008: [data: S2]
name; pop; outlets; total; actual; expected; factor; probability;
-- Using the gaming intensity ratio of 1.52:
AB; 2595000; 2392; 478; 14; 8.063466; 1.736226; 0.03609787;
SK; 749000; 780; 118; 2; 2.248873; 0.8893345; 0.6571851;
MB; 898000; 837; 173; 13; 2.950973; 4.405326; 1.363198e-05;
NN; 48000; 20; 17; 1; 0.1296307; 7.714224; 0.1215802;
YT; 24000; 29; 8; 0; 0.1769077; 0; 1;
-- Using the gaming intensity ratio of 1.9:
AB; 2595000; 2392; 478; 14; 10.07933; 1.388981; 0.1413882;
SK; 749000; 780; 118; 2; 2.811091; 0.7114676; 0.7708035;
MB; 898000; 837; 173; 13; 3.688717; 3.524261; 0.0001266384;
NN; 48000; 20; 17; 1; 0.1620383; 6.171379; 0.1495914;
YT; 24000; 29; 8; 0; 0.2211347; 0; 1;
Conclusion: the number of retailer wins for MB is too high, while all
the others are fine when considered individually.
For this group, since MB is the most problematic, we also consider the
numbers when we eliminate MB and study the other four regions combined:
pop = 2595000 + 749000 + 48000 + 24000 = 3416000
outlets = 2392 + 780 + 20 + 29 = 3221
total = 478 + 118 + 17 + 8 = 621
actual = 14 + 2 + 1 + 0 = 17
expected = 621 * 3221 * 12.04 * 1.9 / 3416000 = 13.40
factor = 17 / 13.40 = 1.27
probability = ppois(16, 13.40, lower.tail=FALSE) = 0.1945538
or about 19.5%, not particularly small at all.
Conclusion: for group (II), once MB is eliminated, the number of retailer
wins for the other regions combined is *not* particularly excessive (at
least when using a gaming intensity ratio of 1.9). So, for this group,
the problem is largely confined to MB alone.
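The pooling step can be sketched as follows: drop the excluded regions,
add up the remaining columns, and apply the same formula to the totals.
The dictionary layout and the function name pooled are our own, for
illustration only. Because this sketch does not round the expected number
to 13.40 before computing the probability, its result may differ slightly
in the later decimal places from the figure quoted above.

```python
import math

# name: (adult population, outlets, total wins, retailer wins), group (II)
GROUP_II = {
    "AB": (2595000, 2392, 478, 14),
    "SK": (749000, 780, 118, 2),
    "MB": (898000, 837, 173, 13),
    "NN": (48000, 20, 17, 1),
    "YT": (24000, 29, 8, 0),
}


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), summed term by term."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while i <= mean or term > total * 1e-17:
        total += term
        i += 1
        term *= mean / i
    return min(total, 1.0)


def pooled(regions, exclude):
    """Column sums over all regions whose name is not in `exclude`."""
    kept = [row for name, row in regions.items() if name not in exclude]
    return tuple(sum(column) for column in zip(*kept))


pop, outlets, total, actual = pooled(GROUP_II, exclude={"MB"})
expected = total * outlets * 12.04 * 1.9 / pop
prob = poisson_upper_tail(actual, expected)
print(f"expected = {expected:.2f}, factor = {actual / expected:.2f}, "
      f"probability = {prob:.4f}")
```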
PROV/TER RESULTS (III): $1000+, June 1 2007 - Sept 30 2008: [data: S2]
name; pop; outlets; total; actual; expected; factor; probability;
-- Using the gaming intensity ratio of 1.52:
AB; 2595000; 2392; 5365; 133; 90.50313; 1.469562; 1.718556e-05;
SK; 749000; 780; 1578; 46; 30.07391; 1.529565; 0.004131552;
MB; 898000; 837; 2252; 74; 38.41383; 1.926390; 2.267545e-07;
NN; 48000; 20; 122; 11; 0.9302907; 11.82426; 4.835532e-09;
YT; 24000; 29; 71; 1; 1.570056; 0.6369199; 0.7919665;
-- Using the gaming intensity ratio of 1.9:
AB; 2595000; 2392; 5365; 133; 113.1289; 1.17565; 0.03689428;
SK; 749000; 780; 1578; 46; 37.59238; 1.223652; 0.1012544;
MB; 898000; 837; 2252; 74; 48.01728; 1.541112; 0.0003011532;
NN; 48000; 20; 122; 11; 1.162863; 9.45941; 4.555422e-08;
YT; 24000; 29; 71; 1; 1.96257; 0.5095359; 0.8595031;
Conclusion: the numbers of retailer wins for MB and NN appear to be
too high, and AB is borderline, while SK and YT are fine.
We again eliminate MB and study the other four regions combined:
pop = 2595000 + 749000 + 48000 + 24000 = 3416000
outlets = 2392 + 780 + 20 + 29 = 3221
total = 5365 + 1578 + 122 + 71 = 7136
actual = 133 + 46 + 11 + 1 = 191
expected = 7136 * 3221 * 12.04 * 1.9 / 3416000 = 153.92
factor = 191 / 153.92 = 1.24
probability = ppois(190, 153.92, lower.tail=FALSE) = 0.00215
or about one chance in 465, still very small.
Or eliminate MB and NN, and study the other three regions combined:
pop = 2595000 + 749000 + 24000 = 3368000
outlets = 2392 + 780 + 29 = 3201
total = 5365 + 1578 + 71 = 7014
actual = 133 + 46 + 1 = 180
expected = 7014 * 3201 * 12.04 * 1.9 / 3368000 = 152.50
factor = 180 / 152.50 = 1.18
probability = ppois(179, 152.50, lower.tail=FALSE) = 0.0162
or about 1.6%, small but not overwhelmingly small.
Conclusion: for group (III), even if MB is eliminated, the number of
retailer wins for the other four regions combined is still excessive.
However, if MB and NN are both eliminated, then the other three regions
combined are only borderline excessive. So, the problem for this group is
"mostly but not entirely" confined to MB and NN.
*** ADDITIONAL COMMENTS:
- There is huge variation in the number of retailers per outlet
depending on the TYPE of outlet. For example, an OLG document in 2006
assumed 4 retailers per Independent Convenience store, but 40 retailers
per Supermarket. (By contrast, [S2] assumes for simplicity an equal 12
employees per outlet of all different types, which is surely inaccurate.)
If WCLC itemised the retailer major lottery wins by type of retail outlet,
then our analysis could be made much more precise.
- The number of retail OWNERS is probably known precisely, or at least
can be estimated much more accurately than the total number of retailers.
So, if WCLC itemised the retailer lottery wins by owners versus employees,
then our analysis could again be made much more precise.
- It is conceivable that retailers have a greater tendency to select those
lottery games with higher probability of major prizes, and that this
could partially "explain" their winning more major prizes than expected.
However, there is virtually no evidence to support this theory. In any
case, such effects would probably be rather small, and are probably more
than balanced out by our using generous assumptions (e.g. the figures
of 12.04 and 1.9) and considering the robustness.
- It seems quite plausible that retailers actually won even more major
prizes than reported, since lists of retailers were not kept, so retailer
winners were required to self-report and some may have gone undetected.
- Although the study [S1] is generally well conducted and very useful, it
apparently misinterpreted the Research Dimensions gaming intensity ratio
of 1.9 as representing the ratio of lottery spending by retailers compared
to the SUBSET of the adult population that plays lotteries (which they
take to be 75% of the total adult population), rather than to the ENTIRE
adult population (which is what Research Dimensions actually reported).
So, [S1] effectively used a gaming intensity ratio of 1.9 / 0.75 = 2.53,
which was too large. They also used (for their "high end" comparison)
a previous WCLC estimate of 15 retail employees per outlet, but we now
know (from the later Saskatchewan study) that 12.04 is more accurate.
- The statistical results for individual provinces are somewhat less
reliable, since some assumptions (e.g. number of retailers per outlet)
may differ for certain regions (especially the territories), and
also the statistical significance is somewhat diminished when we go
"hunting" for conclusions in many different ways. Still, the results
provide an indication of where problems may lie. Also, while the smaller
population in certain regions makes the "factor" (actual / expected) more
variable, this is taken into account appropriately by the "probability"
calculations, so a very small probability is still significant.
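To make the comment on [S1] above more concrete, the following sketch
shows how much the choice of assumptions changes the expected number for
group (I). It compares our figures (12.04 retailers per outlet, ratio 1.9)
with the combination of 15 retailers per outlet and the effective ratio
1.9 / 0.75. Combining those two [S1] figures in a single calculation is
our own assumption for illustration; we do not claim that [S1] computed
exactly this. The function name is likewise ours.

```python
def expected_retailer_wins(total_wins, outlets, retailers_per_outlet,
                           intensity_ratio, adult_population):
    """Expected number of retailer wins under pure chance."""
    return (total_wins * outlets * retailers_per_outlet
            * intensity_ratio / adult_population)


# Group (I) regional totals.
TOTAL_WINS = 1586
OUTLETS = 4058
ADULT_POPULATION = 4314000

# A ratio of 1.9 relative to ALL adults, wrongly applied as if it were
# relative to the 75% of adults who play, acts like a ratio of 1.9 / 0.75.
effective_ratio = 1.9 / 0.75

ours = expected_retailer_wins(TOTAL_WINS, OUTLETS, 12.04, 1.9,
                              ADULT_POPULATION)
high_end = expected_retailer_wins(TOTAL_WINS, OUTLETS, 15, effective_ratio,
                                  ADULT_POPULATION)
inflation = high_end / ours

print(f"effective ratio = {effective_ratio:.2f}")
print(f"expected (12.04, 1.9) = {ours:.2f}")
print(f"expected (15, {effective_ratio:.2f}) = {high_end:.2f}")
print(f"inflation of expected number = {inflation:.2f}x")
```

Since the expected number is simply proportional to both figures, the
inflation does not depend on which group of prizes is considered.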
*** SOURCES CITED:
[S1] Ernst & Young report for WCLC, October 2007.
[S2] WCLC document of 2008-12-08.
=========================================================================