The recent meeting of the American Statistical Association was held in the Indianapolis Convention Center. Bill Finzer from Key Curriculum Press was at the meeting to demonstrate their new statistical package Fathom (See Chance News 8.08). Bill noticed that the men's bathrooms had one inch square blue and white tiles on the floor laid out in an apparently random pattern. The floor was about 24ft x 24ft so there were 288^2 = 82,944 such squares. For a picture of the floor go here.
Bill proposed testing to see if the pattern is indeed random. Looking at a few randomly chosen 10in x 10in squares suggested that it was at least reasonable to assume that about half the squares are white and half are blue.
This suggested making the hypothesis that the colors were determined by the equivalent of a coin tossing process. We proposed to test this hypothesis using the longest horizontal or vertical run of a single color as a statistic. This longest run turned out to be 17. You can see a photo of this longest run here.
In a second similar men's bathroom the longest such sequence was also 17 though the overall pattern was quite different.
Bill agreed to try to determine the distribution for the longest run by simulation while Laurie offered to try to determine the exact distribution. Bill showed the power of his new statistical package by doing his part before breakfast the next day. Laurie was still working on his assignment at the end of the meeting. You can also see Bill's simulations including the relevant graphics here.
Bill first produced a single simulation which would be the result of
tossing a coin to determine the color of each of 82,944 squares. The graphical
representation of the simulation made a floor that looked remarkably like
the floor in the bathroom. Then, assuming that the rows and columns are
independent, Bill simulated 288 rows and 288 columns and found the longest
run in any of these. While the independence assumption is not quite true
we shall see that this does not affect the outcome very much. He then simulated
this 100 times and obtained the following results for the 100 simulations:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
From this we see that 17 occurred 33 times suggesting that it is the most likely value for the maximum horizontal or vertical run in a tiling. Thus we certainly cannot reject the hypothesis that the tiling is random on the basis of this test.
Laurie started his consideration of an exact solution by looking at the longest run in the rows instead of both rows and columns. This avoids the issue of the dependence between rows and columns. For a single row the length of the longest run is the length of the longest run of heads or tails when we toss a coin 288 times.
This problem has a long history and is beautifully explained in an article by Mark Schilling, The College Mathematics Journal, Vol. 21, No 2, May 1990, pp. 196-207.
Schilling starts his article with an old activity that is still popular in statistics classes. This activity is described in a book by Revesz and Csorgo (Strong Approximations in Probability and Statistics, Academic Press 1981, p. 97). Here we find a teaching experiment of T. Varga described as follows:
The usual way to try to divide the results into their original groups is to choose those with the longest run of heads or tails 6 or greater as the ones for which a coin was tossed.His class of secondary school children is divided into two sections. In one of the sections each child is given a coin which they then throw two hundred times, recording the resulting head and tail sequence on a piece of paper. In the other section the children do not receive coins but are told instead that they should try to write down a "random" head and tail sequence of length two hundred. Collecting these slips of paper, he then tries to subdivide them into their original groups.Most of the time he succeeds quite well.
While this experiment has been carried out by many of us with great success it failed once for Bill Peterson when he was teaching an alumni class at Middlebury. Bill asked the students to do the experiment during the lunch break. When he looked at the results he suspected that the students might have tried to outsmart him. He asked one them if they had and got the answer: Age and cunning beat youth and brains!
Schilling first gives a simple recursion equation to compute the distribution for the length of the longest head run in n tosses of a coin. He lets A(n,k) be the number of sequences of length n for which the longest head run is at most k.
Consider k = 3. If n <= 3 then all sequences are favorable. If n > 3 every favorable sequence must begin with either T, HT, HHT, or HHHT corresponding to 0,1,2, or 3 heads before the first tail. This divides them favorable sequences into four groups so that
A(n,3) = A(n-1,3) + A(n-2,3) + A(n-3,3) + A(n-4,3).
Thus to compute A(n,3) for all n we start with the first four powers
of 2 and then compute successive values as the sum of the previous four
values giving:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The same procedure works for any k. You determine the first k+1 values as powers of 2 and then later value are the sum of the previous k+1 values. The probability that the longest head run is at most k is then A(n,k)/2^k and this determines the distribution of the length of the longest head run.
Now we are interested in sequences for which the longest head or tail run in n tosses of a coin is at most x. But this is the same as the probability that the longest head run in n-1 tosses of a coin is at most x-1. To see this consider the following two sequences: The first is the result of tossing a coin 10 times and the second indicates which of the tosses were the same (S) as the last toss and which are different than the last toss(D)
T H T T T H T T H H
D D S S D D S D S
Then the second sequence can also be considered as the outcome of tossing tossing a coin 9 tosses and the length of the longest S run is one less than the length of the longest head or tail run in the original sequence. Thus B(n,k) = 2*A(n-1,k-1) (the two comes from the fact reversing all the outcomes in the first sequence gives the same second sequence.) This means that the distribution of the length of the longest head or tail run in n tosses of a coin is the same as the distribution of the longest head run in n-1 tosses of a coin shifted up by one.
Schilling's recursion equation made it easy to write a Mathematica program to compute the distribution of the longest head or tail run in n tosses of a coin. Running the program for n = 200 we obtain a distribution for the length of the longest head or tail run in 200 tosses of a coin. From the distribution we find that there is about an 80 percent chance that the longest run of either heads or tails in 200 tosses is 6 or greater. This explains the success of the Varga activity.
Schilling states that, for large n, the expected length of the longest head run is approximately log(n) - 2/3 where the log is to the base 2. The standard deviation, remarkably, is essentially independent of n and is about 1.873. This means that the expected value of the longest head or tail run in n tosses of a coin is approximately log(n-1)+1/3 and the standard deviation is also approximately 1.873. For example for 200 tosses of a coin the exact values for the mean and variance of the longest head or tail run are 7.977 and 1.828 respectively, and the approximations would be 7.970 and 1.873.
Returning now to the bathroom tile problem we can us n = 288 and compute the probability F(x) that the length of the longest black or white run in a row of our floor is less than or equal to x. Now we have 288 such rows in our square. Thus the probability G(x) that all 288 rows have largest run less than or equal to x is given by F(x)^288 since the length of the longest runs in the rows are independent. From this we can find the probability G(x) that the largest black or white run in all the rows of the square is at most x. If we further assume independence of rows and columns we would get H(x) = F(x)^576 for the distribution of the longest run or black or white squares in the entire floor. Again, this is easy to compute using our Mathematica program.
To see if the assumption of independence causes problems we did our own simulation where we did not assume independence. We just made 500 tile floors making each square black or white with probability 1/2 and finding the longest run of black or white either in the rows or the columns. Here are the exact calculations assuming independence and our estimates for the probabilities from our simulation. Assuming Independence Simulated longest run prob prob estimated
Here is a comparison of the results of calculating the probability
assuming independence and simulating the length of the longest run without
assuming independence.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
From this we can see that the assumption of independence does not seem to be a problem.
Note that if we had done the simulation right in the first place without assuming indpendence we would have seen that 17 was the most probable number for the length of the longest run and we would not have to have told the Schilling story. However, it is a great story and problems related to runs occur frequently as they did in Varga's activity so if you got this far you learned something important.
DISCUSSION QUESTIONS:
(1) Bill Peterson suggested considering the rows as one long sequence by just continuing each row starting with the beginning of the next row. What would the expected length of the longest black or white run be in such a sequence?
(2) When we returned back to Dartmouth we were reminded that people
often thought the tiles of the side of the Mathematics department looked
random. After walking by them for forty years it was pointed out to us
that this is far from true. In fact it is periodic of period 3. For a photograph
of the tiles and the three parts go here.
How would you go about testing if within each third the pattern is random?
>>>>>==============>>>>>>==============>>>>>>==============>>>>>>==============>
SOURCE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
CHANCE News 9.09
August 10, 2000 to September 12, 2000
12. Is the pattern on the tile floor random?
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!