«51 high school students enrolled in a typing class were categorized according to eye color. Dark-eyed individuals are reportedly better at reactive activities and light-eyed individuals are reportedly better at self-paced activities. Speed typing was assumed to be a reactive activity. No differences were observed between light-eyed and dark-eyed individuals or between boys and girls for 1- and 5-min. speed-typing and grades.» [1]

When reading a study abstract (or a summary in a newspaper), the sentence «no differences were observed between group *A* and *B*» can mean one of two things:

- there *actually* is no difference between *A* and *B*; **or**
- there *is* some difference between *A* and *B*, but the sample size of the experiment was too small for the researcher to detect such an effect.

This is why every study should state its statistical power, in other words the probability of obtaining a statistically significant result given that there really is a difference between group *A* and *B*. Unfortunately, many papers omit this information. How do we quickly assess in our heads, then, whether the sample size was «big enough»?

The only required input for this calculation is a simple probability estimate:

1. Answer this question: «Imagine that there is a difference between the populations; what is the probability that a randomly selected score from one population will be greater than a randomly selected score from the other population?». Note down the number.

   *In the study shown in the introduction, no confounding factor (time spent practicing, general dexterity, etc.) was taken into consideration, so I estimate the probability P(B>A), assuming the dark-eye advantage is present at all, to be relatively low, around 60%.*

2. Check the required sample size *N* to obtain a reasonable power in this table (derivation of these numbers will be discussed later):

   | *P(B>A)* | *N* |
   |----------|-----|
   | 95%      | 9   |
   | 90%      | 12  |
   | 85%      | 17  |
   | 80%      | 25  |
   | 75%      | 37  |
   | 70%      | 60  |
   | 65%      | 108 |
   | 60%      | 247 |
   | 55%      | 997 |

   *In our case we had P(B>A) = 60%, leading to a required sample size of 247.*

3. Compare *N* with the *N* of the study. If it is higher, our guesstimate is that the experiment was underpowered.

   *We calculated N = 247 and the N of the study is 51, so we should wait for additional research before affirming that «eye colour has no influence on typing speed».*
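The lookup-and-compare procedure above can be sketched in a few lines of Python. This is only an illustration: the function names `required_n` and `looks_underpowered` are hypothetical, and rounding an in-between probability down to the nearest tabulated row (which gives the larger, more cautious *N*) is my assumption, not something the table prescribes.

```python
# Table from the text: estimated P(B>A) in percent -> required sample size N
# (80% power, alpha = 5%, two-tailed t-test).
REQUIRED_N = {95: 9, 90: 12, 85: 17, 80: 25, 75: 37,
              70: 60, 65: 108, 60: 247, 55: 997}


def required_n(prob_percent: float) -> int:
    """Return the tabulated N for the largest tabulated probability <= prob_percent.

    Rounding the probability down is an assumption made here: it yields the
    larger (more cautious) sample size when the estimate falls between rows.
    """
    candidates = [p for p in REQUIRED_N if p <= prob_percent]
    if not candidates:
        raise ValueError("probability below the smallest tabulated value (55%)")
    return REQUIRED_N[max(candidates)]


def looks_underpowered(prob_percent: float, study_n: int) -> bool:
    """True when the tabulated N exceeds the sample size of the study."""
    return required_n(prob_percent) > study_n


# The typing-class example: P(B>A) estimated at 60%, study N = 51.
print(required_n(60), looks_underpowered(60, 51))
```

For the eye-colour study this reports a required *N* of 247 against the 51 students enrolled, so the guesstimate is «underpowered».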

Of course this is a rough appraisal, but it has the advantage of not requiring us to fire up `R` and, more importantly, of not forcing us to gauge difficult-to-estimate parameters like *σ*.

Assumptions:

- Power is set at 80%, α at 5% (as is extremely common in the literature);
- the standard deviations (σ₁ and σ₂) of the two populations are the same;
- we are dealing with a *t-test* (so difference-in-means, linear regression or similar), two-tailed.

*(derivation — skip this if you only care about the formula)*

Now the only thing needed to calculate the required sample size is the effect size *θ*. *θ* is given by *|µ₂−µ₁|/σₚ*, that is, the difference of the means (easy to guesstimate) divided by the pooled standard deviation (much more difficult to guesstimate). An easier quantity to gauge in your head is *P(B>A)*, the probability that an element sampled from the second distribution is bigger than one sampled from the first. From this we can arrive at θ:

- remember that the difference *D = A − B* between two independent normals with the same variance is distributed as *D ~ N(µ₁−µ₂, 2σ²)*;
- notice how the cumulative *F(0)* for *D* represents the probability *P(B>A)*;
- with *d=0* the standardisation *z* of *D* is *(µ₂−µ₁)/(√2×σ)*, very similar to θ;
- so we check *Φ⁻¹[P(B>A)]*, in other words the score *z* corresponding to our estimated *P(B>A)* on a standard normal table;
- then *θ ≃ 1.41×z*, without any hypothesis required for either *µ* or *σ*. □
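The steps above translate directly into code. The following is a minimal sketch using only Python's standard library; the function name `effect_size_from_probability` is hypothetical, and `NormalDist().inv_cdf` plays the role of the standard normal table (Φ⁻¹).

```python
from math import sqrt
from statistics import NormalDist


def effect_size_from_probability(p_b_greater_a: float) -> float:
    """Convert an estimated P(B>A) into the effect size theta = sqrt(2) * z."""
    # z is the standard normal quantile of P(B>A), i.e. (mu2 - mu1) / (sqrt(2) * sigma).
    z = NormalDist().inv_cdf(p_b_greater_a)
    return sqrt(2) * z


# The typing-class example: P(B>A) estimated at 60%.
theta = effect_size_from_probability(0.60)
print(round(theta, 2))  # 0.36
```

A probability of 60% therefore corresponds to a fairly small effect, a bit more than a third of a standard deviation.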

From the last step we can build the «Probability ⟷ Required Sample Size» table (R file with the computation).
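As a rough cross-check of the table, here is a sketch in Python that uses the textbook normal approximation for a two-sample comparison, *n* per group ≈ 2(z₁₋α/₂ + z_power)²/θ². This is not the author's R computation: it ignores the *t*-distribution correction, so it lands slightly below the tabulated values, and the gap is proportionally larger for the small sample sizes at the top of the table. It also assumes that the tabulated *N* is the total across two equally sized groups, which is my reading (the text compares it with the 51 students of the study). The name `approx_total_n` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def approx_total_n(p_b_greater_a: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate total sample size (both groups) via the normal approximation."""
    std = NormalDist()
    theta = sqrt(2) * std.inv_cdf(p_b_greater_a)   # effect size from P(B>A)
    z_alpha = std.inv_cdf(1 - alpha / 2)           # two-tailed critical value
    z_power = std.inv_cdf(power)                   # quantile for the desired power
    n_per_group = 2 * ((z_alpha + z_power) / theta) ** 2
    return ceil(2 * n_per_group)                   # assumes two equal-sized groups


for pct in (95, 90, 85, 80, 75, 70, 65, 60, 55):
    print(f"{pct}%  {approx_total_n(pct / 100)}")
```

For *P(B>A)* = 60% the approximation gives a value within a few units of the tabulated 247, which is close enough for a back-of-the-envelope check.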