Small sample sizes in rapid-cycle quality improvement projects.

Small sample sizes can be statistically valid.

The model for improvement and its Plan–Do–Study–Act (PDSA) cycles typically require frequent data collection to test ideas and refine the planned change strategy. The perception that data collection must involve many patients can lead to insufficiently frequent PDSA cycles and act as a barrier to initiating local improvement activities.

Small samples for demonstrating local gaps in care

Table 1 shows the sample size requirements for local quality audits. Table 1 can be used in two ways:

First, on completing an audit, the table can quickly indicate if your result is statistically significant. For example, if your audit showed an observed system performance of 50% when the desired system performance is 80%, then an audit with a sample size of 12 or more will be statistically significant.
Second, you can use this table to plan a sample size for an audit or PDSA cycle. For example, if your "hunch" is that the observed system performance will be 50%, and you have a desired system performance of 90%, then a sample size as low as 6 will likely suffice (though there is no harm in planning to include a few additional observations to ensure that your sample represents your system's usual performance, that is, to support external validity).

How can small samples be statistically valid?

How is it possible that such small samples permit rejecting the null hypothesis, while properly designed controlled clinical trials need to enrol hundreds or thousands of patients?

One reason is that we are looking for very large differences (eg, 50% vs 80%), whereas clinical trials typically look for much smaller differences. As table 1 shows, the closer the observed performance is to the desired target, the larger the sample size required to show a significant difference. For example, you would need an audit sample size of 280 to show that an observed performance of 75% differed significantly from a desired performance of 80%.
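One way to see why the size of the gap matters is an exact one-sided binomial test of the audit count against the desired rate (the same kind of calculation as the practice example at the end of this page). This is a sketch under that assumption only: it is not a reconstruction of how table 1 was built, and the helper name `binomial_lower_tail` is hypothetical.

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p0: float) -> float:
    """One-sided exact binomial P value: the probability of k or fewer
    successes out of n if the true success rate were p0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))


# Large gap: 6/12 (50%) audited against a desired 80%.
print(round(binomial_lower_tail(6, 12, 0.80), 3))
# Small gap: 75% observed against a desired 80%, at two audit sizes.
print(round(binomial_lower_tail(75, 100, 0.80), 3))
print(round(binomial_lower_tail(210, 280, 0.80), 3))
```

The large gap is already significant with 12 charts (P about 0.02), whereas 75% vs 80% is not significant with 100 charts and only falls below 0.05 at a much larger audit. These values illustrate the trend; they are not the entries of table 1.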
A second reason for the surprisingly small sample sizes shown in table 1 is that clinical researchers want a precise estimate of treatment effect, whereas in local audits, the precision of the estimate of system performance is less important. In an example audit, 10/20 (50%) of charts had successful medication reconciliation. Statisticians use 95% CIs to describe the precision of study results; our audit has a 95% CI that extends from a low of 28% to a high of 72%. But this result suffices to conclude that our local system performance falls short of 80%. We are less concerned about whether the actual performance is 28% or 72%, because both are unacceptable.
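The 28% to 72% interval quoted above is reproduced by the simple normal-approximation (Wald) interval, p ± 1.96 × √(p(1 − p)/n). The sketch assumes that is the method behind the quoted figures; `wald_ci` is a hypothetical helper name. It also makes the key comparison explicit: the whole interval lies below the 80% target.

```python
from math import sqrt


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


low, high = wald_ci(10, 20)
print(f"10/20: {low:.0%} to {high:.0%}")  # 10/20: 28% to 72%
print(high < 0.80)  # True: even the upper limit falls short of the 80% target
```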

Table 1. Minimum sample sizes required for improvement projects based on observed and desired system performance.
Observed system performance (%)	Desired performance 80%	Desired performance 90%
95	26	140
90	70	n/a
85	260	180
80	n/a	50
75	280	28
70	80	20
66	45	15
60	25	10
50	12	6
40	10	5
20	5	5
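The two uses of table 1 described earlier (checking a finished audit, and planning a new one) can be sketched as a simple lookup. This is a hypothetical transcription: the names `TABLE_1`, `minimum_sample_size` and `is_sample_sufficient` are illustrative, and the sketch only covers the listed rows without interpolating between them.

```python
from typing import Optional

# Table 1 transcribed: observed performance (%) -> {desired performance (%): minimum n}.
# None marks the "n/a" cells, where observed and desired performance are equal.
TABLE_1 = {
    95: {80: 26, 90: 140},
    90: {80: 70, 90: None},
    85: {80: 260, 90: 180},
    80: {80: None, 90: 50},
    75: {80: 280, 90: 28},
    70: {80: 80, 90: 20},
    66: {80: 45, 90: 15},
    60: {80: 25, 90: 10},
    50: {80: 12, 90: 6},
    40: {80: 10, 90: 5},
    20: {80: 5, 90: 5},
}


def minimum_sample_size(observed: int, desired: int) -> Optional[int]:
    """Look up table 1; observed must be one of the listed rows."""
    return TABLE_1[observed][desired]


def is_sample_sufficient(observed: int, desired: int, n: int) -> bool:
    """First use: after an audit, does a sample of n meet the table's minimum?"""
    required = minimum_sample_size(observed, desired)
    return required is not None and n >= required


# First use: a completed audit of 12 charts at 50% against an 80% target.
print(is_sample_sufficient(50, 80, 12))  # True
# Second use: planning with a hunch of 50% against a 90% target.
print(minimum_sample_size(50, 90))  # 6
```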

Handle small samples with care

You must have an extremely high level of confidence in the data integrity of your small sample. In a small sample, a few specific patients can make up a large proportion of the sample: one patient is 12.5% of a sample of eight. So, the 'catch' to using small samples is the need to follow very clear steps for collecting the data. Apply five steps:

1. Define the eligible sample: we identified consecutive patients admitted to our inpatient medical service at General Hospital.
2. Establish exclusion criteria: we excluded patients who were admitted for <12h.
3. State the study period: the audit occurred from Saturday 7 November 2015 at 08:00h to Sunday 8 November 2015 at 16:00h.
4. Keep a reject log: we identified 23 consecutively admitted patients during the audit period. We excluded two patients who were discharged within 12h, leaving 21 patients for the audit.
5. Make data collection complete: we completed data collection for 20 of the 21 patients. One chart could not be located.
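Steps 4 and 5 amount to bookkeeping that should always balance: identified minus excluded gives eligible, and eligible minus reviewed gives missing. A minimal sketch, assuming a hypothetical `AuditLog` record that keeps counts only (the steps above do not prescribe any particular format for the reject log).

```python
from dataclasses import dataclass


@dataclass
class AuditLog:
    """Hypothetical count-only record of who entered and left a small audit."""
    identified: int       # consecutive patients identified in the study period
    excluded: int         # patients removed by the exclusion criteria (reject log)
    charts_reviewed: int  # eligible patients whose data were actually collected

    @property
    def eligible(self) -> int:
        return self.identified - self.excluded

    @property
    def missing(self) -> int:
        return self.eligible - self.charts_reviewed

    def summary(self) -> str:
        return (f"{self.identified} identified, {self.excluded} excluded, "
                f"{self.eligible} eligible, {self.charts_reviewed} reviewed, "
                f"{self.missing} missing")


log = AuditLog(identified=23, excluded=2, charts_reviewed=20)
print(log.summary())
# 23 identified, 2 excluded, 21 eligible, 20 reviewed, 1 missing
```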

For a small sample medication reconciliation audit:

First, you should define your eligible sample. For audits, you should aim to enrol consecutive eligible patients. Random samples are ideal, but needlessly complex and impractical for most local improvement initiatives. For early PDSA cycles, where the focus shifts to changing provider and system performance, it is practical to use convenience samples.
1. A convenience sample is, essentially, 'whoever you can get'. (For example, we used friendly volunteer clinicians for our first PDSA cycle of our medication reconciliation form.) Note, however, that changes usually perform better in convenience samples, because their members are generally self-selected for motivation and willingness to change.
2. Therefore, once your change seems to be working at the desired level, you should conduct an audit using consecutive, unselected providers whenever possible.
3. Of course, you could also deliberately sample clinicians who are resistant to change and vocally opposed to your initiative.
Keep track of patients who were excluded ('reject log'). In the example, there were 23 potentially eligible patients during the study period, but two were excluded because they were admitted for <12h. This left exactly 21 patients for the audit.
The paramount concern then becomes completeness of data collection for these 21 patients. Suppose there were actually 21 patients eligible for the audit, but one chart was missing. We found that medication reconciliation occurred in 10/20 patients, but we do not know the one missing result. Therefore, the true results of our audit could have been 10/21 (48%, 95% CI 27% to 69%) or 11/21 (52%, 95% CI 31% to 73%). The incomplete data collection does not substantially alter our interpretation of the audit results, since the 95% CI would not include our target of 80% no matter what the outcome of the audit on the missing chart.
By contrast, suppose there were 40 patients eligible for the audit, but 20 charts were missing. We found medication reconciliation in 10/20 of the remaining charts. What is the result of our audit now? The answer is: we don't know. The actual result of our small audit could be as poor as 10/40 (25%, 95% CI 12% to 38%) or as high as 30/40 (75%, 95% CI 62% to 88%). Because of our sloppy methods, we can conclude only that our system performance lies somewhere between 12% and 88%, which makes the entire exercise useless.
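The bounding argument in the two scenarios above can be sketched by assuming every missing chart failed (worst case) and then that every missing chart passed (best case), and checking whether the 80% target falls inside the range. This assumes the Wald interval used for the 10/20 example; the helper names are hypothetical. For the 21-patient case, computing from the raw counts gives limits of 26% and 74% rather than the 27% and 73% quoted above, a rounding difference that does not change the conclusion.

```python
from math import sqrt


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def missing_data_bounds(successes: int, reviewed: int, eligible: int):
    """Worst case: every missing chart failed. Best case: every missing chart passed."""
    missing = eligible - reviewed
    worst = wald_ci(successes, eligible)
    best = wald_ci(successes + missing, eligible)
    return worst, best


for successes, reviewed, eligible in [(10, 20, 21), (10, 20, 40)]:
    (w_lo, w_hi), (b_lo, b_hi) = missing_data_bounds(successes, reviewed, eligible)
    print(f"{successes}/{reviewed} of {eligible}: CI could span {w_lo:.0%} to {b_hi:.0%}; "
          f"80% target excluded either way: {b_hi < 0.80}")
```

With one missing chart the 80% target stays excluded in both cases; with 20 missing charts the best case reaches 88%, so the audit cannot answer the question.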

Using small samples in PDSA cycles

Suppose that the medication reconciliation audit shows a need for local improvement, and the first change concept is a new medication reconciliation form that must be completed by the ordering provider. For your first PDSA cycle, you plan to obtain feedback from users about the form's usability. Your main study measure is whether clinicians can complete the form without your help. How many clinicians should you study in this cycle?

You can use table 1 to plan your first PDSA cycle. At this early stage you will likely be recruiting friendly, highly motivated clinicians (a 'convenience sample') to try out your form. You should aim for at least a 90% success rate for completing the form without any difficulty; you do not want to implement a form that requires training and personalised support even for highly motivated users. Therefore, you will use the third column of table 1, with a desired system performance of 90%. Next, you need a hunch about how good you can realistically expect your form to be on this first attempt. Be humble, because at early stages nothing works out as intended. Let's estimate that 60% of clinicians will be able to complete the form without personalised help or difficulty. Therefore, a sample size of 10 should be sufficient. In other words, if, as you suspect, only 60% of your convenience sample complete the form without help, you will need only 10 observations to show that you are not yet at your target of 90% success.

For this first (convenience) sample of 10 volunteer users, 5/10 (50%) completed the form without any input or instructions. The other five became frustrated and gave up. Table 1 tells you that, with an observed success rate of 50% and a desired target of 90%, any audit with a sample of six or more allows you to confidently reject the null hypothesis that your form is working at a 90% success rate. In other words, your form needs work!

The quantitative element of the first PDSA cycle is already finished. You should obtain qualitative feedback from your 10 participants (especially the five motivated users who could not complete the form) and make the necessary changes. Then you can start a second PDSA cycle next week.

Example Practice

For the example above of designing a form for medication reconciliation, use the online calculator (reference #3) to calculate an exact P value: the probability of observing a performance of 50% (5/10) or lower if the true performance were the required 90%. Also calculate the 95% CI for your result.

Choose "Probabilities > Binomial Probabilities"
Enter n = 10, k = 5, p = 0.9
Hit the Calculate button
Answer: Method 1. exact binomial calculation →
P = 0.0016349374 (0.002)
Interpretation: P is far below the conventional threshold of 0.05, so the difference is statistically significant and we reject the null hypothesis that the form performs at the desired 90% level. In other words, the project did not achieve the desired result, and the shortfall is statistically significant.
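The calculator's exact result can be checked by summing binomial probabilities directly: P is the probability of 5 or fewer successes in 10 attempts if the true success rate were 90%. A minimal sketch using only Python's standard library (`math.comb`, Python 3.8+); the helper name `binomial_lower_tail` is hypothetical.

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p0: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p0): the one-sided exact P value."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))


p_value = binomial_lower_tail(k=5, n=10, p0=0.9)
print(f"{p_value:.10f}")  # 0.0016349374
```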
Choose "Proportions > The Confidence Interval of a Proportion"
Enter k = 5, n = 10
Hit the Calculate button
Answer: "95% confidence interval: including continuity correction" →
Lower limit = 0.2014
Upper limit = 0.7986
Answer: CI 20%–80% (0.2014–0.7986).
Interpretation: Although the interval is very wide (20% to 80%), this matters little in a quality improvement project, because both the lower and upper limits are unacceptable. The CI does not reach or include the required level of 90%.
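The limits shown above match the Wilson score interval with continuity correction (Newcombe's formulation), computed with z = 1.96. The sketch below assumes that is the calculator's method; `wilson_cc_interval` is a hypothetical name, not part of any library.

```python
from math import sqrt


def wilson_cc_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval with continuity correction for k successes out of n."""
    p = k / n
    q = 1 - p
    denom = 2 * (n + z**2)
    lower = (2 * n * p + z**2 - 1
             - z * sqrt(z**2 - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    upper = (2 * n * p + z**2 + 1
             + z * sqrt(z**2 + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    # By convention the interval is pinned at 0 or 1 when k is 0 or n.
    lower = 0.0 if k == 0 else max(0.0, lower)
    upper = 1.0 if k == n else min(1.0, upper)
    return lower, upper


low, high = wilson_cc_interval(5, 10)
print(f"{low:.4f} to {high:.4f}")  # 0.2014 to 0.7986
print(high < 0.90)  # True: the 90% target lies outside the interval
```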

References

1. Etchells E, Ho M, Shojania KG. Value of small sample sizes in rapid-cycle quality improvement projects. BMJ Qual Saf 2016;25(3):202-206.
2. Etchells E, Woodcock T. Value of small sample sizes in rapid-cycle quality improvement projects 2: assessing fidelity of implementation for improvement interventions. BMJ Qual Saf 2018;27(1):61-65.
3. Lowry R. VassarStats: Website for Statistical Computation. vassarstats.net
4. Perla RJ, Provost LP, Murray SK. Sampling considerations for health care improvement. Qual Manag Health Care 2014;23(4):268-279.
5. Perla RJ, Provost LP. Judgment sampling: a health care improvement perspective. Qual Manag Health Care 2012;21(3):170-176.
