The model for improvement and its Plan–Do–Study–Act (PDSA) cycles typically require frequent data collection to test ideas and refine the planned change strategy. The perception that data collection must involve many patients can lead to insufficiently frequent PDSA cycles and act as a barrier to initiating local improvement activities.
Small samples for demonstrating local gaps in care
Table 1 shows the sample size requirements for local quality audits. Table 1 can be used in two ways:
- First, on completing an audit, the table can quickly indicate if your result is statistically significant. For example, if your audit showed an observed system performance of 50% when the desired system performance is 80%, then an audit with a sample size of 12 or more will be statistically significant.
- Second, you can use this table to plan a sample size for an audit or PDSA cycle. For example, if your "hunch" is that the observed system performance will be 50%, and you have a desired system performance of 90%, then a sample size as low as 6 will likely suffice (though there is no harm in planning to include a few additional observations to ensure that you have a sample that represents your system's usual performance [External validity]).
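Both uses of the table can be cross-checked with a short calculation. The sketch below assumes a one-sided exact binomial test at the 0.05 level; the source does not say how table 1 was derived, so this is an illustrative check and not the authors' method. The function name `binomial_lower_tail` and the chart counts (6 of 12, 3 of 6) are hypothetical, chosen to match the 50% examples above.

```python
from math import comb

ALPHA = 0.05  # conventional significance threshold (an assumption here)


def binomial_lower_tail(successes: int, n: int, target: float) -> float:
    """Probability of seeing `successes` or fewer out of `n`
    if the system truly performed at the `target` rate."""
    return sum(
        comb(n, i) * target**i * (1 - target) ** (n - i)
        for i in range(successes + 1)
    )


# Completed audit: 6 of 12 charts (50%) met the standard; desired performance is 80%.
p_audit = binomial_lower_tail(6, 12, 0.80)

# Planned audit: a hunch of 50% (3 of 6) against a desired performance of 90%.
p_plan = binomial_lower_tail(3, 6, 0.90)

print(f"6/12 vs 80%: P = {p_audit:.4f}, significant: {p_audit < ALPHA}")
print(f"3/6 vs 90%: P = {p_plan:.4f}, significant: {p_plan < ALPHA}")
```

Under these assumptions both P values fall below 0.05 (roughly 0.019 and 0.016), which agrees with the table entries of 12 and 6.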
How can small samples be statistically valid?
How is it possible that such small samples permit rejecting the null hypothesis, while properly designed controlled clinical trials need to enrol hundreds or thousands of patients?
- One reason is that we are looking at very large differences (eg, 50% vs 80%), whereas clinical trials typically look for much smaller differences. As shown in table 1, as the observed performance comes closer to the desired target, larger sample sizes are required to show significant differences. For example, you would need an audit sample size of 280 to show that 75% observed performance differed significantly from a desired performance of 80%.
- A second reason for the surprisingly small sample sizes shown in table 1 is that clinical researchers want a precise estimate of treatment effect, whereas in local audits, the precision of the estimate of system performance is less important. In an example audit, 10/20 (50%) of charts had successful medication reconciliation. Statisticians use 95% CIs to describe the precision of study results; our audit has a 95% CI that extends from a low of 28% to a high of 72%. But this result suffices to conclude that our local system performance falls short of 80%. We are less concerned about whether the actual performance is 28% or 72%, because both are unacceptable.
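The interval quoted for the 10/20 audit can be reproduced with a few lines of code. The source does not say which method produced 28% to 72%; the sketch below assumes the Wilson score interval with continuity correction, which is one method that gives those rounded limits. The function name `wilson_cc_interval` is hypothetical.

```python
from math import sqrt


def wilson_cc_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval with continuity correction for a proportion.

    z = 1.96 corresponds to a 95% confidence level.
    """
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z**2)
    lower = (2 * n * p + z**2 - 1
             - z * sqrt(z**2 - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    upper = (2 * n * p + z**2 + 1
             + z * sqrt(z**2 + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    # By convention the limits are pinned at 0 or 1 for all-fail or all-pass samples.
    if successes == 0:
        lower = 0.0
    if successes == n:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)


low, high = wilson_cc_interval(10, 20)
print(f"10/20: 95% CI {low:.0%} to {high:.0%}")
```

The upper limit stays below the 80% target, which is the only feature of the interval that the audit conclusion depends on.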
Table 1. Minimum sample sizes required for improvement projects based on observed and desired system performance.

| Observed system performance (%) | Desired system performance 80% | Desired system performance 90% |
|---|---|---|
| 95 | 26 | 140 |
| 90 | 70 | n/a |
| 85 | 260 | 180 |
| 80 | n/a | 50 |
| 75 | 280 | 28 |
| 70 | 80 | 20 |
| 66 | 45 | 15 |
| 60 | 25 | 10 |
| 50 | 12 | 6 |
| 40 | 10 | 5 |
| 20 | 5 | 5 |
Handle small samples with care
You must have an extremely high level of confidence in the data integrity of your small sample. For small sample sizes, a 'few specific patients' can amount to a large proportion of the sample. One patient represents a substantial contribution to a sample of eight patients. So, the 'catch' to using small samples is the need to follow very clear steps for collecting the data. Apply five steps:
Using small samples in a PDSA cycle
Suppose that the medication reconciliation audit prompts a local improvement project, and the first change concept is a new medication reconciliation form that must be completed by the ordering provider. For your first PDSA cycle, you plan to obtain feedback from users about the form's usability. Your main study measure is whether the clinicians can complete the form without your help. How many clinicians should you study in this cycle?
You can use table 1 to plan your first PDSA cycle. At this early stage, you will likely be recruiting friendly, highly motivated clinicians (a 'convenience sample') to try out your form. You should aim for at least a 90% success rate for completing the form without any difficulty, because you do not want to implement a form that requires training and personalised support even for highly motivated users. Therefore, you will use the third column of table 1, with a desired system performance of 90%. Next, you need a hunch about how good you can really expect your form to be in this first go-around. You should be humble, because at early stages nothing works out as intended. Let's estimate that 60% of clinicians will be able to complete the form without personalised help or difficulty. Therefore, a sample size of 10 should be sufficient. In other words, if, as you suspect, only 60% of your convenience sample complete the form without help, you will need only 10 observations to show that you are not yet at your target of 90% success.
For this first (convenience) sample of 10 volunteer users, 5/10 (50%) completed the form without any input or instructions. The other five became frustrated and gave up. Table 1 tells you that, with an observed success rate of 50% and a desired target of 90%, any audit with a sample of six or more allows you to confidently reject the null hypothesis that your form is working at a 90% success rate. In other words, your form needs work!
The quantitative element of the first PDSA cycle is already finished. You should obtain qualitative feedback from your 10 participants (especially the five motivated users who could not complete the form) and make the necessary changes. Then you can start a second PDSA cycle next week.
Example Practice
For the example above of designing a form for medication reconciliation, use the online calculator (reference #3) to calculate an exact P value for the probability that you would observe a performance of only 50% (5/10) if the required performance were 90%. Also calculate the 95% CI for your result.
- Choose "Probabilities > Binomial Probabilities"
Enter n = 10, k = 5, p = 0.9
Hit the Calculate button
Answer: Method 1. exact binomial calculation → P = 0.0016349374 (0.002)
Interpretation: P is well below the conventional significance threshold of 0.05, so the difference is statistically significant, and we reject the null hypothesis that there is no difference between the observed and desired results. In other words, the project did not achieve the desired result, and the shortfall was statistically significant.
- Choose "Proportions > The Confidence Interval of a Proportion"
Enter k = 5, n = 10
Hit the Calculate button
Answer: "95% confidence interval: including continuity correction" → Lower limit = 0.2014, Upper limit = 0.7986
Answer: CI: 20%–80% (0.2014–0.7986).
Interpretation: Even though the range is very wide (CI: 20%–80%), this is not so important in a quality improvement project because both the lower and upper limits are unacceptable. The CI does not reach, or include, the required level of 90%.
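As a cross-check on the calculator output, both results can be worked by hand. This is a sketch under two assumptions that the source does not state: that the reported P value is the one-sided lower-tail probability of observing 5 or fewer successes, and that the calculator's continuity-corrected interval is the Wilson score interval with continuity correction, using z = 1.96. Both assumptions reproduce the quoted figures.

```latex
% Exact binomial P value for X ~ Binomial(n = 10, p = 0.9), observed k = 5.
% Terms are listed from i = 5 down to i <= 2.
\begin{align*}
P(X \le 5) &= \sum_{i=0}^{5} \binom{10}{i}\, 0.9^{i}\, 0.1^{\,10-i} \\
           &\approx 0.0014880 + 0.0001378 + 0.0000087 + 0.0000004 \\
           &\approx 0.0016
\end{align*}

% Wilson score interval with continuity correction,
% with \hat{p} = 5/10 = 0.5, n = 10, z = 1.96 (z^2 = 3.8416).
\begin{align*}
\text{lower} &= \frac{2n\hat{p} + z^{2} - 1
   - z\sqrt{z^{2} - 2 - \tfrac{1}{n} + 4\hat{p}\,\bigl(n(1-\hat{p}) + 1\bigr)}}
   {2\,(n + z^{2})}
 = \frac{12.8416 - 1.96\sqrt{13.7416}}{27.6832} \approx 0.2014 \\
\text{upper} &= \frac{2n\hat{p} + z^{2} + 1
   + z\sqrt{z^{2} + 2 - \tfrac{1}{n} + 4\hat{p}\,\bigl(n(1-\hat{p}) - 1\bigr)}}
   {2\,(n + z^{2})}
 = \frac{14.8416 + 1.96\sqrt{13.7416}}{27.6832} \approx 0.7986
\end{align*}
```

Almost all of the P value comes from the single term for exactly 5 successes, which shows how unlikely a 50% result is when the true success rate is 90%.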