How to use the Transform Data web page

The i-chart is more sensitive to the underlying distribution


The individuals chart (i-chart), which is the basis of rare-event charts such as the events-between-adverse-events chart (g-chart) and the time-between-adverse-events chart (t-chart), is sensitive to the shape of the distribution of the measurements. In particular, when the data range over more than an order of magnitude (10, 100, 1000 and so on) or are highly skewed in one direction (for example, measurements of time), the limits on the i-chart benefit from transforming the data.

Apart from rare events, skewed datasets arise in healthcare when hospital-wide data (usually suitable for Shewhart control charts) are drilled down into subsets, such as by ward, by diagnosis, or by physician. In these cases the denominator sizes do not meet the criteria for Shewhart control charts, and the resulting datasets are usually skewed. [2]
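To illustrate why skewed data cause trouble, here is a minimal sketch of the usual i-chart limit calculation (center line plus or minus 2.66 times the average moving range). The function name `i_chart_limits` and the sample values are made up for illustration; they are not taken from the program or its sample files.

```python
from statistics import mean


def i_chart_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    center = mean(values)
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points.
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width


# Hypothetical, right-skewed "days between events" data (illustration only).
days_between = [1, 2, 1, 3, 2, 40, 1, 2, 5, 1, 2, 90, 3, 1, 2]
lcl, cl, ucl = i_chart_limits(days_between)
print(f"LCL={lcl:.1f}  CL={cl:.1f}  UCL={ucl:.1f}")
```

With these made-up values the lower limit comes out negative, which is impossible for a time between events. That is the kind of symptom a transformation is meant to address.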

Extreme cases

Typical transformations for data that are highly skewed

Transformation steps

  1. Create a frequency plot of data.
  2. If the frequency plot is not symmetric, and you do not think that special causes are the dominating issue, try transformations to make the data symmetric.
  3. Deal with `0` values in order to use the logarithm or reciprocal transformation. A common default is to add a small value to all the data points.
  4. Look at frequency plots of the transformed data to see whether they are symmetric.
  5. Once a transformation is found that makes the distribution roughly symmetric, compare the probability plots, the JB test results, and the linear regression results.
  6. Complete the calculations and draw the graphs (see other pages):
    1. Calculate chart limits using transformed data.
    2. Do reverse transformation of the center line and limits in order to plot the data in its original units.
    3. Plot the chart using the original data and the limits that have been transformed back into the original units.
    4. Sometimes it is more appropriate to present the data on a logarithmic scale, so that points below the center line are easier to see.
    5. Adjustments to these estimates should be made, depending on the transformation used.
      Try Time between (gap)
      Select the sample file [hai_299.txt] to see the following graphs: (i) original data, (ii) logarithmic scale, (iii) transformed data.
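Steps 6.1 to 6.3 can be sketched as follows for the logarithmic transformation. This is only an illustration under stated assumptions: the function name `limits_via_log`, the `offset` parameter, and the data are hypothetical, and the program itself may calculate its limits differently.

```python
import math
from statistics import mean


def limits_via_log(values, offset=0.0):
    """I-chart limits computed on ln(x + offset), returned in original units."""
    shifted = [v + offset for v in values]
    if min(shifted) <= 0:
        raise ValueError("all values must be positive after adding the offset")
    # Step 6.1: calculate the chart limits using the transformed data.
    logs = [math.log(v) for v in shifted]
    center = mean(logs)
    mr_bar = mean([abs(b - a) for a, b in zip(logs, logs[1:])])
    half_width = 2.66 * mr_bar

    # Step 6.2: reverse the transformation (exp undoes ln, then remove the offset).
    def back(t):
        return math.exp(t) - offset

    return back(center - half_width), back(center), back(center + half_width)


# Hypothetical "days between events" data (illustration only).
days_between = [1, 2, 1, 3, 2, 40, 1, 2, 5, 1, 2, 90, 3, 1, 2]
lcl, cl, ucl = limits_via_log(days_between)
# Step 6.3: plot the ORIGINAL data against these back-transformed limits.
print(f"LCL={lcl:.2f}  CL={cl:.2f}  UCL={ucl:.2f}")
```

Note that the back-transformed center line of a log transformation is the geometric mean of the data, not the arithmetic mean, and that the limits are no longer symmetric about it. This is one reason adjustments may be needed, depending on the transformation used.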

How to use the transformation page

The sample datasets contain rare events data as follows:

For each dataset, as soon as the selection is made, the page displays a histogram of the data distribution and a cumulative probability curve.

At this point the original data are retained by the program (you do not have to import the original data file for each transformation), and the transformations listed below are executed automatically. With the data in place, you can cycle the display between histogram, cumulative probability plot, and normal probability plot. [3] This saves time compared with drawing each plot individually in Excel, and quickly shows which transformation changes the distribution in the desired direction.

`y_i = \sqrt{x_i}`
`y_i = x_i^{0.277}` (Weibull)
`y_i = x_i^{0.25}`
`y_i = \ln(x_i)`
`y_i = \frac{1}{\sqrt{x_i}}`
`y_i = \frac{1}{x_i}`
`y_i = \frac{1}{x_i^2}`
`y_i = \arcsin(\sqrt{x_i})`
`y_i = \log\left(\frac{x_i}{1-x_i}\right)`
`y_i = 0.5 \times \log\left(\frac{1+x_i}{1-x_i}\right)`
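For readers who want to reproduce the transformations outside the program, the list above might be written as follows. The dictionary keys and the helper `apply_transform` are hypothetical names. Two readings are assumptions: the arcsine entry is taken as the arcsine of the square root, and `log` in the last two entries is taken as the natural logarithm.

```python
import math

# One entry per transformation in the list, in the same order.
TRANSFORMS = {
    "square root": lambda x: math.sqrt(x),
    "power 0.277 (Weibull)": lambda x: math.pow(x, 0.277),
    "fourth root": lambda x: math.pow(x, 0.25),
    "natural log": lambda x: math.log(x),
    "reciprocal square root": lambda x: 1 / math.sqrt(x),
    "reciprocal": lambda x: 1 / x,
    "reciprocal square": lambda x: 1 / x ** 2,
    # Assumption: arcsine of the square root; needs 0 <= x <= 1.
    "arcsine square root": lambda x: math.asin(math.sqrt(x)),
    # Assumption: natural log; needs 0 < x < 1.
    "logit": lambda x: math.log(x / (1 - x)),
    # Assumption: natural log; needs -1 < x < 1.
    "Fisher z": lambda x: 0.5 * math.log((1 + x) / (1 - x)),
}


def apply_transform(name, values):
    """Apply one named transformation, reporting any value it cannot handle."""
    func = TRANSFORMS[name]
    out = []
    for v in values:
        try:
            out.append(func(v))
        except (ValueError, ZeroDivisionError):
            raise ValueError(f"{name!r} is undefined for x = {v}") from None
    return out


print(apply_transform("natural log", [1.0, 10.0, 100.0]))
```

The last three transformations only accept proportions or values between -1 and 1, so they do not apply to gap data such as days between events.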

When several graphs appear to approach a bell shape and it is difficult to say which is the better fit, refer to the results of the calculations for kurtosis, skewness, and the JB test, which the program displays under all the graphs. For a normal distribution, the goal is a JB statistic close to zero. (If you calculate the p-value in Excel [4], which this web page does not supply, the objective is a value > 0.05.)
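As a rough check on these numbers, the Jarque-Bera (JB) statistic can be computed by hand. The sketch below assumes population (divide-by-n) moments and excess kurtosis; the program and Excel's `SKEW` and `KURT` functions may use sample-adjusted formulas, so small differences are to be expected. The function name `jarque_bera` is hypothetical. The p-value uses the chi-square distribution with 2 degrees of freedom, whose right tail is exactly `exp(-JB/2)`, the same quantity that `CHISQ.DIST.RT(JB, 2)` returns.

```python
import math


def jarque_bera(values):
    """Return (skewness, excess kurtosis, JB statistic, p-value)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m = sum(values) / n
    # Central moments, dividing by n (population form; an assumption).
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    # Right-tail probability of chi-square with 2 degrees of freedom.
    p_value = math.exp(-jb / 2)
    return skewness, excess_kurtosis, jb, p_value


# Hypothetical data (illustration only): raw values and their logs.
raw = [1, 2, 1, 3, 2, 40, 1, 2, 5, 1, 2, 90, 3, 1, 2]
for label, data in [("raw", raw), ("ln", [math.log(v) for v in raw])]:
    s, k, jb, p = jarque_bera(data)
    print(f"{label}: skew={s:.2f} kurt={k:.2f} JB={jb:.2f} p={p:.3f}")
```

Comparing the JB statistic across candidate transformations, rather than reading any single value in isolation, mirrors how the page is meant to be used.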

After working through all the sample datasets and discussing which transformations are effective for which distributions, use your own data. Your data must be prepared in advance by calculating the gap value (number between, or time between) for each event. Beware of zero values: for example, when multiple falls occur on the same day, the difference is zero. Zero values cannot be used with the logarithmic or reciprocal transformations. Consider adding a small quantity (for example, 0.01) to the gap values in your dataset, or increasing the precision of the data collection, for example by recording the date and time of each fall.
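Preparing gap values from a list of event dates could look like the sketch below. The function names `gaps_in_days` and `add_offset` and the dates are hypothetical, and the layout your data must have for import is not covered here. The offset is added to every gap, following the common default of adding a small value to all the data points.

```python
from datetime import date


def gaps_in_days(event_dates):
    """Days between consecutive events; same-day events give a gap of 0."""
    ordered = sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def add_offset(gaps, offset=0.01):
    """Add a small quantity to every gap so that no value is zero."""
    return [g + offset for g in gaps]


# Hypothetical fall dates (illustration only); two falls share 1 January.
falls = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 4), date(2024, 1, 10)]
gaps = gaps_in_days(falls)
print(gaps)              # contains a zero for the two same-day falls
print(add_offset(gaps))  # safe for the logarithmic and reciprocal transformations
```

Recording date and time instead of date alone would avoid the zero in the first place, at the cost of working in fractions of a day.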

References

  1. Provost LP, Murray SK. The Health Care Data Guide: Learning from Data for Improvement. John Wiley & Sons; 2011. www.amazon.com
  2. Hart MK, Hart RF. Statistical Process Control for Health Care. 2000. www.amazon.com
    Translated and published in Taiwan as:
    鐘國彪審閱、陳宗泰譯:「健康照護的統計流程管制」 金名圖書有限公司 www.eslite.com
  3. McNeese B. Normal probability plots. www.spcforexcel.com
  4. CHISQ.DIST.RT function: calculates the right-tailed probability of a chi-square distribution. corporatefinanceinstitute.com