# Scatter Diagrams

The scatter diagram is another visual display of data. It shows the association between two continuous variables measured on the same item (for example, the same person, team, or facility). The scatter diagram shows both the direction and the strength of the *correlation* between the variables. Direction shows in whether the points trend upward or downward. Strength shows in how tightly the points cluster around a line, not in how steep that line is. This correlation can point to, but does not prove, a *causal* relationship. Therefore, it is important not to rush to conclusions about the relationship between variables, because another variable may modify the relationship. For example, a scatter diagram of weight against height would suggest that the two variables are related. This relationship, however, does not by itself establish causality: while growing taller may cause one to weigh more, gaining weight does not necessarily indicate that one is growing taller. The scatter diagram is easy to use but should be interpreted with care, because the scale may be too small to reveal the relationship between the variables, or confounding factors may be involved.
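The strength of a linear correlation is commonly summarized by the Pearson correlation coefficient *r*, which runs from -1 (perfect negative) through 0 (no linear association) to +1 (perfect positive). The sketch below computes *r* from paired values. It is a plain-Python illustration, and the height and weight numbers are hypothetical, not data from this guide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values xs[i], ys[i]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one variable is constant, so r is undefined")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical height (cm) and weight (kg) pairs, for illustration only
heights = [150, 160, 165, 170, 180]
weights = [50, 56, 61, 66, 75]
print(round(pearson_r(heights, weights), 3))
```

Note that *r* is symmetric: swapping the two variables gives the same value. This is one reason a correlation alone cannot say which variable, if either, is the cause.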

Here is an example from Niger showing the association between mean competence score and quality improvement team stability from a sample of 20 teams.

**When to Use a Scatter Diagram**

Scatter diagrams make the relationship between two continuous variables stand out visually on the page in a way that the raw data cannot. Scatter diagrams can be used to explore a possible cause-and-effect relationship between two continuous measures. They can also show whether two effects are related, which may indicate that they stem from a common cause or can serve as surrogates for each other, and whether two causes are related.

**How to Use a Scatter Diagram**

Scatter diagrams are easy to construct using programs such as Excel or Stata.

**Step 1.** Collect at least 40 paired data points. "Paired" data are measures of both the cause being tested and its supposed effect, taken on the same item at the same point in time.
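When the two measures are recorded separately, they first have to be matched up by item. The sketch below assumes each measure is stored as a dictionary keyed by a unit ID. The function name `pair_measurements` and the sample values are hypothetical. Units missing either measure are set aside, and the count is checked against the 40-pair guideline in Step 1.

```python
MIN_PAIRS = 40  # minimum number of pairs suggested in Step 1


def pair_measurements(causes, effects):
    """Join two {unit_id: value} dicts into (cause, effect) pairs.

    Only units measured on both variables are kept; units that appear in
    just one dict are returned separately so they can be followed up.
    """
    shared = sorted(causes.keys() & effects.keys())
    pairs = [(causes[uid], effects[uid]) for uid in shared]
    unmatched = sorted(causes.keys() ^ effects.keys())
    return pairs, unmatched


# Hypothetical measurements keyed by unit ID
cause_by_unit = {"U1": 0.8, "U2": 0.5, "U3": 0.9, "U4": 0.3}
effect_by_unit = {"U1": 72, "U2": 60, "U3": 81, "U5": 55}

pairs, unmatched = pair_measurements(cause_by_unit, effect_by_unit)
if len(pairs) < MIN_PAIRS:
    print(f"Only {len(pairs)} pairs; Step 1 suggests at least {MIN_PAIRS}.")
print("Unmatched units:", unmatched)
```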

**Step 2.** Create a grid, with the "cause" on the horizontal axis and the "effect" on the vertical axis.

**Step 3.** Determine the lowest and highest value of each variable and mark the axes accordingly.
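The axis limits in Step 3 come straight from each variable's minimum and maximum. The sketch below adds a small margin so that points at the extremes do not sit on the border. The 5% margin is an assumption chosen for illustration, not a rule from this guide. Limits set this way also help the points cover most of each axis, as recommended under Points to Remember.

```python
def axis_limits(values, margin_fraction=0.05):
    """Return (low, high) axis limits spanning the data plus a small margin.

    margin_fraction is an illustrative default: the fraction of the data
    range added below the minimum and above the maximum.
    """
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All values identical: open a window of +/- 1 unit so points are visible
        return low - 1, high + 1
    pad = span * margin_fraction
    return low - pad, high + pad


x_low, x_high = axis_limits([12, 18, 25, 31, 40])
print(x_low, x_high)
```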

**Step 4.** Plot the paired points on the diagram. If several pairs have exactly the same values, plot the point once and draw one concentric circle around it for each additional pair with those values.
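To apply the circle convention in Step 4, count how often each (x, y) pair occurs. The sketch below uses Python's `collections.Counter`. The function name `count_repeats` and the sample pairs are hypothetical. Plotting software often handles overlap differently, for example by varying marker size or slightly offsetting points, so this only mirrors the hand-drawn convention described here.

```python
from collections import Counter


def count_repeats(pairs):
    """Map each repeated (x, y) point to the number of extra circles it needs.

    A point observed k times is plotted once and gets k - 1 concentric
    circles; points seen only once are left out of the result.
    """
    counts = Counter(pairs)
    return {point: n - 1 for point, n in counts.items() if n > 1}


pairs = [(3, 5), (4, 6), (3, 5), (3, 5), (7, 2)]
print(count_repeats(pairs))  # {(3, 5): 2}
```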

**Step 5.** Identify and classify the pattern of association by comparing it with the graphs at right, which show possible shapes and their interpretations.
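As a numerical companion to the visual classification in Step 5, a correlation coefficient can be translated into a rough verbal label. The cut-offs 0.3 and 0.7 in the sketch below are common rules of thumb and are assumptions, not thresholds from this guide. *r* measures only straight-line association, so a curved pattern can produce *r* near zero even when the variables are clearly related. The diagram itself should always be checked.

```python
def describe_correlation(r, weak=0.3, strong=0.7):
    """Rough verbal label for a Pearson r value.

    The weak/strong cut-offs are illustrative rules of thumb, not fixed
    standards, and r only captures linear association.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < weak or r == 0:
        return "no or weak linear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if size >= strong else "possible (moderate)"
    return f"{strength} {direction} correlation"


for r in (0.9, -0.5, 0.1):
    print(r, describe_correlation(r))
```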

**Points to Remember**

Stratifying the data in different ways can make patterns appear or disappear. When experimenting with different stratifications and their effects on the scatter diagram, label how the data are stratified so the team can discuss the implications.
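Stratifying amounts to splitting the pairs into groups by a label so that each group can be plotted with its own, clearly labeled marker. In the sketch below, the record layout (dictionaries with `"x"`, `"y"` and a stratifying field) and the `"shift"` field are hypothetical.

```python
from collections import defaultdict


def stratify(records, key):
    """Group (x, y) pairs by the value of a stratifying field.

    records: iterable of dicts holding 'x', 'y' and the field named by key.
    Returns {label: [(x, y), ...]} for plotting one labeled group at a time.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append((rec["x"], rec["y"]))
    return dict(groups)


# Hypothetical records stratified by work shift
records = [
    {"x": 2, "y": 4, "shift": "day"},
    {"x": 3, "y": 5, "shift": "night"},
    {"x": 5, "y": 9, "shift": "day"},
]
by_shift = stratify(records, "shift")
for label, points in sorted(by_shift.items()):
    print(f"{label}: {points}")
```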

Interpretation can be limited by the scale used. If the scale is too small and the points are compressed, a pattern of correlation may be hidden or may look stronger or weaker than it really is. Choose the scale so that the points cover most of the range of both axes and both axes are about the same length.

Be careful of the effects of confounding factors. Sometimes the observed correlation is due to a cause other than the one being studied. If a confounding factor is suspected, stratify the data by it. If it is truly a confounding factor, the relationship seen within each stratum will differ noticeably from the relationship in the combined data.
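One way to check a suspected confounder is to compare the correlation in the combined data with the correlation inside each stratum. The data below are hypothetical and constructed to make the point: two sites differ on both variables, so the pooled pairs look strongly related even though there is no relationship within either site.

```python
import math
from collections import defaultdict


def pearson_r(pairs):
    """Pearson r for a list of (x, y) pairs; None if it is undefined."""
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def overall_vs_within(records):
    """records: list of (stratum, x, y). Returns (overall r, {stratum: r})."""
    overall = pearson_r([(x, y) for _, x, y in records])
    groups = defaultdict(list)
    for stratum, x, y in records:
        groups[stratum].append((x, y))
    within = {s: pearson_r(p) for s, p in groups.items()}
    return overall, within


# Hypothetical data: two sites that differ on both variables
records = [
    ("site A", 1, 10), ("site A", 2, 11), ("site A", 3, 10),
    ("site B", 7, 20), ("site B", 8, 21), ("site B", 9, 20),
]
overall, within = overall_vs_within(records)
print(overall, within)
```

Here the combined *r* is strongly positive (about 0.96), while within each site it is zero. In this case the site, not the variable on the horizontal axis, accounts for the apparent relationship.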

Avoid the temptation to draw a line by eye roughly through the middle of the points; this can be misleading. A true regression line is calculated mathematically, for example by the method of least squares. Consult a statistical expert or text before using a regression line.
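To show what "calculated mathematically" means, the sketch below fits an ordinary least-squares line *y = intercept + slope × x*. It is an illustration only. The advice above still applies: before relying on such a line, a statistician would typically check that the relationship is roughly linear and examine how far the points fall from the line.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```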

Scatter diagrams show relationships, but do not prove that one variable causes the other.