Box Plots
Box-and-whisker diagrams, or box plots, use the concept of breaking a data set into fourths, or quartiles, to create a display. The box represents the middle half of the data set (the second and third quartiles). The whiskers are lines that extend from either side of the box. The maximum length of each whisker is calculated from the length of the box; the actual length of each whisker then depends on where the data points in the first and fourth quartiles fall.
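The whisker rule can be sketched in a few lines of Python. The text does not say how many box lengths a whisker may reach, so this sketch assumes the common convention introduced by John Tukey, a limit of 1.5 box lengths beyond each end of the box; the multiplier `k` and the function name `whisker_ends` are illustrative assumptions. Each whisker then stops at the most extreme data point that lies inside its limit. The example uses the 20 data points and the quartile boundaries (39.25 and 48.025) from the data table in this section.

```python
def whisker_ends(values, q1, q3, k=1.5):
    """Return (low, high) whisker endpoints for a box spanning q1..q3.

    Assumption: the maximum whisker length is k times the box length
    (k = 1.5 is a common convention, not taken from the text).
    """
    box_length = q3 - q1
    lower_limit = q1 - k * box_length
    upper_limit = q3 + k * box_length
    # Each whisker ends at the most extreme data point inside its limit.
    low = min(v for v in values if v >= lower_limit)
    high = max(v for v in values if v <= upper_limit)
    return low, high


data = [27.75, 37.35, 38.35, 38.35, 38.75, 39.75, 40.50, 41.00, 41.15, 42.55,
        42.90, 43.60, 43.85, 47.30, 47.90, 48.15, 49.86, 51.25, 51.60, 56.00]
print(whisker_ends(data, q1=39.25, q3=48.025))  # (27.75, 56.0)
```

For this data set, every point lies inside the limits, so the whiskers reach the smallest and largest values.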
Although box-and-whisker diagrams present less information than histograms or dot plots, they do say a lot about distribution, location and spread of the represented data. They are particularly valuable because several box plots can be placed next to each other in a single diagram for easy comparison of multiple data sets.
What can it do for you?
If your improvement project involves a relatively limited amount of individual quantitative data, a box-and-whisker diagram can give you an instant picture of the shape of variation in your process. Often this can provide an immediate insight into the search strategies you could use to find the cause of that variation.
Box-and-whisker diagrams are especially valuable for comparing the output of two processes that create the same characteristic, or for tracking improvement in a single process. They can be used throughout the phases of the Lean Six Sigma methodology, but you will find them particularly useful in the Analyze phase.
How do you do it?
1. Decide which Critical-To-Quality (CTQ) characteristic you wish to examine. This CTQ must be measurable on a linear scale. That is, the incremental value between units of measurement must be the same. For example, time, temperature, dimension and spatial relationships can usually be measured in consistent incremental units.
2. Measure the characteristic and record the results. If the characteristic is produced continuously, such as voltage in a line or temperature in an oven, or if too many items are being produced to measure all of them, you will have to sample. Take care to ensure that your sampling is random.
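Random sampling can be sketched with Python's standard `random` module. This assumes you can list the items you might measure (here, hypothetical part IDs 1 to 500) and want a simple random sample without replacement, which is what `random.Random.sample` draws; the optional seed makes a draw reproducible. The helper name `random_sample` is illustrative only.

```python
import random


def random_sample(population, n, seed=None):
    """Draw a simple random sample of n items without replacement.

    Passing a seed makes the draw reproducible, e.g. for a review.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: IDs of 500 parts produced during a shift.
part_ids = list(range(1, 501))
chosen = random_sample(part_ids, 20, seed=42)
print(sorted(chosen))  # the 20 parts to measure
```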
3. Count the number of individual data points.
4. List the data points in ascending order.
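Steps 3 and 4 amount to a count and a sort. A minimal Python sketch, using the 20 values from the data table in this section entered in an arbitrary order:

```python
# The 20 measurements from the data table, in an arbitrary recorded order.
raw = [41.00, 56.00, 38.35, 42.55, 27.75, 47.30, 49.86, 39.75, 43.60, 38.35,
       51.25, 37.35, 42.90, 40.50, 48.15, 38.75, 51.60, 41.15, 47.90, 43.85]

n = len(raw)           # step 3: count the data points
ordered = sorted(raw)  # step 4: list them in ascending order

print(n)                        # 20
print(ordered[0], ordered[-1])  # 27.75 56.0
```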
5. Find the median value. If there is an odd number of data points, the median is the data point in the middle of the ordered list, with equal numbers of points above and below it. (For example, if there are 35 data points, the median is the value of the 18th data point from either the top or the bottom of the list.) If there is an even number of points, the median is halfway between the two points that occupy the centermost positions. (If there were 36 points, the median would be halfway between point 18 and point 19: add the values of points 18 and 19 and divide the result by 2.) If you think of the list of data points as divided into quarters (quartiles), the median is the boundary between the second and the third quartiles.
Order   Value   Boundary
1       27.75
2       37.35
3       38.35
4       38.35
5       38.75
                39.250   First quartile boundary (between 1st and 2nd quartiles)
6       39.75
7       40.50
8       41.00
9       41.15
10      42.55
                42.725   Median (between 2nd and 3rd quartiles)
11      42.90
12      43.60
13      43.85
14      47.30
15      47.90
                48.025   Third quartile boundary (between 3rd and 4th quartiles)
16      48.15
17      49.86
18      51.25
19      51.60
20      56.00
Data table divided into quartiles
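Step 5's rule can be written as a small Python function, shown here as a sketch (the name `median_of_sorted` is ours, not a library routine). Python's standard `statistics.median` applies the same rule and serves as a cross-check on the table's 20 points.

```python
import statistics


def median_of_sorted(ordered):
    """Median of a list already sorted in ascending order (step 5)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle point, e.g. point 18 of 35.
        return ordered[mid]
    # Even count: halfway between the two centermost points, e.g. 18 and 19 of 36.
    return (ordered[mid - 1] + ordered[mid]) / 2


ordered = [27.75, 37.35, 38.35, 38.35, 38.75, 39.75, 40.50, 41.00, 41.15, 42.55,
           42.90, 43.60, 43.85, 47.30, 47.90, 48.15, 49.86, 51.25, 51.60, 56.00]
m = median_of_sorted(ordered)
print(round(m, 3))  # 42.725
assert m == statistics.median(ordered)
```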
6. The next step is to find the boundary between the first and second quartiles and the boundary between the third and fourth quartiles. The first quartile boundary is halfway between the last data point in the first quartile and the first data point in the second quartile. (If one data point falls exactly on the median, that data point is counted as both the last point in the second quartile and the first point in the third quartile.) In a similar way, find the third quartile boundary: the halfway point between the last value in the third quartile and the first value in the fourth quartile. In the data table above, these boundaries are 39.250 and 48.025.
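One reading of step 6 is that each quartile boundary is the median of one half of the ordered list, with a point lying exactly on the median counted in both halves; this is essentially Tukey's "hinges." The sketch below follows that reading. Spreadsheet and library quartile functions often use other interpolation rules and can give slightly different values, so treat `quartile_boundaries` as an illustration of the step, not a standard API.

```python
def median_of_sorted(ordered):
    """Median of an ascending list: middle point, or mean of the two middle points."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartile_boundaries(ordered):
    """Return (first quartile boundary, median, third quartile boundary).

    A point lying exactly on the median (odd count) is included in both
    the lower and the upper half, as step 6 describes.
    """
    n = len(ordered)
    half = (n + 1) // 2          # size of each half, counting a median point twice
    lower = ordered[:half]       # first and second quartiles
    upper = ordered[n - half:]   # third and fourth quartiles
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


ordered = [27.75, 37.35, 38.35, 38.35, 38.75, 39.75, 40.50, 41.00, 41.15, 42.55,
           42.90, 43.60, 43.85, 47.30, 47.90, 48.15, 49.86, 51.25, 51.60, 56.00]
q1, med, q3 = quartile_boundaries(ordered)
print(round(q1, 3), round(med, 3), round(q3, 3))  # 39.25 42.725 48.025
```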
7. Draw a scale line and label it with values. The scale should begin below your lowest value and extend above your highest value. The scale line may be either vertical or horizontal.
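One simple way to pick a scale that starts below the lowest value and ends above the highest is to round outward to a convenient tick interval. The interval of 5 used here is an assumption that suits this data set, and `scale_limits` is a hypothetical helper; choose whatever spacing reads well for your data.

```python
import math


def scale_limits(ordered, tick=5.0):
    """Return (start, end, ticks) for a scale that brackets an ascending list.

    start is the largest multiple of `tick` strictly below the lowest value;
    end is the smallest multiple of `tick` strictly above the highest value.
    """
    start = math.floor(ordered[0] / tick) * tick
    if start >= ordered[0]:      # lowest value sits on a tick: step one lower
        start -= tick
    end = math.ceil(ordered[-1] / tick) * tick
    if end <= ordered[-1]:       # highest value sits on a tick: step one higher
        end += tick
    count = round((end - start) / tick)
    ticks = [start + i * tick for i in range(count + 1)]
    return start, end, ticks


ordered = [27.75, 37.35, 38.35, 38.35, 38.75, 39.75, 40.50, 41.00, 41.15, 42.55,
           42.90, 43.60, 43.85, 47.30, 47.90, 48.15, 49.86, 51.25, 51.60, 56.00]
start, end, ticks = scale_limits(ordered)
print(start, end)  # 25.0 60.0
print(ticks)       # [25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0]
```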
8. Using the scale as a guide, draw a box above the scale (if it is horizontal) or to the right of it (if it is vertical). One end of the box is at the first quartile boundary; the other is at the third quartile boundary. (The width of the box is somewhat arbitrary, and boxes tend to be long and thin. Optionally, if you have multiple data sets with different numbers of data points, make the widths of the boxes roughly proportional to the number of data points in each set.)
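Steps 7 and 8 can be tried out as a rough text rendering. The sketch below draws a horizontal scale with a box from the first to the third quartile boundary above it; the 61-character width, the `+`, `-` and `|` characters, and the helper names are arbitrary choices for illustration. The optional `box_rows` helper makes the box's thickness roughly proportional to the number of data points, as the parenthetical in step 8 suggests when several data sets share one diagram.

```python
def to_column(value, start, end, width):
    """Map a value on the scale [start, end] to a character column 0..width-1."""
    return round((value - start) / (end - start) * (width - 1))


def box_rows(n, largest_n, max_rows=6):
    """Box thickness in text rows, roughly proportional to the data count n."""
    return max(1, round(max_rows * n / largest_n))


def draw_box(q1, q3, start, end, width=61, rows=3):
    """Return text lines: a box from q1 to q3 above a horizontal scale line."""
    left = to_column(q1, start, end, width)
    right = to_column(q3, start, end, width)
    inner = right - left - 1                     # characters between the box ends
    edge = " " * left + "+" + "-" * inner + "+"  # top and bottom of the box
    side = " " * left + "|" + " " * inner + "|"  # box ends at q1 and q3
    scale = "-" * width                          # the scale line (step 7)
    labels = f"{start:g}".ljust(width - len(f"{end:g}")) + f"{end:g}"
    return [edge] + [side] * rows + [edge, scale, labels]


# Quartile boundaries and scale limits for the 20-point data table.
for line in draw_box(39.25, 48.025, start=25.0, end=60.0, rows=box_rows(20, 20)):
    print(line)
```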