About Data Types
Charting data and performing statistical analysis are key components of a Six Sigma project, especially during the Measure and Improve phases of DMAIC. The type of data that you have determines the statistical tests, control charts and other visuals that you can use, so understanding the different data types is critical. Using the wrong test or chart can lead to inaccurate conclusions and decrease the likelihood of uncovering and countering root causes.
The two main categories of data are continuous and discrete, with the discrete type having two main subtypes. Common continuous measures are time, money and any physical measurement such as weight, height, length or temperature. As the name implies, data points lie along a continuous scale, with any value theoretically possible (even if your process isn't capable of achieving that level).
Discrete data are countable in the sense that you can count how many of something there are. You can count items with a specific characteristic (attribute data), or you can count the number of occurrences of an event or incident (count data).
The first type of discrete data is attribute data. Think of attributes as a way of categorizing or bucketing things. An animal is a cat, dog, rabbit or gerbil. A product ordered is a CD, MP3 file or DVD. Not only can you count how many items have a certain attribute, you can also count how many items do not. This can be converted to a percentage, for instance a percentage of total product sold that is the premium version of the product. (Attributes are premium and non-premium and each product is bucketed into one or the other category.)
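As a minimal sketch of the premium versus non-premium example, the percentage can be computed by counting the items in one bucket and dividing by the total. The order list and the function name `attribute_percentage` are hypothetical, made up purely for illustration.

```python
from collections import Counter

# Hypothetical order records: each product sold falls into exactly one bucket.
orders = ["premium", "non-premium", "premium", "non-premium", "non-premium"]


def attribute_percentage(items, attribute):
    """Return the percentage of items that carry the given attribute."""
    if not items:
        raise ValueError("need at least one item to compute a percentage")
    counts = Counter(items)
    return 100.0 * counts[attribute] / len(items)


premium_pct = attribute_percentage(orders, "premium")
non_premium_pct = attribute_percentage(orders, "non-premium")
print(f"premium: {premium_pct:.1f}%, non-premium: {non_premium_pct:.1f}%")
```

Because every item lands in one bucket or the other, the two percentages always add up to 100.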
Contrast this with the other type of discrete data: count data. When you are dealing with count data, you can literally count the number of occurrences or instances of a problem or characteristic, but you cannot count a non-occurrence. For example, you can count lightning strikes but cannot count non-occurrences of lightning. (Oh, there wasn't a lightning bolt! There wasn't another one!) You also cannot determine a proportion as you can with attribute data, because there is an infinite number of opportunities for lightning to strike. You cannot say that lightning struck a certain number of times out of a total number of times it did and didn't strike.
This brings up an interesting point. How you collect and compile your data can determine whether you work with attribute or count data. For instance, while you cannot count non-occurrences of lightning strikes, you could instead analyze your data in categorical terms by tracking whether or not there is a lightning strike in each hour-long period, in each city or on each day. Then you would have an attribute measure: you could categorize each hour, city or day as having or not having lightning, and you could calculate the percentage of hours, cities or days in which lightning did occur.
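The lightning example can be sketched in a few lines, assuming the strikes were logged as times in hours since the start of observation. The data, the eight-hour window and the function name `hours_with_strike` are all hypothetical.

```python
import math

# Hypothetical strike log: time of each strike, in hours since observation began.
strike_times = [0.2, 0.7, 3.1, 3.4, 3.9, 7.5]
observation_hours = 8


def hours_with_strike(times, total_hours):
    """Flag each hour-long period as having (True) or not having (False) a strike."""
    hit = [False] * total_hours
    for t in times:
        hour = math.floor(t)
        if 0 <= hour < total_hours:
            hit[hour] = True
    return hit


# Count data: a raw tally of strikes, with no meaningful "out of" total.
strike_count = len(strike_times)

# Attribute data: every hour is bucketed, so a percentage is meaningful.
flags = hours_with_strike(strike_times, observation_hours)
pct_hours_with_lightning = 100.0 * sum(flags) / len(flags)

print(strike_count, pct_hours_with_lightning)
```

The same six strikes give a count of 6 when tallied, but only three of the eight hours are flagged, so the two views answer different questions.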
It is also possible to use either continuous or discrete measures for the same type of data. If you are interested in the cycle time for completing the paperwork to bring on a new client, you can track the amount of time it takes in each instance, which would be a continuous measure. Or you could categorize each instance as either meeting or not meeting the maximum cycle time that the company intends to have or that customers have indicated as their definition of quality. This is essentially tracking the number of defects, which is quite common in Six Sigma data analysis.
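To illustrate the cycle time example, the sketch below treats the same measurements first as continuous data and then as discrete defect data. The cycle times, the five-day limit and the function name `exceeds_limit` are assumptions chosen for illustration only.

```python
from statistics import mean

# Hypothetical cycle times, in days, for onboarding five new clients.
cycle_times_days = [2.5, 4.0, 6.5, 3.0, 8.0]

# Hypothetical maximum cycle time that the company or its customers accept.
max_cycle_time_days = 5.0


def exceeds_limit(times, limit):
    """Mark each instance as a defect (True) if it took longer than the limit."""
    return [t > limit for t in times]


# Continuous view: work with the measured values directly.
average_days = mean(cycle_times_days)

# Discrete view: each instance either meets the limit or is a defect.
defects = exceeds_limit(cycle_times_days, max_cycle_time_days)
defect_count = sum(defects)
defect_pct = 100.0 * defect_count / len(defects)

print(average_days, defect_count, defect_pct)
```

The discrete view discards how far each instance was from the limit, which is the trade-off for a simpler pass or fail measure.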
Common attribute defect measures include defective products, late deliveries and dissatisfied customers. Common count defect measures include injuries, errors on a report, customer complaints and insurance claims.