Sunday, July 13, 2014

You say continuous, I say discrete

Data types are a critical factor in quality statistical analysis. Knowing the data type helps you understand which kinds of statistical analysis can be used to make decisions. Making decisions is the goal of data. If data does not draw you closer to a decision, it is worthless. Many considerations ensure that your data will be useful, but one of the first is the data type. There are two primary data types that will affect the future analysis of your data.

Continuous is the most common type and the most useful. Continuous data is data on a continuum. This does not mean that the extremes of the data can be infinite; it means that the data can be subdivided into an infinite number of sections. Consider the weight of my dog, say 10.1 lbs. She could be 10.2 lbs tomorrow and 9.8 lbs next week.

Discrete data is data in buckets. Consider the weight data again, this time for three dogs: a doberman, a bichon, and a corgi. If we were classifying these dogs using discrete data, we would say that the doberman is heavy, the corgi is medium, and the bichon is light. One way to think of it is that continuous data gives us more information, whereas discrete data is less specific and carries less information.

One area that is often confused is count data. Consider counting the number of ears of corn harvested in a season. It could be 12,145 ears of corn. This seems like continuous data, but don't be confused by the large number; it is actually discrete. It is still in buckets: 1, 2, 3, 4, and so on. If we took the weight instead, it would provide much more information. If you were harvesting your crop this fall and could choose between the count and the weight in grams, which would you choose? Which would give you a better estimate of the value of your crop?
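To make the "buckets" idea concrete, here is a minimal sketch that turns continuous weights into discrete classes. The cutoffs (20 lbs and 40 lbs) and the corgi and doberman weights are made-up values for illustration only; they are not measurements from this post.

```python
def weight_class(pounds, light_max=20.0, medium_max=40.0):
    """Map a continuous weight onto one of three discrete buckets.

    The cutoffs are hypothetical, chosen only for this illustration.
    """
    if pounds <= light_max:
        return "light"
    if pounds <= medium_max:
        return "medium"
    return "heavy"


# Hypothetical weights in pounds (illustrative, not real measurements).
weights = {"bichon": 10.1, "corgi": 27.0, "doberman": 80.0}
classes = {dog: weight_class(lbs) for dog, lbs in weights.items()}

# The same dog weighed on three different days: the continuous readings
# change, but the discrete label does not, so the change is invisible.
daily_readings = [10.1, 10.2, 9.8]
daily_labels = [weight_class(lbs) for lbs in daily_readings]

print(classes)
print(daily_labels)
```

The three daily readings all land in the same bucket, which is the sense in which the discrete version carries less information than the continuous one.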


To give you an idea of the amount of information, or value, in each type of data, let's use some sample size calculations to show how much data would be needed for analysis. Check out this site, http://www.fulcruminquiry.com/calculating_sample_size.htm, where you can determine the sample size for discrete and continuous data sets. For equivalent confidence, a discrete data set of 2000 would be equivalent to a continuous data set of 30.
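If you want to try this kind of comparison yourself, the sketch below uses the standard textbook sample size formulas for a proportion (discrete) and for a mean (continuous). It is not the calculator from the site above, and the function names and example inputs are assumptions chosen for illustration, so it will not necessarily reproduce the 2000 and 30 figures; those depend on the confidence level, margin of error, and variation you enter.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size_proportion(confidence, margin, p=0.5):
    """Discrete (pass/fail) data: n = z^2 * p * (1 - p) / margin^2.

    p = 0.5 is the most conservative guess for the true proportion.
    """
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def sample_size_mean(confidence, margin, sigma):
    """Continuous data: n = (z * sigma / margin)^2.

    margin and sigma must be in the same units as the measurement.
    """
    z = z_value(confidence)
    return math.ceil((z * sigma / margin) ** 2)


# Illustrative inputs only: 95% confidence, a 5-point margin on a
# proportion, and a margin of half a standard deviation on a mean.
n_discrete = sample_size_proportion(0.95, margin=0.05)
n_continuous = sample_size_mean(0.95, margin=0.5, sigma=1.0)
print(n_discrete, n_continuous)
```

With these particular inputs the discrete case needs a few hundred samples and the continuous case fewer than twenty, which points in the same direction as the comparison above.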

Data types are only the tip of the iceberg for quality statistical analysis; however, if you get them wrong, your results will be wrong too.

Wednesday, October 28, 2009

The Statistics of Six Sigma

The statistics of Six Sigma are quite simple. This topic is often complicated by crazy equations and debates about shifts, but the essence of Six Sigma statistics is very basic. These statistics are easy to explain and understand, so everyone can have an appreciation for their benefits.
The Six Sigma level of quality simply means that the variation within a process has been minimized to the point that the typical variation (the standard deviation, or sigma) is one sixth of the distance from the average to the specification limit on each side. For example, if the average ship time was 3 days and the customer expected the shipment between 1 and 5 days, each limit is 2 days from the average, so the process variation about the mean would have to be less than 0.33 days (2 days divided by 6). At this level of quality, the customer would experience only about 3 defects per million shipments (3.4 is the commonly quoted figure).
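The shipping example can be checked with a few lines of Python. This is a rough sketch under two assumptions that are not stated in the post: ship times follow a normal distribution, and the "shift" is the conventional 1.5 sigma drift of the process average. The variable and function names are made up for this illustration.

```python
from statistics import NormalDist

# Numbers from the shipping example above.
mean_days = 3.0
lower_limit, upper_limit = 1.0, 5.0

# Six Sigma: the standard deviation is one sixth of the distance
# from the average to the specification limit.
sigma_days = (upper_limit - mean_days) / 6


def defects_per_million(mean, sigma, lower, upper):
    """Share of shipments outside the limits, per million.

    Assumes ship times are normally distributed.
    """
    process = NormalDist(mean, sigma)
    fraction_outside = process.cdf(lower) + (1.0 - process.cdf(upper))
    return fraction_outside * 1_000_000


# Process perfectly centered on the 3-day average.
centered_dpm = defects_per_million(
    mean_days, sigma_days, lower_limit, upper_limit
)

# Process average drifted by 1.5 sigma (assumed convention).
shifted_dpm = defects_per_million(
    mean_days + 1.5 * sigma_days, sigma_days, lower_limit, upper_limit
)

print(round(sigma_days, 2))    # about 0.33 days
print(round(centered_dpm, 4))  # roughly 0.002 per million
print(round(shifted_dpm, 1))   # roughly 3.4 per million
```

Under these assumptions, the familiar figure of about 3 defects per million comes from the shifted case; a perfectly centered process would do far better.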
Imagine if you were able to go throughout one day and experience this level of quality.  It's easy to see that dealing with processes that perform correctly 99.9997% of the time would make life easier.  That is the essence of "Six Sigma" and the statistics of Six Sigma, making "Life Easier". 
That being said, making life easier can provide a very large competitive advantage for world-class corporations competing for your business. That is the reason for all the activity around Six Sigma.