What is it?
The standard deviation provides a measure of variability in a sample or population. It can be read as roughly the average distance, or deviation, of a point in a population (sample) from the population (sample) mean. The standard deviation is the square root of the variance, which means it is in the same units of measurement as the data. This gives a more interpretable picture of the spread of the data than the variance.
NOTE: Anytime I list ‘population (sample)’ this just means it can apply to a sample or a population.
What is it used for?
Standard deviation is a building block for z-scores, which express how far a value lies from its mean in units of standard deviation and so provide a metric for comparing values from different populations. Z-scores can also be used for calculating the probability of randomly selecting a sample with a specific value from a population. I will go into further detail in future posts.
Symbols & Formulas
σ - Symbol Name (Greek Letter): Sigma, Parameter Name: Population Standard Deviation
s - Statistic Name: Sample Standard Deviation
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Sample Standard Deviation
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
n - number of scores in a sample
N - number of scores in a population
X - individual score in a population (sample)
μ - Population Mean
X̄ - Sample Mean
When Should I Use This?
It should be used to ‘get a sense of your data’, in particular the spread of your data. It is also one of the parameters of the normal distribution along with the mean.
I am not sure there is a blanket case where you should not use it (at least in exploratory analysis). But there are some things you should know (see next section).
Assumptions, Prerequisites, and Pitfalls
In the Formulas section above you may have noticed the difference in the denominators. I won’t go into detail about bias in estimators here, but know that when calculating the sample standard deviation, ‘(n-1)’ is used in place of ‘n’. Dividing by ‘(n-1)’ makes the sample variance an unbiased estimate of the population variance, and it greatly reduces (though does not fully remove) the bias in the standard deviation itself. For more info check out this post.
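To make the effect of the denominator concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the population is a standard normal distribution (true variance 1), the sample size is 5, and the seed and trial count are arbitrary. Averaged over many samples, dividing by n should land near 0.8 while dividing by (n-1) should land near 1.

```python
import random

random.seed(0)          # arbitrary seed so the run is repeatable
n = 5                   # hypothetical sample size
trials = 20000          # hypothetical number of repeated samples

biased_total = 0.0
unbiased_total = 0.0
for _ in range(trials):
    # Draw one sample from a normal population with mean 0 and variance 1
    sample = [random.gauss(0, 1) for _ in range(n)]
    sample_mean = sum(sample) / n
    sum_sq_dev = sum((x - sample_mean) ** 2 for x in sample)
    biased_total += sum_sq_dev / n          # divide by n
    unbiased_total += sum_sq_dev / (n - 1)  # divide by n - 1

mean_biased = biased_total / trials
mean_unbiased = unbiased_total / trials
print("average variance using n:    ", mean_biased)
print("average variance using n - 1:", mean_unbiased)
```

The version that divides by n underestimates the true variance on average, which is the reason the sample formula uses (n-1).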
Example - No Code
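As a sketch of the calculation by hand, take a made-up sample of eight scores: 2, 4, 4, 4, 5, 5, 7, 9. These numbers are hypothetical and chosen only because the arithmetic is tidy. The steps follow the sample standard deviation formula: find the mean, sum the squared deviations from it, divide by (n-1), and take the square root.

```latex
\begin{align*}
\bar{X} &= \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5 \\
\sum (X - \bar{X})^2 &= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32 \\
s &= \sqrt{\frac{32}{8 - 1}} = \sqrt{4.571\ldots} \approx 2.14
\end{align*}
```

If the same eight scores were treated as a whole population instead, the denominator would be N = 8, giving a standard deviation of exactly 2.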
Example - Code
Using statistics library
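A minimal sketch using Python's built-in statistics module. The scores list is made-up data for illustration. statistics.stdev uses the (n-1) denominator (sample), while statistics.pstdev uses N (population).

```python
import statistics

# Hypothetical scores, used only for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(scores)       # sample formula, divides by n - 1
population_sd = statistics.pstdev(scores)  # population formula, divides by N

print("sample standard deviation:    ", sample_sd)
print("population standard deviation:", population_sd)
```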
Using sample standard deviation formula
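A minimal sketch that applies the sample standard deviation formula directly. The function name sample_std and the scores list are my own choices for illustration, not part of any library.

```python
import math


def sample_std(values):
    """Sample standard deviation using the (n - 1) denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    sample_mean = sum(values) / n
    # Sum of squared deviations from the sample mean
    sum_sq_dev = sum((x - sample_mean) ** 2 for x in values)
    return math.sqrt(sum_sq_dev / (n - 1))


# Hypothetical scores, used only for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print("sample standard deviation:", sample_std(scores))
```

The result should agree with what statistics.stdev returns for the same data, up to floating-point rounding.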