Mean, Median, Mode, Variance, and Standard Deviation
When working with data, it's important to summarize and understand its overall behavior. Some of the most common statistical measures for this are the mean, median, mode, variance, and standard deviation. Let's break them down one by one, in a simple and intuitive way.
Mean (Average):
The mean is what most of us call the "average." It is the sum of all the values divided by the number of values, and it gives a central value for the data set.
Example:
Suppose we have the data: 5, 7, 9, 10, 12
Then, the mean is:
Mean = (5 + 7 + 9 + 10 + 12)/5 = 43/5 = 8.6
When to use:
When the data is relatively evenly distributed without extreme outliers.
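As a quick check of the arithmetic above, here is a minimal Python sketch. The variable names are illustrative only; `statistics.mean` comes from Python's standard library.

```python
from statistics import mean

data = [5, 7, 9, 10, 12]

# Mean by hand: add up the values, then divide by how many there are.
manual_mean = sum(data) / len(data)

# The standard library computes the same quantity.
library_mean = mean(data)

print(manual_mean)   # 8.6
print(library_mean)  # 8.6
```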
Median (Middle Value):
The median is the middle number when the data is arranged in order.
- If there is an odd number of values, the median is the center value.
- If there is an even number of values, it's the average of the two center values.
Example:
1. Data: 3, 8, 9, 15, 20
Median = 9 (middle value)
2. Data: 2, 4, 6, 8
Median = (4 + 6)/2 = 5
When to use:
When our data has outliers or is skewed, the median gives a better idea of the "typical" value.
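The two median examples above can be reproduced with a short Python sketch. The `salaries` list is made-up data, added here only to illustrate how a single outlier pulls the mean but barely moves the median.

```python
from statistics import mean, median

odd_data = [3, 8, 9, 15, 20]
even_data = [2, 4, 6, 8]

# median() sorts the data internally and picks the middle value,
# or averages the two middle values when the count is even.
print(median(odd_data))   # 9
print(median(even_data))  # 5.0

# Hypothetical data with one extreme value (not from the article).
salaries = [30, 32, 35, 38, 400]

salary_mean = mean(salaries)      # pulled upward by the outlier
salary_median = median(salaries)  # stays near the typical values

print(salary_mean)    # 107
print(salary_median)  # 35
```

In this made-up case, four of the five values are below 40, so the median of 35 describes a "typical" value far better than the mean of 107.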
Mode (Most Frequent Value):
Mode is the value that appears most frequently in the data set.
Example:
Data: 2, 4, 4, 4, 6, 7, 8
Mode = 4 (since 4 appears three times)
Note - A dataset can have no mode, one mode, or multiple modes.
When to use:
When we want to know what is most common or frequent in our dataset.
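The sketch below shows one way to find the mode in Python. The helper `find_modes` is hypothetical, written for this illustration, and it assumes the convention that a dataset has no mode when every value appears exactly once. The standard library's `statistics.multimode` behaves differently in that case: it returns all the values.

```python
from collections import Counter
from statistics import multimode


def find_modes(values):
    """Return the most frequent value(s), or [] if nothing repeats.

    Hypothetical helper: treats "every value appears once" as "no mode".
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


print(find_modes([2, 4, 4, 4, 6, 7, 8]))  # [4]       one mode
print(find_modes([1, 1, 2, 2, 3]))        # [1, 2]    multiple modes
print(find_modes([1, 2, 3]))              # []        no mode

# The standard library version for comparison.
print(multimode([2, 4, 4, 4, 6, 7, 8]))   # [4]
```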
Variance:
Variance measures how spread out the numbers in the data set are. It is the average of the squared differences from the mean:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
Where:
- $x_i$ = each value
- $\mu$ = mean of the data
- $N$ = number of values
Example: Suppose the data is 2, 4, 6.
- Mean = (2+4+6)/3 = 4
- Squared differences = (2-4)², (4-4)², (6-4)² => 4, 0, 4
- Variance = (4+0+4)/3 = 8/3 ≈ 2.67
When to use:
When we want to understand how much the data varies from the mean.
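The variance example can be traced step by step in Python, as in the sketch below. The example divides by the number of values, which corresponds to `statistics.pvariance` (population variance). Note that `statistics.variance` divides by one less than the number of values (sample variance), so it gives a different result for the same data.

```python
from statistics import pvariance, variance

data = [2, 4, 6]

# Step 1: the mean.
mu = sum(data) / len(data)

# Step 2: squared difference of each value from the mean.
squared_diffs = [(x - mu) ** 2 for x in data]

# Step 3: average of the squared differences (divide by N).
manual_variance = sum(squared_diffs) / len(data)

print(mu)                         # 4.0
print(squared_diffs)              # [4.0, 0.0, 4.0]
print(round(manual_variance, 2))  # 2.67

# Standard library equivalents.
population_variance = pvariance(data)  # divides by N, matches the example
sample_variance = variance(data)       # divides by N - 1, so 8/2

print(round(population_variance, 2))   # 2.67
print(sample_variance)                 # 4
```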
Standard Deviation
Standard deviation is simply the square root of the variance. It gives the spread of data points in the same units as the data itself.
Using the above example, where the variance is 2.67:
Standard Deviation = √2.67 ≈ 1.63
When to use:
When we want to easily interpret how much the data deviates from the average in the same units as the data.
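A minimal Python sketch of the same calculation, reusing the data 2, 4, 6 from the variance example. It assumes the population form (dividing by the number of values), which is what `statistics.pstdev` computes.

```python
from math import sqrt
from statistics import pstdev

data = [2, 4, 6]

# Population variance: average squared difference from the mean.
mu = sum(data) / len(data)
population_variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation is the square root of the variance.
manual_std = sqrt(population_variance)

print(round(manual_std, 2))    # 1.63
print(round(pstdev(data), 2))  # 1.63
```

The result is in the same units as the data: the values sit roughly 1.63 units away from the mean of 4.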
Summary Table
| Concept | Meaning | When to Use |
| --- | --- | --- |
| Mean | Average value | When data is evenly distributed |
| Median | Middle value | When data has outliers |
| Mode | Most frequent value | To find most common value |
| Variance | Measure of spread | To see variability |
| Standard Deviation | Root of variance | To interpret spread in original units |
My Thoughts
Understanding these five basic statistics forms the foundation for many advanced topics in data science, machine learning, and research. As we explore real-world datasets, choosing the right measure can completely change our interpretation of the data!