Mean, Median, Mode, Variance, and Standard Deviation

When working with data, it's important to summarize it and understand its overall behavior. Some of the most common statistical measures for this are Mean, Median, Mode, Variance, and Standard Deviation. Let's break them down one by one, in a simple and intuitive way.

Mean (Average) :

The mean is what most of us commonly call the "average." It tells us the central value of the data set.

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$

Example:
Suppose we have the data: 5, 7, 9, 10, 12
Then, the mean is:

$$\text{Mean} = \frac{5 + 7 + 9 + 10 + 12}{5} = \frac{43}{5} = 8.6$$
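
The same calculation can be sketched in Python. This is a minimal illustration rather than the only way to do it; the variable names `data`, `manual_mean`, and `library_mean` are chosen here for the example, and `statistics.mean` comes from Python's standard library.

```python
from statistics import mean

# The example data set from the text
data = [5, 7, 9, 10, 12]

# Manual calculation: sum of all values divided by the number of values
manual_mean = sum(data) / len(data)

# The standard library gives the same result
library_mean = mean(data)

print(manual_mean)   # 8.6
print(library_mean)  # 8.6
```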

When to use:
When the data is relatively evenly distributed without extreme outliers.


Median (Middle Value) :

Median is the middle number when the data is arranged in order.

  • If there is an odd number of values, the median is the center value.

  • If there is an even number of values, it’s the average of the two center values.

Example:
1. Data: 3, 8, 9, 15, 20
Median = 9 (middle value)

2. Data: 2, 4, 6, 8
Median = (4 + 6)/2 = 5
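
Both cases can be sketched in a few lines of Python. The helper name `manual_median` is hypothetical and assumes a non-empty list of numbers; `statistics.median` from the standard library is shown alongside it as a cross-check.

```python
from statistics import median

def manual_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single center value
        return ordered[mid]
    # Even count: average of the two center values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 8, 9, 15, 20]
even_data = [2, 4, 6, 8]

print(manual_median(odd_data))   # 9
print(manual_median(even_data))  # 5.0
print(median(odd_data))          # 9
print(median(even_data))         # 5.0
```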

When to use:
When our data has outliers or is skewed, the median gives a better idea of the "typical" value than the mean does.
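
To see why, here is a small sketch that adds one extreme value to the first median example. The outlier value 500 is hypothetical, picked only for illustration.

```python
from statistics import mean, median

data = [3, 8, 9, 15, 20]
with_outlier = data + [500]   # hypothetical extreme value, for illustration only

print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
```

With the outlier added, the mean jumps from 11 to 92.5, while the median only moves from 9 to 12.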


Mode (Most Frequent Value) :

Mode is the value that appears most frequently in the data set.

Example:
Data: 2, 4, 4, 4, 6, 7, 8
Mode = 4 (since 4 appears three times)

Note - A dataset can have no mode, one mode, or multiple modes.
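
Here is a minimal Python sketch covering all three cases. The helper name `find_modes` is hypothetical, and it assumes one common convention: when every value appears exactly once, the data set is treated as having no mode. It also assumes the input is non-empty.

```python
from collections import Counter

def find_modes(values):
    """Return a list of the most frequent values (empty if there is no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if several values all appear once, there is no mode
    if highest == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == highest]

print(find_modes([2, 4, 4, 4, 6, 7, 8]))  # [4]     one mode
print(find_modes([1, 1, 2, 2, 3]))        # [1, 2]  multiple modes
print(find_modes([1, 2, 3]))              # []      no mode
```

The standard library's `statistics.multimode` follows a different convention: for data where every value is unique, it returns all of the values rather than an empty list.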

When to use:
When we want to know what is most common or frequent in our dataset.


Variance :

Variance measures how spread out the numbers in the data set are. It is the average of the squared differences from the mean.

$$\text{Variance} (\sigma^2) = \frac{\sum (x_i - \mu)^2}{N}$$

Where:

  • $x_i$ = each value

  • $\mu$ = mean of the data

  • $N$ = number of values

Example: Suppose the data is 2, 4, 6.

  • Mean = (2+4+6)/3 = 4

  • Squared differences = (2-4)², (4-4)², (6-4)² => 4, 0, 4

  • Variance = (4+0+4)/3 = 8/3 ≈ 2.67
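
The three steps above can be sketched in Python as follows. The variable names are chosen for this example. Note that the formula in this post divides by N, which is the population variance (`statistics.pvariance`); the sample variance (`statistics.variance`) divides by N - 1 instead and is shown only for comparison.

```python
from statistics import pvariance, variance

data = [2, 4, 6]

# Step 1: mean of the data
mu = sum(data) / len(data)

# Step 2: squared difference of each value from the mean
squared_diffs = [(x - mu) ** 2 for x in data]

# Step 3: average of the squared differences (divide by N)
population_variance = sum(squared_diffs) / len(data)

print(squared_diffs)                  # [4.0, 0.0, 4.0]
print(round(population_variance, 2))  # 2.67
print(round(pvariance(data), 2))      # 2.67, same formula from the standard library
print(variance(data))                 # sample variance, divides by N - 1
```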

When to use:
When we want to understand how much the data varies from the mean.


Standard Deviation :

Standard deviation is simply the square root of the variance. It gives the spread of data points in the same units as the data itself.

$$\text{Standard Deviation} (\sigma) = \sqrt{\text{Variance}}$$

Using the above example where variance is 2.67:

$$\text{Standard Deviation} = \sqrt{2.67} \approx 1.63$$
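
As a quick check, the same result can be reproduced in Python. This sketch assumes the population form of the variance (dividing by N) for the data 2, 4, 6, so the matching standard library function is `statistics.pstdev`; the variable names are chosen for this example.

```python
import math
from statistics import pstdev

data = [2, 4, 6]

# Population variance: average of squared differences from the mean
mu = sum(data) / len(data)
population_variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation is the square root of the variance
std_dev = math.sqrt(population_variance)

print(round(std_dev, 2))       # 1.63
print(round(pstdev(data), 2))  # 1.63
```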

When to use:
When we want to easily interpret how much the data deviates from the average in the same units as the data.


Summary Table

Concept               Meaning                When to Use
Mean                  Average value          When data is evenly distributed
Median                Middle value           When data has outliers
Mode                  Most frequent value    To find most common value
Variance              Measure of spread      To see variability
Standard Deviation    Root of variance       To interpret spread in original units


My Thoughts 

Understanding these five basic statistics forms the foundation for many advanced topics in data science, machine learning, and research. As we explore real-world datasets, choosing the right measure can completely change our interpretation of the data!
