Statistics - Measures of Position in Data Science
Source: Dev.to
What Are Measures of Position?
Measures of position describe where a particular data value stands relative to the rest of the dataset.
- Is this value high, low, or typical?
- What proportion of data lies below (or above) a given value?
- How extreme is a data point?
Unlike measures of central tendency (mean, median) or dispersion (variance, standard deviation), measures of position focus on relative standing.
Why Measures of Position Matter in Data Science
In data science, measures of position are crucial for:
- Outlier detection (e.g., IQR method)
- Feature scaling and normalization
- Risk assessment (finance, insurance)
- Model evaluation (percentile‑based metrics)
- Fair comparisons across populations
Example: A test score of 85 means very different things depending on whether it is in the 60th percentile or the 95th percentile.
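To make this concrete, here is a minimal Python sketch that computes a percentile rank as the share of values strictly below a given score. The two cohorts and the helper name `percentile_rank` are hypothetical and chosen only for illustration.

```python
def percentile_rank(value, data):
    """Return the percentage of observations in `data` strictly below `value`."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for x in data if x < value)
    return 100.0 * below / len(data)


# Hypothetical cohorts: the same score of 85 in a strong and a weaker group.
strong_cohort = [60, 70, 80, 85, 90, 92, 95, 97, 98, 99]
weaker_cohort = [40, 50, 55, 60, 65, 70, 75, 80, 85, 90]

rank_in_strong = percentile_rank(85, strong_cohort)  # 3 of 10 values are below 85
rank_in_weaker = percentile_rank(85, weaker_cohort)  # 8 of 10 values are below 85
```

Other definitions count values "at or below" the score, or count ties as half. The choice matters mostly for small or heavily tied samples.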
Types of Measures of Position
- Percentiles
- Quartiles
- Deciles
- Z‑scores (Standard Scores)
- Ranks
Percentiles
Definition
The p‑th percentile is the value below which p % of the data falls.
Properties
- p ranges from 0 to 100
- Percentiles are not evenly spaced in value; the gaps between them depend on the data distribution
How to Compute Percentiles
Given ordered data of size n:
[ \text{Position of } P_p = \frac{p}{100}\,(n+1) ]
If the position is not an integer, interpolate between the surrounding values.
Example
Data (ordered): 10, 20, 30, 40, 50
Find the 60th percentile:
[ \text{Position of } P_{60} = \frac{60}{100}\,(5+1) = 3.6 ]
The 3.6‑th position lies between the 3rd (30) and 4th (40) values, so interpolate 60 % of the way from 30 to 40:
[ P_{60} = 30 + 0.6\,(40-30) = 36 ]
So, (P_{60}=36).
Interpretation
- About 60 % of the data lies at or below 36
- About 40 % of the data lies at or above 36
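The position rule and interpolation step can be sketched in Python as follows. This is one possible implementation under stated assumptions, not a canonical one: the function name `percentile_n_plus_1` is made up for this example, and clamping positions that fall outside the data to the smallest or largest value is an assumption the text does not specify.

```python
import math
import statistics


def percentile_n_plus_1(values, p):
    """Percentile via the position rule p/100 * (n + 1) with linear interpolation."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    position = p * (n + 1) / 100  # 1-based position in the sorted data
    # Assumption: positions outside 1..n are clamped to the smallest/largest value.
    if position <= 1:
        return float(data[0])
    if position >= n:
        return float(data[-1])
    lower = math.floor(position)  # 1-based index of the value just below
    fraction = position - lower
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


data = [10, 20, 30, 40, 50]
p60 = percentile_n_plus_1(data, 60)  # about 36, up to floating-point rounding

# Cross-check with the standard library's "exclusive" method,
# which uses the same (n + 1) positioning.
p60_stdlib = statistics.quantiles(data, n=100, method="exclusive")[59]
```

Libraries differ in their percentile convention. NumPy's default linear method, for example, returns 34 for the same input, so it is worth checking which rule a tool uses before comparing results.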
Quartiles
Quartiles are the three cut points that divide ordered data into four groups, each containing about 25 % of the observations.
| Quartile | Meaning |
|---|---|
| Q1 | 25th percentile |
| Q2 | Median (50th percentile) |
| Q3 | 75th percentile |
Interquartile Range (IQR)
[ \text{IQR}=Q_3 - Q_1 ]
- Measures the spread of the middle 50 % of the data
- Robust to outliers; heavily used in box plots and anomaly detection
Outlier Detection (IQR Rule)
Under this rule of thumb, values below (Q_1 - 1.5\cdot\text{IQR}) or above (Q_3 + 1.5\cdot\text{IQR}) are flagged as potential outliers.
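A minimal sketch of the IQR rule using only Python's standard library is shown here. The helper name `iqr_outliers` and the sample data are hypothetical. Quartiles are taken from `statistics.quantiles` with its default "exclusive" method, so other quartile conventions may give slightly different fences.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


# Hypothetical sample with one obviously extreme value.
sample = [10, 12, 13, 14, 15, 16, 18, 95]
lower_fence, upper_fence, outliers = iqr_outliers(sample)
```

For this sample, Q1 = 12.25 and Q3 = 17.5, so the fences sit at 4.375 and 25.375 and only the value 95 is flagged.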
Deciles
Deciles split data into ten equal parts.
| Decile | Corresponding Percentile |
|---|---|
| D1 | 10 % |
| D2 | 20 % |
| … | … |
| D9 | 90 % |
Applications
- Income distribution analysis
- Population studies
- Risk stratification
Example: The top 10 % of income earners are those whose income lies above the 9th decile (D9).
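As an illustration, the sketch below computes the nine decile cut points and assigns values to decile groups. The helper names and the stand-in "income" data are hypothetical, and the cut points again rely on the default convention of `statistics.quantiles`.

```python
import bisect
import statistics


def decile_cut_points(values):
    """Return the nine cut points D1..D9 that split the data into ten groups."""
    return statistics.quantiles(values, n=10)  # default "exclusive" method


def decile_group(value, cut_points):
    """Return the decile group (1 = lowest, 10 = highest) that `value` falls in."""
    return bisect.bisect_left(cut_points, value) + 1


# Hypothetical incomes: the integers 1..100 stand in for real data.
incomes = list(range(1, 101))
cuts = decile_cut_points(incomes)
d9 = cuts[8]                                  # 9th decile (90th percentile)
top_earners = [x for x in incomes if x > d9]  # the top 10 % of this sample
```

With this convention a value exactly equal to a cut point is placed in the lower group. That is a design choice rather than a standard.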
Z‑Scores (Standard Scores)
Definition
A Z‑score measures how many standard deviations a value is from the mean:
[ Z = \frac{x - \mu}{\sigma} ]
where
- (x) = observation
- (\mu) = mean of the distribution
- (\sigma) = standard deviation
Interpretation
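- A positive Z means the value is above the mean, a negative Z means it is below, and Z = 0 means it equals the mean
- Puts different scales on a common unit (standard deviations), enabling comparison across datasets
- Widely used in machine‑learning preprocessing (standardization)
- Basis for normal‑distribution probabilities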
Example
Score (x = 85), mean (\mu = 70), standard deviation (\sigma = 10):
[ Z = \frac{85 - 70}{10} = 1.5 ]
The score is 1.5 standard deviations above the mean.
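The same calculation, extended to a whole list of values, might look like the sketch below. The function names and the `scores` list are hypothetical. Using the population standard deviation (`statistics.pstdev`) is an assumption; use `statistics.stdev` instead if the data are a sample.

```python
import statistics


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


def standardize(values):
    """Convert each value to a Z-score using the data's own mean and population std."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    return [z_score(x, mean, std_dev) for x in values]


z = z_score(85, mean=70, std_dev=10)  # the worked example: (85 - 70) / 10

# Hypothetical scores, used only to show standardization of a whole feature.
scores = [55, 65, 70, 75, 85]
z_scores = standardize(scores)
```

After standardization the values have mean 0 and standard deviation 1, which is what makes features measured in different units comparable.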
Relationship Between Z‑Scores and Percentiles (normal distribution)
| Z | Approximate Percentile |
|---|---|
| -2 | 2.3 % |
| -1 | 16 % |
| 0 | 50 % |
| 1 | 84 % |
| 2 | 97.7 % |
This connection is vital for hypothesis testing, probability estimation, and statistical modelling.
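Under the assumption that the data follow a normal distribution, the mapping between Z-scores and percentiles can be computed with `statistics.NormalDist` from the standard library. The helper names below are illustrative only.

```python
from statistics import NormalDist


def z_to_percentile(z):
    """Percentile (0-100) of a Z-score under a standard normal distribution."""
    return 100.0 * NormalDist().cdf(z)


def percentile_to_z(percentile):
    """Inverse mapping: the Z-score below which `percentile` % of a normal lies."""
    return NormalDist().inv_cdf(percentile / 100.0)


# Percentiles for a few common Z-scores, rounded to one decimal place.
table = {z: round(z_to_percentile(z), 1) for z in (-2, -1, 0, 1, 2)}
z_for_975 = percentile_to_z(97.5)  # roughly 1.96
```

The familiar 2.5 % and 97.5 % tail cut-offs correspond to Z ≈ ±1.96 rather than exactly ±2. For data that are far from normal, these conversions can be badly off.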
Ranks
Definition
A rank assigns an ordinal position to each observation.
Example
- Highest score → Rank 1
- Next highest → Rank 2
Types of Ranking
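- Dense ranking (1, 2, 2, 3): tied values share a rank and no ranks are skipped
- Competition ranking (1, 2, 2, 4): tied values share a rank and the next rank is skipped
- Fractional ranking (1, 2.5, 2.5, 4): tied values share the average of the ranks they occupy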
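The three tie-handling schemes can be sketched in plain Python as follows. The function name `rank_scores` and the example scores are hypothetical, and the sketch assumes that higher scores receive better (lower) ranks.

```python
def rank_scores(scores, method="competition"):
    """Rank scores in descending order (highest score gets rank 1)."""
    ordered = sorted(scores, reverse=True)
    if method == "dense":
        # Each distinct value gets the next consecutive rank.
        distinct = sorted(set(scores), reverse=True)
        lookup = {value: i for i, value in enumerate(distinct, start=1)}
    elif method == "competition":
        # Tied values take the first position they occupy; later ranks are skipped.
        lookup = {}
        for position, value in enumerate(ordered, start=1):
            lookup.setdefault(value, position)
    elif method == "fractional":
        # Tied values take the average of the positions they occupy.
        positions = {}
        for position, value in enumerate(ordered, start=1):
            positions.setdefault(value, []).append(position)
        lookup = {value: sum(p) / len(p) for value, p in positions.items()}
    else:
        raise ValueError(f"unknown method: {method}")
    return [lookup[value] for value in scores]


scores = [90, 85, 85, 70]
dense = rank_scores(scores, "dense")              # [1, 2, 2, 3]
competition = rank_scores(scores, "competition")  # [1, 2, 2, 4]
fractional = rank_scores(scores, "fractional")    # [1.0, 2.5, 2.5, 4.0]
```

Fractional ranks are the usual choice for rank-based statistics, since they keep the sum of ranks the same as it would be without ties.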
Limitations
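- Ignores magnitude: a gap of 1 point and a gap of 30 points between neighbours both become a single rank step
- Can mislead distance‑based models (e.g., k‑nearest neighbours), because differences between ranks do not reflect differences between the original values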
Measures of Position vs. Measures of Central Tendency
| Aspect | Measures of Central Tendency | Measures of Position |
|---|---|---|
| Focus | Typical value | Relative standing |
| Examples | Mean, Median | Percentiles, Z‑scores |
| Outliers | Mean is sensitive; median is robust | Percentiles and ranks are robust; Z‑scores are not |
| Use in ML | Baseline | Feature scaling, anomaly detection |
Real‑World Data Science Applications
- Machine Learning
  - Feature normalization using Z‑scores
  - Quantile transformation
- Finance
  - Value‑at‑Risk (VaR) – percentile‑based
  - Risk classification using deciles
- Healthcare
  - Growth percentiles (BMI‑for‑age)
  - Lab result interpretation
- Education
  - Standardized test scores
  - Admission cut‑offs
Summary Table
| Measure | Purpose | Robust to Outliers |
|---|---|---|
| Percentile | Relative position | Yes |
| Quartile | Spread & position | Yes |
| Decile | Distribution segmentation | Yes |
| Z‑score | Standardized distance | No |
| Rank | Order comparison | Yes |
Key Takeaways
- Measures of position explain where a value lies, not just what it is.
- Percentiles and quartiles make no distributional assumption. Z‑scores can be computed for any data, but converting them to percentiles assumes approximate normality, and they are sensitive to outliers.
- In data science, they are foundational for scaling, anomaly detection, and interpretation.