Feature Engineering
Source: Dev.to
What is Feature Engineering?
- A feature is just a column of data (e.g., age, salary, number of purchases).
- Feature engineering means creating, modifying, or selecting the right features so that your model learns better.
- Think of it as preparing ingredients before cooking—you want them clean, chopped, and ready to make the dish tasty.
Why Do We Need It?
- Raw data is often messy, incomplete, or not in the right format.
- Good features help algorithms see patterns more clearly.
- Better features → better predictions, faster training, and more accurate results.
Common Techniques in Feature Engineering
| Technique | What It Means | Simple Example |
|---|---|---|
| Handling Missing Values | Fill in blanks or remove incomplete data | Replace missing ages with the average age |
| Encoding Categorical Data | Convert text labels into numbers | “Red, Blue, Green” → 0, 1, 2 |
| Scaling / Normalization | Put numbers on similar ranges | Salary (₹10,000–₹1,00,000) scaled to 0–1 |
| Feature Creation | Combine or transform existing data into new features | From “Date of Birth” → create “Age” |
| Feature Selection | Keep only the most useful features | Drop irrelevant columns like “User ID” |
| Binning | Group continuous values into categories | Age 0–12 = Child, 13–19 = Teen, 20+ = Adult |
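The binning row above maps directly to `pd.cut` in pandas. Here is a minimal sketch using hypothetical ages and the same bin edges as the table (the upper bound of 120 is an assumption for illustration):

```python
import pandas as pd

# Hypothetical ages; bin edges follow the table above (0-12, 13-19, 20+)
ages = pd.Series([5, 16, 34, 70])
groups = pd.cut(ages, bins=[0, 12, 19, 120],
                labels=['Child', 'Teen', 'Adult'])
print(list(groups))  # ['Child', 'Teen', 'Adult', 'Adult']
```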
Simple Example
Imagine you have this dataset:
| Name | Date of Birth | Salary | City |
|---|---|---|---|
| Alice | 1995-06-12 | 50,000 | Delhi |
| Bob | 1988-03-05 | 80,000 | Mumbai |
After feature engineering:
- Age is calculated from Date of Birth.
- City is encoded as numbers (Delhi = 0, Mumbai = 1).
- Salary is scaled between 0 and 1.
The data is now cleaner and easier for the model to understand.
Key Takeaways
- Feature engineering = preparing and improving data features.
- It makes models smarter and predictions more accurate.
- Core techniques include handling missing values, encoding, scaling, creating new features, and selecting the best ones.
Feature Engineering in Python
Make sure you have pandas and scikit-learn installed:
pip install pandas scikit-learn
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
# Example dataset
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Date_of_Birth': ['1995-06-12', '1988-03-05', '2000-12-20'],
'Salary': [50000, 80000, None], # Missing value
'City': ['Delhi', 'Mumbai', 'Delhi']
}
df = pd.DataFrame(data)
print("Original Data:\n", df)
# 🔹 Handling Missing Values
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
# 🔹 Feature Creation (Age from Date of Birth)
df['Date_of_Birth'] = pd.to_datetime(df['Date_of_Birth'])
df['Age'] = pd.Timestamp.now().year - df['Date_of_Birth'].dt.year
# (Year difference is an approximation: it ignores whether the birthday has passed this year.)
# 🔹 Encoding Categorical Data (City)
# Note: LabelEncoder assigns arbitrary integers; for nominal features with
# no natural order, one-hot encoding (pd.get_dummies) is often safer.
label_encoder = LabelEncoder()
df['City_encoded'] = label_encoder.fit_transform(df['City'])
# 🔹 Scaling Numerical Data (Salary)
scaler = MinMaxScaler()
df['Salary_scaled'] = scaler.fit_transform(df[['Salary']])
print("\nAfter Feature Engineering:\n", df)
What This Code Does
- Handles missing values by filling in the average salary.
- Creates a new feature (Age) from Date_of_Birth.
- Encodes categorical data (City) into numbers.
- Scales numerical data (Salary) between 0 and 1.
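The one technique from the table that the script above skips is feature selection. In its simplest form, that just means dropping columns the model should not learn from, like identifiers. A minimal sketch (the DataFrame here is a hypothetical stand-in, not the output of the script above):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],      # identifier: not predictive, drop it
    'Age': [30, 37],
    'Salary_scaled': [0.0, 1.0],
})

# Feature selection: keep only the columns the model should see
features = df.drop(columns=['Name'])
print(list(features.columns))  # ['Age', 'Salary_scaled']
```

For larger datasets, scikit-learn also offers automated selectors (e.g., `sklearn.feature_selection.SelectKBest`), but dropping obviously irrelevant columns by hand is the usual first step.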
Final Note
Think of feature engineering like polishing a diamond. The raw stone (data) is valuable, but shaping and refining it (features) unlocks its true brilliance.