Feature Engineering

Published: December 17, 2025 at 01:57 AM EST
2 min read
Source: Dev.to

What is Feature Engineering?

  • A feature is just a column of data (e.g., age, salary, number of purchases).
  • Feature engineering means creating, modifying, or selecting the right features so that your model learns better.
  • Think of it as preparing ingredients before cooking—you want them clean, chopped, and ready to make the dish tasty.

Why Do We Need It?

  • Raw data is often messy, incomplete, or not in the right format.
  • Good features help algorithms see patterns more clearly.
  • Better features → more accurate predictions and faster training.

Common Techniques in Feature Engineering

| Technique | What It Means | Simple Example |
| --- | --- | --- |
| Handling Missing Values | Fill in blanks or remove incomplete data | Replace missing ages with the average age |
| Encoding Categorical Data | Convert text labels into numbers | “Red, Blue, Green” → 0, 1, 2 |
| Scaling / Normalization | Put numbers on similar ranges | Salary (₹10,000–₹1,00,000) scaled to 0–1 |
| Feature Creation | Combine or transform existing data into new features | From “Date of Birth” → create “Age” |
| Feature Selection | Keep only the most useful features | Drop irrelevant columns like “User ID” |
| Binning | Group continuous values into categories | Age 0–12 = Child, 13–19 = Teen, 20+ = Adult |
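
The Python example later in this post covers missing values, encoding, feature creation, and scaling, but not binning or feature selection. Here is a minimal sketch of those two techniques, using a small made-up DataFrame (User_ID, Age) purely for illustration:

import pandas as pd

# Made-up data purely for illustration
df = pd.DataFrame({
    'User_ID': [101, 102, 103],
    'Age': [8, 16, 35],
})

# 🔹 Binning: group continuous ages into the categories from the table above
df['Age_group'] = pd.cut(
    df['Age'],
    bins=[0, 12, 19, 120],
    labels=['Child', 'Teen', 'Adult'],
)

# 🔹 Feature Selection: drop a column with no predictive value
df = df.drop(columns=['User_ID'])

print(df)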

Simple Example

Imagine you have this dataset:

| Name | Date of Birth | Salary | City |
| --- | --- | --- | --- |
| Alice | 1995-06-12 | 50,000 | Delhi |
| Bob | 1988-03-05 | 80,000 | Mumbai |

After feature engineering:

  • Age is calculated from Date of Birth.
  • City is encoded as numbers (Delhi = 0, Mumbai = 1).
  • Salary is scaled between 0 and 1.

The data is now cleaner and easier for the model to understand.

Key Takeaways

  • Feature engineering = preparing and improving data features.
  • It makes models smarter and predictions more accurate.
  • Core techniques include handling missing values, encoding, scaling, creating new features, and selecting the best ones.

Feature Engineering in Python

Make sure you have pandas and scikit-learn installed:

pip install pandas scikit-learn

import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Example dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Date_of_Birth': ['1995-06-12', '1988-03-05', '2000-12-20'],
    'Salary': [50000, 80000, None],   # Missing value
    'City': ['Delhi', 'Mumbai', 'Delhi']
}

df = pd.DataFrame(data)
print("Original Data:\n", df)

# 🔹 Handling Missing Values
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

# 🔹 Feature Creation (Age from Date of Birth)
df['Date_of_Birth'] = pd.to_datetime(df['Date_of_Birth'])
df['Age'] = pd.Timestamp.now().year - df['Date_of_Birth'].dt.year  # approximate age (year difference only)

# 🔹 Encoding Categorical Data (City)
label_encoder = LabelEncoder()
df['City_encoded'] = label_encoder.fit_transform(df['City'])

# 🔹 Scaling Numerical Data (Salary)
scaler = MinMaxScaler()
df['Salary_scaled'] = scaler.fit_transform(df[['Salary']])

print("\nAfter Feature Engineering:\n", df)

What This Code Does

  • Handles missing values by filling in the average salary.
  • Creates a new feature (Age) from Date_of_Birth.
  • Encodes categorical data (City) into numbers.
  • Scales numerical data (Salary) between 0 and 1.
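
One design note: LabelEncoder assigns an arbitrary numeric order to City (Delhi = 0, Mumbai = 1), which some models can misread as a ranking. A common alternative is one-hot encoding; here is a minimal sketch with pandas, assuming the same df built in the example above:

# 🔹 Alternative encoding: one-hot encode City into binary columns
df = pd.get_dummies(df, columns=['City'], prefix='City')
print(df)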

Final Note

Think of feature engineering like polishing a diamond. The raw stone (data) is valuable, but shaping and refining it (features) unlocks its true brilliance.
