Sharpening the Axe: Performing Principal Component Analysis (PCA) in R for Modern Machine Learning
“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
— Abraham Lincoln
This quote resonates strongly with modern machine learning and data science. In real‑world projects, the majority of time is not spent on modeling, but on data preprocessing, feature engineering, and dimensionality reduction.
One of the most powerful and widely used dimensionality‑reduction techniques is Principal Component Analysis (PCA). PCA helps us transform high‑dimensional data into a smaller, more informative feature space—often improving model performance, interpretability, and computational efficiency.
In this article you will learn:
- The conceptual foundations of PCA
- How to implement PCA in R using modern, industry‑standard practices
Table of Contents
- Lifting the Curse with Principal Component Analysis
- Curse of Dimensionality in Simple Terms
- Shlens’ Perspective on PCA
- Conceptual Background of PCA
- Implementing PCA in R (Modern Approach)
- Loading and Preparing the Iris Dataset
- Scaling and Standardization
- Covariance Matrix and Eigen Decomposition
- Performing PCA with prcomp()
- Understanding PCA Outputs
- Variance Explained
- Loadings and Scores
- Scree Plot and Biplot
- PCA in a Modeling Workflow (Naïve Bayes Example)
- Summary and Practical Takeaways
Lifting the Curse with Principal Component Analysis
A common myth in analytics is:
“More features and more data will always improve model accuracy.”
In practice, this is often false. When the number of features is large relative to the number of observations, models become:
- Unstable
- Harder to generalize
- Prone to over‑fitting
This phenomenon is known as the curse of dimensionality. PCA helps address it by reducing dimensionality while preserving most of the informational content.
Curse of Dimensionality in Simple Terms
- Adding more features can decrease model accuracy.
- The volume of the feature space grows exponentially with the number of features, so the available data becomes increasingly sparse.
- Distance‑based and probabilistic models degrade rapidly in sparse, high‑dimensional spaces.
Two general ways to mitigate the curse:
- Collect more data – often expensive or impossible.
- Reduce the number of features – the preferred, practical approach.
Dimensionality‑reduction techniques like PCA fall into the second category.
Shlens’ Perspective on PCA
In his well‑known paper, Jonathon Shlens describes PCA using a simple analogy: observing the motion of a pendulum.
- If the pendulum moves in one direction but we don’t know that direction, we may need several cameras (features) to capture its motion.
- PCA rotates the coordinate system so that we capture the motion with fewer, orthogonal views.
In essence, PCA:
- Transforms correlated variables into uncorrelated (orthogonal) components.
- Orders these components by variance explained.
- Allows us to retain only the most informative components (see the short sketch after this list).
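To make these three properties concrete, here is a minimal sketch on simulated data (the simulation and variable names are illustrative only, not part of the iris example used later): two correlated inputs become two uncorrelated components, ordered by the variance they explain.
# Minimal sketch: two correlated variables -> two uncorrelated components
set.seed(42)
x1  <- rnorm(200)
x2  <- 0.8 * x1 + rnorm(200, sd = 0.3)   # x2 is strongly correlated with x1
toy <- cbind(x1, x2)
pca_toy <- prcomp(toy, center = TRUE, scale. = TRUE)
round(cor(toy), 2)        # original variables: clearly correlated
round(cor(pca_toy$x), 2)  # component scores: off-diagonal correlations ~0
pca_toy$sdev^2            # component variances, largest first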
Conceptual Background of PCA
Assume a dataset with:
- m observations
- n features
Represented as an m × n matrix A.
PCA transforms A into a new matrix A′ of size m × k, where k ≤ n. The transformation is based on the eigen decomposition of the covariance matrix (or on a singular value decomposition).
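As a rough sketch of this transformation (using a small random matrix purely for illustration; m, n, k and A follow the notation above):
# Sketch: project an m x n matrix A onto its first k principal axes
set.seed(1)
m <- 100; n <- 5; k <- 2
A <- matrix(rnorm(m * n), nrow = m, ncol = n)
A_centered <- scale(A, center = TRUE, scale = FALSE)  # centre each column
V <- eigen(cov(A_centered))$vectors                   # principal axes (n x n)
A_prime <- A_centered %*% V[, 1:k]                    # projected data (m x k)
dim(A_prime)                                          # 100 x 2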
Implementing PCA in R (Modern Approach)
Loading and Preparing the Iris Dataset
data(iris)
df <- iris[, 1:4] # use only numeric features
head(df)
Scaling and Standardization
df_scaled <- scale(df) # zero‑mean, unit‑variance scaling
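An optional sanity check confirms that every column now has (near‑)zero mean and unit standard deviation:
round(colMeans(df_scaled), 10)   # all means ~0
apply(df_scaled, 2, sd)          # all standard deviations equal 1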
Covariance Matrix and Eigen Decomposition
cov_mat <- cov(df_scaled)
eigen_data <- eigen(cov_mat)
# eigen_data$values -> eigenvalues (variance explained)
# eigen_data$vectors -> eigenvectors (principal axes)
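Because each eigenvalue measures the variance along its eigenvector, dividing by their sum gives the proportion of variance each principal axis explains:
# Proportion of total variance captured by each principal axis
eigen_data$values / sum(eigen_data$values)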
Performing PCA with prcomp()
# Why prcomp()?
# • Uses singular value decomposition (SVD)
# • Numerically more stable
# • Works better for high‑dimensional data
pca_res <- prcomp(df_scaled, center = FALSE, scale. = FALSE)  # df_scaled is already centred and scaled
summary(pca_res)
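As an optional cross‑check (not part of the core walkthrough), the SVD‑based results should agree with the manual eigen decomposition: the squared standard deviations equal the eigenvalues, and the rotation matrix matches the eigenvectors up to sign.
# Cross-check prcomp() against the manual eigen decomposition
all.equal(pca_res$sdev^2, eigen_data$values)                # TRUE
round(abs(pca_res$rotation) - abs(eigen_data$vectors), 10)  # ~0 (signs may flip)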
Understanding PCA Outputs
Variance Explained
explained_variance <- pca_res$sdev^2 / sum(pca_res$sdev^2)
explained_variance
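The cumulative proportion is usually more convenient when deciding how many components to keep (for example, enough to cover roughly 95% of the variance):
# Cumulative proportion of variance explained
cumsum(explained_variance)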
Loadings and Scores
loadings <- pca_res$rotation
scores <- pca_res$x
head(loadings)
head(scores)
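A helpful way to internalize what these objects mean: the scores are the data re‑expressed in the principal‑component basis, so multiplying them by the transposed loadings reconstructs the scaled data exactly when all components are kept. This is only a sanity check, not a step you need in practice.
# Reconstruct the scaled data from scores and loadings
reconstructed <- scores %*% t(loadings)
all.equal(reconstructed, df_scaled, check.attributes = FALSE)   # TRUE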
Scree Plot and Biplot
# Scree plot
plot(explained_variance, type = "b",
     xlab = "Principal Component",
     ylab = "Proportion of Variance Explained")
# Biplot
biplot(pca_res)
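If you prefer ggplot2‑style graphics, the factoextra package provides convenient wrappers; it is not used elsewhere in this article, so treat this as an optional alternative to the base‑R plots above.
# Optional: ggplot2-based scree plot and biplot via factoextra
# install.packages("factoextra")
library(factoextra)
fviz_eig(pca_res)          # scree plot
fviz_pca_biplot(pca_res)   # biplot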
PCA in a Modeling Workflow (Naïve Bayes Example)
- Split the data into training and test sets.
- Apply prcomp() on the training set and retain the top k components.
- Transform both training and test sets using the same rotation matrix.
- Train a Naïve Bayes classifier on the reduced‑dimensional training data.
- Evaluate performance on the test set.
library(e1071) # for Naïve Bayes
set.seed(123)
train_idx <- sample(seq_len(nrow(df_scaled)), size = floor(0.7 * nrow(df_scaled)))
train_data <- df_scaled[train_idx, ]
test_data <- df_scaled[-train_idx, ]
pca_train <- prcomp(train_data)
k <- 2 # keep first two PCs
train_pc <- predict(pca_train, train_data)[, 1:k]
test_pc <- predict(pca_train, test_data)[, 1:k]
nb_model <- naiveBayes(train_pc, iris$Species[train_idx])
pred <- predict(nb_model, test_pc)
confusionMatrix <- table(Predicted = pred, Actual = iris$Species[-train_idx])
confusionMatrix
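Overall accuracy can be read directly off the confusion matrix:
# Share of correctly classified test observations
sum(diag(confusionMatrix)) / sum(confusionMatrix)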
Summary and Practical Takeaways
- PCA is a cornerstone technique for tackling the curse of dimensionality.
- Proper scaling of features is essential before applying PCA.
- prcomp() (SVD‑based) is the preferred R function for robust PCA.
- Examine variance explained to decide how many components to retain.
- Integrate PCA early in the modeling pipeline to improve speed and generalization.
By sharpening the “axe” of your data—through careful preprocessing and dimensionality reduction—you set the stage for more reliable, interpretable, and efficient machine‑learning models.