How I Began My Data Science Journey with R in the Last Month

Published: 1 month ago (December 9, 2025 at 11:49 PM EST)

Source: Dev.to

Introduction

Over the past month I decided to dive seriously into data science with one clear mission: learn how to analyze real data using R like a professional.
To challenge myself I tackled a complete e‑commerce analytics project. It was demanding, sometimes frustrating, but incredibly rewarding. Below is what I learned, how I progressed, and why this one‑month experience became a turning point in my journey.

Getting Started with R

At first R looked unusual and a bit intimidating, but once I started using the right libraries everything became more natural:

dplyr for data manipulation
ggplot2 for visualization
readxl and base R's read.csv() for importing data
forecast for my first time‑series predictions

Writing pipelines with the pipe operator %>% even became enjoyable: it felt like guiding the computer step by step through a clear thought process.
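As a rough illustration of what such a pipeline can look like (not the project's actual code), here is a minimal sketch. The `orders` data frame and its column names are made up for the example.

```r
library(dplyr)

# Hypothetical order data; the columns are illustrative assumptions
orders <- data.frame(
  category = c("Toys", "Books", "Toys", "Books", "Garden"),
  price    = c(20, 12, 35, 8, 50),
  quantity = c(2, 1, 1, 3, 1)
)

revenue_by_category <- orders %>%
  mutate(revenue = price * quantity) %>%   # revenue per order line
  group_by(category) %>%                   # one group per category
  summarise(total_revenue = sum(revenue), .groups = "drop") %>%
  arrange(desc(total_revenue))             # largest category first

print(revenue_by_category)
```

Each step reads top to bottom: add a column, group, summarise, sort.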

Organizing the Project

A major lesson: good organization matters. I created separate scripts for each step of the analysis:

data_import_cleaning.R – data import & cleaning
sales_analysis.R – sales analysis
product_insights.R – product insights
customer_segmentation.R – customer segmentation
seller_performance.R – seller performance
logistics_delivery.R – logistics & delivery
service_quality.R – service quality
predictions.R – predictions
visualizations.R – visualizations

A main controller script, main.R, ties them together.
This approach mirrors how professional data analysts build reproducible workflows.
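A controller script of this kind might look like the sketch below. The file names come from the list above, but the run order and the `run_pipeline()` helper are assumptions made for illustration, not the project's actual main.R.

```r
# main.R (sketch): run each analysis script in a fixed order.
# The order below simply follows the list above and is an assumption.
steps <- c(
  "data_import_cleaning.R",
  "sales_analysis.R",
  "product_insights.R",
  "customer_segmentation.R",
  "seller_performance.R",
  "logistics_delivery.R",
  "service_quality.R",
  "predictions.R",
  "visualizations.R"
)

# Hypothetical helper: source every script that exists, skip the rest
run_pipeline <- function(scripts, dir = ".") {
  ran <- character(0)
  for (script in scripts) {
    path <- file.path(dir, script)
    if (!file.exists(path)) {
      warning("Skipping missing script: ", path)
      next
    }
    message("Running ", script)
    source(path)  # evaluated in the global environment, so objects are shared
    ran <- c(ran, script)
  }
  invisible(ran)
}

run_pipeline(steps)
```

Sourcing into the global environment lets later scripts reuse the cleaned data created by earlier ones, at the cost of less isolation between steps.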

Data Cleaning Challenges

The project involved a variety of messy issues:

Inconsistent date formats
Numeric values stored as text with commas
Inconsistent region names
Missing values
Merging multiple data sources

Fixing these problems gave me a deeper sense of how real datasets behave and how to make them usable.
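The issues listed above can be sketched in base R as follows. The `raw` data, the two date formats, and the `region_lookup` table are hypothetical; a real dataset would need its own list of formats and its own rule for handling missing values.

```r
# Hypothetical raw data showing the kinds of problems described above
raw <- data.frame(
  order_date = c("2023-01-15", "15/02/2023", NA, "2023-03-01"),
  amount     = c("1,250.50", "980", "2,100", "315.75"),
  region     = c(" north", "North ", "NORTH", "south"),
  stringsAsFactors = FALSE
)

# Try several date formats in turn and keep the first one that parses
parse_mixed_date <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

clean <- raw
clean$order_date <- parse_mixed_date(raw$order_date)
clean$amount     <- as.numeric(gsub(",", "", raw$amount))  # "1,250.50" -> 1250.5
clean$region     <- tolower(trimws(raw$region))            # unify region spelling

# One possible policy for missing values: drop rows without a usable date
clean <- clean[!is.na(clean$order_date), ]

# Merge with a second (hypothetical) source keyed by region
region_lookup <- data.frame(
  region  = c("north", "south"),
  manager = c("Manager A", "Manager B")
)
merged <- merge(clean, region_lookup, by = "region", all.x = TRUE)

print(merged)
```

Dropping incomplete rows is only one option; imputing or flagging them may suit other analyses better.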

Analysis Performed

Once the data was clean, I explored:

Monthly, quarterly, and yearly revenue
Top‑selling products
Customer segmentation (premium, standard, occasional)
Seller performance
Delivery delays
Service quality
Correlation between delivery delay and cancellations
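Three of these analyses can be sketched in base R as shown here. All data and the spending thresholds for the segments are invented for the example; the article does not state how the segments were actually defined.

```r
# Monthly revenue from hypothetical order-level data
sales <- data.frame(
  order_date = as.Date(c("2023-01-05", "2023-01-20", "2023-02-11", "2023-04-02")),
  revenue    = c(100, 150, 200, 50)
)
sales$month <- format(sales$order_date, "%Y-%m")
monthly <- aggregate(revenue ~ month, data = sales, FUN = sum)

# Customer segmentation by total spend (thresholds are arbitrary assumptions)
customers <- data.frame(
  customer_id = 1:6,
  total_spent = c(50, 1200, 300, 80, 2500, 640)
)
customers$segment <- cut(
  customers$total_spent,
  breaks = c(-Inf, 100, 1000, Inf),
  labels = c("occasional", "standard", "premium")
)

# Correlation between delivery delay and cancellation (1 = cancelled)
deliveries <- data.frame(
  delay_days = c(1, 2, 3, 8, 10, 12, 2, 9),
  cancelled  = c(0, 0, 0, 1, 1, 1, 0, 0)
)
delay_cancel_cor <- cor(deliveries$delay_days, deliveries$cancelled)

print(monthly)
print(table(customers$segment))
print(delay_cancel_cor)
```

A positive correlation here only shows that delays and cancellations move together in the sample; it does not by itself establish cause.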

Visualizations

I created a range of charts to reveal the story hidden in the data:

Line plots
Bar plots
Scatter plots
Heatmaps

Seasonal patterns emerged, certain categories dominated, and longer delivery delays were clearly associated with more cancellations. The numbers transformed into actionable insights.
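For illustration, a line plot and a scatter plot along these lines could be built with ggplot2 as in the sketch below. The figures in both data frames are made up.

```r
library(ggplot2)

# Hypothetical monthly revenue figures, for illustration only
monthly <- data.frame(
  month   = seq(as.Date("2023-01-01"), by = "month", length.out = 6),
  revenue = c(1200, 1350, 1280, 1500, 1720, 1650)
)

revenue_plot <- ggplot(monthly, aes(x = month, y = revenue)) +
  geom_line() +
  geom_point() +
  labs(title = "Monthly revenue", x = "Month", y = "Revenue")

# Hypothetical delivery data: delay in days vs. share of cancelled orders
delivery <- data.frame(
  delay_days  = c(1, 2, 4, 6, 9, 12),
  cancel_rate = c(0.02, 0.03, 0.05, 0.08, 0.15, 0.22)
)

delay_plot <- ggplot(delivery, aes(x = delay_days, y = cancel_rate)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # linear trend line
  labs(title = "Delivery delay vs. cancellations",
       x = "Delay (days)", y = "Cancellation rate")

print(revenue_plot)
print(delay_plot)
```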

Time‑Series Forecasting

Exploring forecasting with auto.arima() was one of the most rewarding parts. I transformed the monthly revenue into a time series and predicted the next quarter:

library(forecast)

# Convert monthly revenue to a ts object
revenue_ts <- ts(monthly_revenue, start = c(2023, 1), frequency = 12)

# Fit ARIMA model
model <- auto.arima(revenue_ts)

# Forecast next quarter (3 months)
forecast_vals <- forecast(model, h = 3)

print(forecast_vals)
plot(forecast_vals)

Seeing R generate future values based on historical data made me feel like I had truly become a data scientist.
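One way to check how far such forecasts can be trusted is to hold out the last few months and compare the predictions with what actually happened. The sketch below does this on simulated revenue (trend, yearly seasonality, and noise, all assumed), not on the project's data.

```r
library(forecast)

set.seed(42)

# Simulated 36 months of revenue: trend + yearly seasonality + noise
months <- 1:36
monthly_revenue <- 1000 + 15 * months +
  120 * sin(2 * pi * months / 12) + rnorm(36, sd = 40)
revenue_ts <- ts(monthly_revenue, start = c(2023, 1), frequency = 12)

# Keep the final quarter aside as a holdout set
train   <- window(revenue_ts, end = c(2025, 9))
holdout <- window(revenue_ts, start = c(2025, 10))

# Fit on the training window only, then forecast the held-out months
model <- auto.arima(train)
fc    <- forecast(model, h = length(holdout))

# Error measures (RMSE, MAE, MAPE, ...) on training and holdout data
acc <- accuracy(fc, holdout)
print(acc)
```

The "Test set" row of the result describes errors on months the model never saw, which is usually the more honest number to report.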

Takeaways

This project was more than a homework assignment; it was a full immersion into data science with R. I learned how to:

Clean and structure real‑world data
Analyze business performance
Build meaningful visualizations
Create predictive models
Organize a complete analytical workflow

Most importantly, this one‑month journey gave me confidence and motivation to continue. And honestly? This is just the beginning.
