How I Began My Data Science Journey with R in the Last Month

Published: December 9, 2025 at 11:49 PM EST
3 min read
Source: Dev.to

Introduction

Over the past month I decided to dive seriously into data science with one clear mission: learn how to analyze real data using R like a professional.
To challenge myself I tackled a complete e‑commerce analytics project. It was demanding, sometimes frustrating, but incredibly rewarding. Below is what I learned, how I progressed, and why this one‑month experience became a turning point in my journey.

Getting Started with R

At first, R looked unusual and a bit intimidating, but once I started using the right libraries, everything felt more natural:

  • dplyr for data manipulation
  • ggplot2 for visualization
  • readxl (and base R's read.csv) for importing data
  • forecast for my first time‑series predictions

Writing pipelines with the pipe operator %>% even became enjoyable—it felt like guiding the computer step‑by‑step through a clear thought process.
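A short dplyr pipeline on a toy orders table gives the flavor (the data and column names here are invented for illustration):

```r
library(dplyr)

# Toy orders table (values invented)
orders <- data.frame(
  region  = c("North", "South", "North", "South"),
  revenue = c(100, 250, 300, 100)
)

# A pipeline reads top-to-bottom: group, summarise, sort
region_revenue <- orders %>%
  group_by(region) %>%
  summarise(total = sum(revenue)) %>%
  arrange(desc(total))

print(region_revenue)
```

Each %>% hands the result of one step to the next, which is what makes the "step-by-step thought process" feeling so natural.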

Organizing the Project

A major lesson: good organization matters. I created separate scripts for each step of the analysis:

  • data_import_cleaning.R – data import & cleaning
  • sales_analysis.R – sales analysis
  • product_insights.R – product insights
  • customer_segmentation.R – customer segmentation
  • seller_performance.R – seller performance
  • logistics_delivery.R – logistics & delivery
  • service_quality.R – service quality
  • predictions.R – predictions
  • visualizations.R – visualizations

and a main controller script main.R.
This approach mirrors how professional data analysts build reproducible workflows.
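The controller itself can be sketched as a loop that sources each stage in order; this is a minimal version assuming the script files above live in the working directory:

```r
# main.R — run each stage of the pipeline in order
scripts <- c(
  "data_import_cleaning.R",
  "sales_analysis.R",
  "product_insights.R",
  "customer_segmentation.R",
  "seller_performance.R",
  "logistics_delivery.R",
  "service_quality.R",
  "predictions.R",
  "visualizations.R"
)

# Source each script that exists, logging progress as we go
for (s in scripts[file.exists(scripts)]) {
  message("Running ", s)
  source(s)
}
```

Keeping the run order in one vector makes the whole analysis reproducible with a single Rscript main.R call.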

Data Cleaning Challenges

The project involved a variety of messy issues:

  • Inconsistent date formats
  • Numeric values stored as text with commas
  • Inconsistent region names
  • Missing values
  • Merging multiple data sources

Fixing these problems gave me a deeper sense of how real datasets behave and how to make them usable.
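The fixes above can be sketched in a single mutate chain; the data, formats, and imputation choice (median) are invented for illustration, and merging additional sources would follow with a join such as left_join():

```r
library(dplyr)

# Toy raw data reproducing the issues above (values invented)
raw <- data.frame(
  order_date = c("2023-01-05", "05/02/2023", "2023-03-10"),
  revenue    = c("1,200", "3,450", NA),
  region     = c("north ", "North", "NORTH")
)

clean <- raw %>%
  mutate(
    # Try each known date format; coalesce keeps the first successful parse
    order_date = coalesce(
      as.Date(order_date, format = "%Y-%m-%d"),
      as.Date(order_date, format = "%d/%m/%Y")
    ),
    # Strip thousands separators, then convert text to numbers
    revenue = as.numeric(gsub(",", "", revenue)),
    # Impute missing revenue with the median
    revenue = ifelse(is.na(revenue), median(revenue, na.rm = TRUE), revenue),
    # Normalise region names: trim whitespace, then consistent casing
    region = trimws(region),
    region = paste0(toupper(substr(region, 1, 1)), tolower(substring(region, 2)))
  )

print(clean)
```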

Analysis Performed

Once the data was clean, I explored:

  • Monthly, quarterly, and yearly revenue
  • Top‑selling products
  • Customer segmentation (premium, standard, occasional)
  • Seller performance
  • Delivery delays
  • Service quality
  • Correlation between delivery delay and cancellations
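The segmentation step, for example, can be sketched with case_when; the spending thresholds here are invented placeholders, not the project's actual cutoffs:

```r
library(dplyr)

# Toy customer spending totals (values and thresholds invented)
customers <- data.frame(
  customer_id = 1:4,
  total_spent = c(5000, 1200, 150, 3000)
)

# Assign each customer to a tier; case_when picks the first matching rule
segments <- customers %>%
  mutate(segment = case_when(
    total_spent >= 3000 ~ "premium",
    total_spent >= 500  ~ "standard",
    TRUE                ~ "occasional"
  ))

print(segments)
```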

Visualizations

I created a range of charts to reveal the story hidden in the data:

  • Line plots
  • Bar plots
  • Scatter plots
  • Heatmaps

Seasonal patterns emerged, certain categories dominated, and long delays clearly led to more cancellations. The numbers transformed into actionable insights.
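A monthly revenue line plot in ggplot2 looks roughly like this (toy data, invented for illustration):

```r
library(ggplot2)

# Toy monthly revenue series (values invented)
monthly <- data.frame(
  month   = seq(as.Date("2023-01-01"), by = "month", length.out = 6),
  revenue = c(100, 120, 90, 150, 170, 160)
)

# Line + points make seasonal ups and downs easy to spot
p <- ggplot(monthly, aes(x = month, y = revenue)) +
  geom_line() +
  geom_point() +
  labs(title = "Monthly revenue", x = "Month", y = "Revenue")

print(p)
```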

Time‑Series Forecasting

Exploring forecasting with auto.arima() was one of the most rewarding parts. I transformed the monthly revenue into a time series and predicted the next quarter:

```r
library(forecast)

# Convert monthly revenue to a ts object
revenue_ts <- ts(monthly_revenue, start = c(2023, 1), frequency = 12)

# Fit ARIMA model
model <- auto.arima(revenue_ts)

# Forecast next quarter (3 months)
forecast_vals <- forecast(model, h = 3)

print(forecast_vals)
plot(forecast_vals)
```

Seeing R generate future values based on historical data made me feel like I had truly become a data scientist.

Takeaways

This project was more than a homework assignment; it was a full immersion into data science with R. I learned how to:

  • Clean and structure real‑world data
  • Analyze business performance
  • Build meaningful visualizations
  • Create predictive models
  • Organize a complete analytical workflow

Most importantly, this one‑month journey gave me confidence and motivation to continue. And honestly? This is just the beginning.
