How I Began My Data Science Journey with R in the Last Month
Source: Dev.to
Introduction
Over the past month I decided to dive seriously into data science with one clear mission: learn how to analyze real data using R like a professional.
To challenge myself I tackled a complete e‑commerce analytics project. It was demanding, sometimes frustrating, but incredibly rewarding. Below is what I learned, how I progressed, and why this one‑month experience became a turning point in my journey.
Getting Started with R
At first R looked unusual and a bit intimidating, but once I started using the right libraries everything became more natural:
- dplyr for data manipulation
- ggplot2 for visualization
- readxl and read.csv for importing data
- forecast for my first time‑series predictions
Writing pipelines with the pipe operator %>% even became enjoyable. It felt like guiding the computer step‑by‑step through a clear thought process.
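To give a feel for what such a pipeline looks like, here is a minimal sketch. The `orders` data frame and its `category`, `price`, and `quantity` columns are assumptions made up for illustration; they are not taken from the actual project data.

```r
library(dplyr)

# Toy data for illustration only; the column names are assumptions
orders <- data.frame(
  category = c("books", "toys", "books", "toys", "garden"),
  price    = c(12.5, 30, 8, 15, 40),
  quantity = c(2, 1, 5, 3, 1),
  stringsAsFactors = FALSE
)

# Each step hands its result to the next one through %>%
revenue_by_category <- orders %>%
  mutate(revenue = price * quantity) %>%                       # add a revenue column
  group_by(category) %>%                                       # one group per category
  summarise(total_revenue = sum(revenue), .groups = "drop") %>% # total per group
  arrange(desc(total_revenue))                                 # largest first

print(revenue_by_category)
```

Read top to bottom, the pipeline says: take the orders, compute revenue, group by category, total it up, and sort.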
Organizing the Project
A major lesson: good organization matters. I created separate scripts for each step of the analysis:
- data_import_cleaning.R – data import & cleaning
- sales_analysis.R – sales analysis
- product_insights.R – product insights
- customer_segmentation.R – customer segmentation
- seller_performance.R – seller performance
- logistics_delivery.R – logistics & delivery
- service_quality.R – service quality
- predictions.R – predictions
- visualizations.R – visualizations
I also added a main controller script, main.R.
This approach mirrors how professional data analysts build reproducible workflows.
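A controller script of this kind could look roughly like the sketch below. The script names come from the project, but the run order and the `run_step` helper are assumptions for illustration, not the actual contents of main.R.

```r
# main.R: run each analysis script in sequence (the order is an assumption)
scripts <- c(
  "data_import_cleaning.R",
  "sales_analysis.R",
  "product_insights.R",
  "customer_segmentation.R",
  "seller_performance.R",
  "logistics_delivery.R",
  "service_quality.R",
  "predictions.R",
  "visualizations.R"
)

# Hypothetical helper: source a script if it exists and report what happened
run_step <- function(path) {
  if (!file.exists(path)) {
    message("Skipping missing script: ", path)
    return(FALSE)
  }
  message("Running ", path)
  source(path)
  TRUE
}

# Named logical vector: TRUE for every script that was found and run
results <- vapply(scripts, run_step, logical(1))
```

Keeping the order in one vector means a new step can be added by editing a single line, and `results` shows at a glance which steps actually ran.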
Data Cleaning Challenges
The project involved a variety of messy issues:
- Inconsistent date formats
- Numeric values stored as text with commas
- Inconsistent region names
- Missing values
- Merging multiple data sources
Fixing these problems gave me a deeper sense of how real datasets behave and how to make them usable.
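The fixes for these issues can be sketched in base R as follows. All data, column names, the `parse_mixed_date` helper, and the `region_info` lookup table are made up for illustration; the real project data looked different.

```r
# Toy data reproducing the kinds of issues listed above (values are made up)
raw <- data.frame(
  order_date = c("2023-01-15", "15/02/2023", "2023-03-01", NA),
  amount     = c("1,250.50", "300", "2,000", "75.25"),
  region     = c(" north", "North ", "NORTH", "south"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try several date formats in turn; the first that parses wins
parse_mixed_date <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

clean <- data.frame(
  order_date = parse_mixed_date(raw$order_date),
  # Strip thousands separators before converting text to numbers
  amount     = as.numeric(gsub(",", "", raw$amount, fixed = TRUE)),
  # Normalise region names: trim whitespace and lower-case
  region     = tolower(trimws(raw$region)),
  stringsAsFactors = FALSE
)

# Count missing values per column before deciding how to handle them
missing_per_column <- colSums(is.na(clean))
print(missing_per_column)

# Merge with a second (hypothetical) source on the cleaned key
region_info <- data.frame(
  region    = c("north", "south"),
  warehouse = c("W1", "W2"),
  stringsAsFactors = FALSE
)
merged <- merge(clean, region_info, by = "region", all.x = TRUE)
print(merged)
```

Cleaning the join key first matters: without the trimming and lower-casing, the three spellings of "north" would not match the lookup table.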
Analysis Performed
Once the data was clean, I explored:
- Monthly, quarterly, and yearly revenue
- Top‑selling products
- Customer segmentation (premium, standard, occasional)
- Seller performance
- Delivery delays
- Service quality
- Correlation between delivery delay and cancellations
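Three of these analyses can be sketched in base R as shown below. The `orders` table, its column names, and the spending thresholds used for segmentation are all assumptions chosen for illustration, not the values from the project.

```r
# Toy orders table; all column names and values are illustrative assumptions
orders <- data.frame(
  customer_id = c("a", "a", "b", "c", "c", "c", "d", "e"),
  order_date  = as.Date(c("2023-01-05", "2023-02-10", "2023-01-20",
                          "2023-02-02", "2023-02-15", "2023-03-01",
                          "2023-03-12", "2023-03-25")),
  amount      = c(120, 80, 40, 300, 250, 500, 60, 20),
  delay_days  = c(1, 2, 8, 0, 1, 3, 10, 12),
  cancelled   = c(0, 0, 1, 0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Monthly revenue: sum order amounts by calendar month
orders$month <- format(orders$order_date, "%Y-%m")
monthly_revenue <- aggregate(amount ~ month, data = orders, FUN = sum)
print(monthly_revenue)

# Customer segmentation by total spend; the cut-offs (50 and 500) are arbitrary
spend <- aggregate(amount ~ customer_id, data = orders, FUN = sum)
spend$segment <- cut(
  spend$amount,
  breaks = c(-Inf, 50, 500, Inf),
  labels = c("occasional", "standard", "premium")
)
print(spend)

# Correlation between delivery delay and cancellation (coded 0/1)
delay_cancel_cor <- cor(orders$delay_days, orders$cancelled)
print(delay_cancel_cor)
```

A positive correlation here only says that delays and cancellations tend to occur together in the data; it does not by itself show that one causes the other.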
Visualizations
I created a range of charts to reveal the story hidden in the data:
- Line plots
- Bar plots
- Scatter plots
- Heatmaps
Seasonal patterns emerged, certain categories dominated, and long delays were clearly associated with more cancellations. The numbers transformed into actionable insights.
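As an example of one of these chart types, a line plot of monthly revenue with ggplot2 might look like this. The `monthly` data frame and its figures are made up for illustration.

```r
library(ggplot2)

# Toy monthly revenue figures, made up for illustration
monthly <- data.frame(
  month   = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01")),
  revenue = c(160, 630, 580, 710)
)

# Line plot with points marking each month
revenue_plot <- ggplot(monthly, aes(x = month, y = revenue)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_labels = "%b %Y") +
  labs(title = "Monthly revenue", x = "Month", y = "Revenue")

print(revenue_plot)
```

Swapping `geom_line()` for `geom_col()` gives a bar plot of the same data, which is one reason ggplot2's layered approach is convenient for exploring several chart types quickly.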
Time‑Series Forecasting
Exploring forecasting with auto.arima() was one of the most rewarding parts. I transformed the monthly revenue into a time series and predicted the next quarter:
library(forecast)
# Convert monthly revenue to a ts object
revenue_ts <- ts(monthly_revenue, start = c(2023, 1), frequency = 12)
# Fit ARIMA model
model <- auto.arima(revenue_ts)
# Forecast next quarter (3 months)
forecast_vals <- forecast(model, h = 3)
print(forecast_vals)
plot(forecast_vals)
Seeing R generate future values based on historical data made me feel like I had truly become a data scientist.
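One way to sanity-check such a forecast is to hold out the last few months and compare the predictions with the known values. The sketch below does this on synthetic data; the series, its start date, and the three-month holdout are assumptions for illustration and not part of the original analysis.

```r
library(forecast)

# Synthetic monthly revenue (trend + seasonality + noise), for illustration only
set.seed(42)
n <- 36
monthly_revenue <- 1000 + 15 * seq_len(n) +
  100 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 30)

revenue_ts <- ts(monthly_revenue, start = c(2021, 1), frequency = 12)

# Hold out the final quarter so forecasts can be compared with known values
train <- window(revenue_ts, end = c(2023, 9))
test  <- window(revenue_ts, start = c(2023, 10))

# Fit on the training months only, then forecast the held-out period
model <- auto.arima(train)
fc    <- forecast(model, h = length(test))

# Error metrics (RMSE, MAE, MAPE, ...) for the training and test sets
print(accuracy(fc, test))
```

If the test-set errors are much larger than the training-set errors, the model is probably fitting noise in the history rather than a pattern that carries forward.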
Takeaways
This project was more than a homework assignment; it was a full immersion into data science with R. I learned how to:
- Clean and structure real‑world data
- Analyze business performance
- Build meaningful visualizations
- Create predictive models
- Organize a complete analytical workflow
Most importantly, this one‑month journey gave me confidence and motivation to continue. And honestly? This is just the beginning.