Importing Data into R
Source: Dev.to
What This Article Covers
We will walk through importing:
- TXT, CSV, and other delimited files
- JSON files
- XML and HTML tables
- Excel workbooks
- SAS, SPSS, Stata datasets
- MATLAB and Octave files
- Relational databases via ODBC
We’ll also share a quick‑import hack that’s extremely useful for fast, ad‑hoc analysis.
Preparing Your Workspace Before Importing Data
A well‑prepared environment saves time and avoids unnecessary errors later.
Understanding and Setting the Working Directory
Most projects store all relevant data files in a single folder. You can tell R to treat this folder as its working directory, allowing you to import files using relative paths.
# Check your current working directory
getwd()
If your files are located elsewhere, change the working directory:
setwd("path/to/your/project/folder")
Once set, R will automatically look for files in this folder unless told otherwise.
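For example, once the working directory is set, a file in that folder can be read with just its name (the filename below is only a placeholder):
# "sales.csv" is resolved relative to the working directory set above
df <- read.csv("sales.csv")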
Cleaning the Workspace
Objects from previous sessions can silently interfere with your analysis. It’s often best to start fresh:
rm(list = ls())
Tip: Disable saving the workspace when exiting R (e.g., R --no-save) to ensure each session starts clean.
Loading TXT, CSV, and Other Delimited Files
Delimited text files are among the most common data formats. These files store values separated by a delimiter such as a tab, comma, or semicolon.
Reading Text Files
Example of a tab‑delimited file (saved here as data.txt):
Category V1 V2
A 3 2
B 5 6
B 2 3
A 4 8
A 7 3
# Generic read.table() call on the tab-delimited file shown above
df <- read.table("data.txt", header = TRUE, sep = "\t")
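A handy quick-import trick (presumably the one mentioned at the start) is to let file.choose(), base R's interactive file picker, supply the path so you can browse to the file instead of typing it:
# Quick, ad-hoc import: browse to the file instead of typing a path
df <- read.table(file.choose(), header = TRUE, sep = "\t")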
Caution: This method is handy for quick checks but not recommended for reproducible pipelines.
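For comma- and semicolon-separated files, base R provides convenience wrappers around read.table(); the filenames below are placeholders:
# Comma-separated values
df_csv <- read.csv("data.csv")
# Semicolon-separated values (common where a comma is used as the decimal mark)
df_semicolon <- read.csv2("data.csv")
# Tab-delimited files
df_tab <- read.delim("data.txt")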
Importing Data Using Packages
Many complex formats require external packages. Install and load a package before using it:
install.packages("packageName")
library(packageName)
Reading JSON Files
# Install & load the rjson package
install.packages("rjson")
library(rjson)
# From a local file
json_data <- fromJSON(file = "input.json")
# Directly from a URL
json_data <- fromJSON(file = "https://example.com/data.json")
json_data is imported as a list. Convert to a data frame if needed:
json_df <- as.data.frame(json_data)
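Whether that conversion succeeds cleanly depends on how nested the JSON is, so it is worth inspecting the list first:
# Inspect the structure; deeply nested JSON may need manual flattening before as.data.frame()
str(json_data)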
Importing XML Data and HTML Tables
Reading XML Files
install.packages(c("XML", "RCurl"))
library(XML)
library(RCurl)
# From a URL
xml_raw <- getURL("https://example.com/data.xml")
xml_data <- xmlTreeParse(xml_raw, useInternalNodes = TRUE)
# Convert to a data frame (if the XML structure is tabular)
xml_df <- xmlToDataFrame(xmlRoot(xml_data))
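When only a few fields are needed, an XPath query on the parsed document is often simpler than converting everything; the //price node name below is purely a placeholder:
# Extract individual values with XPath ("//price" is a placeholder node name)
prices <- xpathSApply(xml_data, "//price", xmlValue)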
Extracting Tables from HTML Pages
install.packages("XML")
library(XML)
# Read all tables from a web page
html_tables <- readHTMLTable(getURL("https://example.com/page.html"))
# Example: select the first table
df_html <- html_tables[[1]]
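Because readHTMLTable() returns one list element per table found on the page, it helps to check what came back before picking an index:
# How many tables were found, and what are they named?
length(html_tables)
names(html_tables)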
Reading Excel Workbooks
readxl is lightweight and does not depend on Java or Perl.
install.packages("readxl")
library(readxl)
# First sheet (default)
df_excel <- read_excel("file.xlsx")
# Specific sheet by name
df_sheet3 <- read_excel("file.xlsx", sheet = "Sheet3")
# Specific sheet by index
df_sheet3_idx <- read_excel("file.xlsx", sheet = 3)
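If the sheet names are unknown, readxl can list them, and read_excel() can also limit the import to a cell range (the range below is just an example):
# List all sheet names in the workbook
excel_sheets("file.xlsx")
# Read only a specific cell range from a sheet
df_range <- read_excel("file.xlsx", sheet = 1, range = "A1:C20")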
Importing Data from Statistical Software
SAS, SPSS, and Stata Files
The haven package preserves variable labels and factor information.
install.packages("haven")
library(haven)
sas_data <- read_sas("data.sas7bdat")
spss_data <- read_sav("data.sav")
stata_data <- read_dta("data.dta")
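Because haven stores value labels in special labelled columns, a common follow-up is converting them to ordinary R factors; the column name group below is hypothetical:
# Convert a labelled column to a regular factor ("group" is a hypothetical column name)
spss_data$group <- as_factor(spss_data$group)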
MATLAB and Octave Files
# MATLAB .mat files
install.packages("R.matlab")
library(R.matlab)
mat_data <- readMat("file.mat")
# Octave text files (via the foreign package)
install.packages("foreign")
library(foreign)
octave_data <- read.octave("file.txt")
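readMat() returns a named list with one element per MATLAB variable, so a quick look at its structure shows what was imported:
# Inspect the top-level contents of the imported .mat file
str(mat_data, max.level = 1)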
Importing Data from Relational Databases (ODBC)
install.packages("RODBC")
library(RODBC)
# Establish a connection (replace with your DSN, user, and password)
con <- odbcConnect("my_dsn", uid = "my_user", pwd = "my_password")
# Fetch an entire table
df_table1 <- sqlFetch(con, "Table1")
# Run a custom query
df_query <- sqlQuery(con, "SELECT * FROM Table2 WHERE year = 2023")
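# Optional: list the tables exposed by this connection before fetching (sqlTables() is part of RODBC)
tables_available <- sqlTables(con)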
# Close the connection when done
odbcClose(con)
Tips for Easier Data Import in R
- Ensure column names are unique – duplicate names cause ambiguous references.
- Avoid spaces and special characters in variable names; use underscores (_) instead.
- Replace blank values with NA (e.g., na.strings = c("", "NA") in read.table); see the example after this list.
- Follow consistent naming conventions (snake_case, camelCase, etc.) across projects.
- Remove comments or metadata lines from raw data files before importing, or use the comment.char argument.
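As a sketch of how the NA and comment tips look in practice (the filename is a placeholder):
# Treat blank cells as NA and skip metadata lines that start with "#"
df <- read.table("raw_data.txt", header = TRUE, sep = "\t",
                 na.strings = c("", "NA"), comment.char = "#")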
Also, prefer short, meaningful variable names and keep your code style consistent. Happy importing! 🎉
End Notes
Loading data into R is only the first step in a much larger analytics journey. Once your data is in R, the real work begins—cleaning, transforming, visualizing, and modeling.
In this article, we covered how to import data from:
- Flat files (TXT, CSV)
- Web formats (JSON, XML, HTML)
- Excel workbooks
- Statistical software
- Databases via ODBC
As with most things in R, there are multiple ways to accomplish the same task. This guide focuses on the most commonly used and reliable approaches.
If you know of better or faster alternatives, feel free to share them—learning R is always a collaborative process.
Happy importing… and make it easy’R.
About Perceptive Analytics
Our mission is “to enable businesses to unlock value in data.” For over 20 years, we’ve partnered with more than 100 clients—from Fortune 500 companies to mid‑sized firms—to solve complex data‑analytics challenges.
Our services include access to experienced Power BI freelancers and collaboration with a trusted Snowflake consultant, turning data into strategic insight.
We would love to talk to you. Do reach out to us.