🔥 Day 5: Introduction to DataFrames - The Most Important Spark API

Published: December 5, 2025 at 11:00 AM EST
Source: Dev.to


What is a DataFrame?

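A DataFrame in Spark is a distributed collection of data organized into named, typed columns, much like a table in a relational database. Because Spark knows the structure of the data, it can optimize how that data is processed.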

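  • Feels like SQL: you can query it with SQL or SQL-like expressions
  • Works like Pandas: a familiar column-oriented API for selecting, filtering, and transforming data
  • Scales to terabytes: data is split into partitions that are processed in parallel across a cluster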

Why DataFrames are better than RDDs

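  • Use the Catalyst optimizer → rewrites query plans for speed (for example, pushing filters closer to the data source)
  • Use the Tungsten execution engine → compact binary memory layout with less JVM object and garbage-collection overhead
  • Support whole-stage code generation → compiles parts of the query into optimized JVM bytecode
  • Allow SQL-like expressions
  • Read and write formats such as Parquet, ORC, JSON, and Avro through one reader/writer API (Avro needs the separate spark-avro module)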

This is why most production Spark jobs today are written with DataFrames (or Spark SQL) rather than raw RDDs.

Creating Your First DataFrame

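from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("day5").getOrCreate()  # already defined in the PySpark shell and most notebooks
df = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "name"])
df.show()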

From CSV

df = spark.read.csv("sales.csv", header=True, inferSchema=True)

From JSON

df = spark.read.json("users.json")

From Parquet (usually the fastest: columnar, compressed, and the schema is stored with the data)

df = spark.read.parquet("events.parquet")

Understanding Schema

Every DataFrame has a schema: each column's name, its data type, and whether it can hold nulls.

df.printSchema()

Example output for a DataFrame with an integer id column and a string name column:

root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)

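The schema matters because Spark uses it to check column references and types and to plan and optimize the query before any data is processed. Note that inferSchema=True makes Spark read a CSV file an extra time to guess the types; for large files, pass an explicit schema instead.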

DataFrame Operations You’ll Use Daily

Select columns

df.select("name", "id").show()

Filter rows

from pyspark.sql.functions import col
df.filter(col("id") > 5).show()

Add new columns

df = df.withColumn("new_value", col("id") * 100)

Drop columns

df = df.drop("unwanted_column")

Rename columns

df = df.withColumnRenamed("id", "user_id")

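DataFrame Actions: These Trigger Execution

Transformations such as select, filter, and withColumn are lazy: Spark only records them in a query plan. Nothing is computed until you call an action.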

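df.count()    # number of rows
df.show()     # prints the first 20 rows as a table
df.collect()  # returns ALL rows to the driver; avoid on large data
df.take(5)    # returns the first 5 rows as a list of Row objects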

Tips for Best Practice

  • Follow for more content like this.
  • Let me know if I missed anything in the comments.
  • Thank you!