SQL: Doing GROUP BY in CsvPath

Published: 2 months ago (December 3, 2025 at 08:28 PM EST)

3 min read

Source: Dev.to

Introduction

Let’s look at how to create a simple GROUP BY report in the CsvPath Framework’s tabular data validation language. Of all our examples, this is an easy one!

SQL `GROUP BY` Overview

A GROUP BY query selects rows and groups them according to one or more columns. The archetypal example is:

SELECT  
  dept, 
  role, 
  SUM(salary) total_salary
FROM    employee
GROUP BY dept, role

This query produces a result set with three columns: a unique combination of dept and role, and the summed salaries for each combination.

Setting Up the CsvPath Example

We’ll use FlightPath Data’s built‑in examples. In a typical project, FlightPath creates an examples folder with useful how‑to examples. For this demo we’ll use examples/counting/projects.csv as the data source and create a new CsvPath file.

Basic Scaffold

~ test-data:examples/counting/projects.csv ~
$[*][

]

The directive ~ test-data:... ~ tells FlightPath which file to scan (* means the whole file). In production this line is ignored.

Simple `GROUP BY` with One Column

The following CsvPath aggregates worker hours by agency:

~ test-data:examples/counting/projects.csv ~
$[*][
    subtotal.worker_hours(#agency, #13)
    last() -> var_table("worker_hours")
]

subtotal.worker_hours(#agency, #13) subtotals the values in header #13 (the #worker_hours_this_period column) for each distinct #agency value.
last() matches only the final line of the file. When it matches, the right‑hand side of -> is executed.
var_table("worker_hours") prints a text table of the worker_hours variable.

Screenshot of the Help Window

Resulting Table

Only two lines of CsvPath are needed for this simple aggregation.

Extending to Two Columns (`GROUP BY` on Multiple Fields)

To mimic the original SQL example that groups by both dept and role, we create a composite key:

~ test-data:examples/counting/projects.csv ~
$[*][
    @key = concat(#agency, ", ", #neighborhood)
    subtotal.Aggregate_hours(@key, #13)
    last() -> var_table("Aggregate_hours")
]

@key concatenates the #agency and #neighborhood columns, forming a unique identifier for each group.
subtotal.Aggregate_hours(@key, #13) aggregates the worker‑hour values for each distinct key.
The final var_table call prints the aggregated results.

Full Example Screenshot

Resulting Table

The printed table is useful for human‑readable validation, while the underlying vars.json file contains the structured data for automated processing.

Automation and Reporting

CsvPath is designed for lights‑out automation. Validation results can be captured in:

vars.json – holds variables such as the aggregated tables.
errors.json – collects built‑in validation errors. Custom errors can be generated with the error() function.

CsvPath also provides a sophisticated print‑out subsystem that can direct output to multiple streams, allowing you to separate informational messages, error reports, and other run‑specific data for different audiences.

End of article.

SQL: Doing GROUP BY in CsvPath

Introduction

SQL `GROUP BY` Overview

Setting Up the CsvPath Example

Basic Scaffold

Simple `GROUP BY` with One Column

Screenshot of the Help Window

Resulting Table

Extending to Two Columns (`GROUP BY` on Multiple Fields)

Full Example Screenshot

Resulting Table

Automation and Reporting

Related posts

Designing a Type-Safe Plugin System for Next.js: Why Zod is Our Secret Weapon

PostgreSQL Log Viewing

PostgreSQL Merge Into

Oracle Merge Into

Introduction

SQL GROUP BY Overview

Setting Up the CsvPath Example

Basic Scaffold

Simple GROUP BY with One Column

Screenshot of the Help Window

Resulting Table

Extending to Two Columns (GROUP BY on Multiple Fields)

Full Example Screenshot

Resulting Table

Automation and Reporting

Related posts

Designing a Type-Safe Plugin System for Next.js: Why Zod is Our Secret Weapon

PostgreSQL Log Viewing

PostgreSQL Merge Into

Oracle Merge Into

SQL `GROUP BY` Overview

Simple `GROUP BY` with One Column

Extending to Two Columns (`GROUP BY` on Multiple Fields)