SQL: Doing GROUP BY in CsvPath

Published: (December 3, 2025 at 08:28 PM EST)
3 min read
Source: Dev.to

Source: Dev.to

Introduction

Let’s look at how to create a simple GROUP BY report in the CsvPath Framework’s tabular data validation language. Of all our examples, this is an easy one!

SQL GROUP BY Overview

A GROUP BY query selects rows and groups them according to one or more columns. The archetypal example is:

SELECT  
  dept, 
  role, 
  SUM(salary) total_salary
FROM    employee
GROUP BY dept, role

This query produces a result set with three columns: a unique combination of dept and role, and the summed salaries for each combination.

Setting Up the CsvPath Example

We’ll use FlightPath Data’s built‑in examples. In a typical project, FlightPath creates an examples folder with useful how‑to examples. For this demo we’ll use examples/counting/projects.csv as the data source and create a new CsvPath file.

Basic Scaffold

~ test-data:examples/counting/projects.csv ~
$[*][

]

The directive ~ test-data:... ~ tells FlightPath which file to scan (* means the whole file). In production this line is ignored.

Simple GROUP BY with One Column

The following CsvPath aggregates worker hours by agency:

~ test-data:examples/counting/projects.csv ~
$[*][
    subtotal.worker_hours(#agency, #13)
    last() -> var_table("worker_hours")
]
  • subtotal.worker_hours(#agency, #13) subtotals the values in header #13 (the #worker_hours_this_period column) for each distinct #agency value.
  • last() matches only the final line of the file. When it matches, the right‑hand side of -> is executed.
  • var_table("worker_hours") prints a text table of the worker_hours variable.

Screenshot of the Help Window

The FlightPath Data help window

Resulting Table

The validation report from our CSV file using CsvPath

Only two lines of CsvPath are needed for this simple aggregation.

Extending to Two Columns (GROUP BY on Multiple Fields)

To mimic the original SQL example that groups by both dept and role, we create a composite key:

~ test-data:examples/counting/projects.csv ~
$[*][
    @key = concat(#agency, ", ", #neighborhood)
    subtotal.Aggregate_hours(@key, #13)
    last() -> var_table("Aggregate_hours")
]
  • @key concatenates the #agency and #neighborhood columns, forming a unique identifier for each group.
  • subtotal.Aggregate_hours(@key, #13) aggregates the worker‑hour values for each distinct key.
  • The final var_table call prints the aggregated results.

Full Example Screenshot

The full CsvPath example for validating a CSV file

Resulting Table

The validation report output from CsvPath for a CSV file

The printed table is useful for human‑readable validation, while the underlying vars.json file contains the structured data for automated processing.

Automation and Reporting

CsvPath is designed for lights‑out automation. Validation results can be captured in:

  • vars.json – holds variables such as the aggregated tables.
  • errors.json – collects built‑in validation errors. Custom errors can be generated with the error() function.

CsvPath also provides a sophisticated print‑out subsystem that can direct output to multiple streams, allowing you to separate informational messages, error reports, and other run‑specific data for different audiences.


End of article.

Back to Blog

Related posts

Read more »

create12

T-SQL script for DBVisualizer to count rows added in last 45 days

create10

sql SELECT utc.table_name, xt.rows_last_45_days FROM SELECT table_name FROM user_tab_columns WHERE UPPERcolumn_name = 'INTERNTIMESTAMP' AND data_type LIKE 'TIME...

step2

!Query Filterhttps://media2.dev.to/dynamic/image/width=50,height=50,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2...