SQL: Doing GROUP BY in CsvPath
Source: Dev.to
Introduction
Let’s look at how to create a simple GROUP BY report in the CsvPath Framework’s tabular data validation language. Of all our examples, this is an easy one!
SQL GROUP BY Overview
A GROUP BY query selects rows and groups them according to one or more columns. The archetypal example is:
SELECT
dept,
role,
SUM(salary) total_salary
FROM employee
GROUP BY dept, role
This query produces a result set with three columns: a unique combination of dept and role, and the summed salaries for each combination.
Setting Up the CsvPath Example
We’ll use FlightPath Data’s built‑in examples. In a typical project, FlightPath creates an examples folder with useful how‑to examples. For this demo we’ll use examples/counting/projects.csv as the data source and create a new CsvPath file.
Basic Scaffold
~ test-data:examples/counting/projects.csv ~
$[*][
]
The directive ~ test-data:... ~ tells FlightPath which file to scan (* means the whole file). In production this line is ignored.
Simple GROUP BY with One Column
The following CsvPath aggregates worker hours by agency:
~ test-data:examples/counting/projects.csv ~
$[*][
subtotal.worker_hours(#agency, #13)
last() -> var_table("worker_hours")
]
subtotal.worker_hours(#agency, #13)subtotals the values in header#13(the#worker_hours_this_periodcolumn) for each distinct#agencyvalue.last()matches only the final line of the file. When it matches, the right‑hand side of->is executed.var_table("worker_hours")prints a text table of theworker_hoursvariable.
Screenshot of the Help Window
Resulting Table
Only two lines of CsvPath are needed for this simple aggregation.
Extending to Two Columns (GROUP BY on Multiple Fields)
To mimic the original SQL example that groups by both dept and role, we create a composite key:
~ test-data:examples/counting/projects.csv ~
$[*][
@key = concat(#agency, ", ", #neighborhood)
subtotal.Aggregate_hours(@key, #13)
last() -> var_table("Aggregate_hours")
]
@keyconcatenates the#agencyand#neighborhoodcolumns, forming a unique identifier for each group.subtotal.Aggregate_hours(@key, #13)aggregates the worker‑hour values for each distinct key.- The final
var_tablecall prints the aggregated results.
Full Example Screenshot
Resulting Table
The printed table is useful for human‑readable validation, while the underlying vars.json file contains the structured data for automated processing.
Automation and Reporting
CsvPath is designed for lights‑out automation. Validation results can be captured in:
vars.json– holds variables such as the aggregated tables.errors.json– collects built‑in validation errors. Custom errors can be generated with theerror()function.
CsvPath also provides a sophisticated print‑out subsystem that can direct output to multiple streams, allowing you to separate informational messages, error reports, and other run‑specific data for different audiences.
End of article.



