A Minimal Go Toolkit for Cleaning, Validating, and Querying CSV/TSV/Excel/Parquet Files

Published: (December 10, 2025 at 03:20 AM EST)
3 min read
Source: Dev.to

Source: Dev.to

In many software systems, not all data lives inside a database.
Sometimes it’s stored in structured files such as CSV, TSV, or spreadsheets, and in practice these files are often not clean. Data may be entered manually, values can be missing, and rows may contain inconsistencies that break downstream processing. When the file is large, trying to “fix it in Excel” can lead to crashes or make it difficult to identify invalid records.

I repeatedly faced these issues in real projects, so I built three small Go libraries that focus on preprocessing, validation, and lightweight analysis of structured file data. All three libraries work with the standard io.Reader interface, allowing you to use familiar Go primitives without custom data structures.

  • fileprep – preprocessing + field‑level validation using struct tags
  • fileframe – a tiny, immutable DataFrame for filtering and inspection
  • filesql – run SQL directly on CSV/TSV/LTSV/Excel/Parquet via an embedded SQLite database

These libraries are independent, but share the same io.Reader‑based API.

fileprep

Features

  • Preprocessing: trim, replace, Unicode normalization, type coercion, etc.
  • Clear error reporting: identifies which row and which column caused the failure
  • Supports composite validation (cross‑column rules)
  • Works with CSV, TSV, LTSV, Parquet, Excel — anything fed through io.Reader

Example

type User struct {
    Name  string `prep:"trim" validate:"required"`
    Email string `prep:"trim,lowercase"`
    Age   string
}

func main() {
    csvData := `name,email,age
  John Doe  ,JOHN@EXAMPLE.COM,30
Jane Smith,jane@example.com,25
`

    processor := fileprep.NewProcessor(fileprep.FileTypeCSV)
    var users []User

    reader, result, err := processor.Process(strings.NewReader(csvData), &users)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    fmt.Printf("Processed %d rows, %d valid\n", result.RowCount, result.ValidRowCount)

    for _, user := range users {
        fmt.Printf("Name: %q, Email: %q\n", user.Name, user.Email)
    }

    // `reader` can be passed directly to filesql
    _ = reader
}

Output

Processed 2 rows, 2 valid
Name: "John Doe", Email: "john@example.com"
Name: "Jane Smith", Email: "jane@example.com"

fileframe

Features

  • Immutable DataFrame
  • Filtering, mapping, grouping
  • Ideal for “one‑time transformations” on small/medium CSV/TSV datasets

Example

// Sample sales data
csvData := `product,amount,category
Apple,100,Fruit
Banana,150,Fruit
Carrot,80,Vegetable
Orange,120,Fruit
Broccoli,90,Vegetable`

df, err := fileframe.NewDataFrame(strings.NewReader(csvData), fileframe.CSV)
if err != nil {
    fmt.Println("Error:", err)
    return
}

fmt.Printf("Total rows: %d\n", df.Len())
fmt.Printf("Columns: %v\n", df.Columns())

// Filter
filtered := df.Filter(func(row map[string]any) bool {
    amount, ok := row["amount"].(int64)
    return ok && amount > 100
})
fmt.Printf("Rows with amount > 100: %d\n", filtered.Len())

// GroupBy + Sum
groupedDf, err := df.GroupBy("category")
if err != nil {
    fmt.Println("Error:", err)
    return
}
grouped, err := groupedDf.Sum("amount")
if err != nil {
    fmt.Println("Error:", err)
    return
}

for _, row := range grouped.ToRecords() {
    fmt.Printf("  %s: %.0f\n", row["category"], row["sum_amount"])
}

Output

Total rows: 5
Columns: [product amount category]
Rows with amount > 100: 2
Fruit: 370
Vegetable: 170

filesql

filesql is not “SQL on files” in the literal sense. Internally it loads your data into a temporary SQLite database, giving you full SQL capabilities without the need to manage a database yourself.

Example

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    db, err := filesql.OpenContext(ctx, "data.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    rows, err := db.QueryContext(ctx, "SELECT * FROM data WHERE age > 25")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var name string
        var age int
        if err := rows.Scan(&name, &age); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("Name: %s, Age: %d\n", name, age)
    }
}

If you’re a Go developer who doesn’t need the full Apache or Python ETL ecosystems, these three lightweight libraries may fit your workflow well. The user base is still small, so there may be undiscovered bugs—feedback and issue reports are very welcome.

Back to Blog

Related posts

Read more »