A Linux Tutorial: Log to CSV to JSON
Source: Dev.to
Overview
This tutorial converts raw application logs into structured JSON in two stages: log to CSV, then CSV to JSON. The workflow is useful for producing structured test data for SLM testing.
Step 1: Generate Raw Logs
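Start by generating log data. The echo command can simulate application output by hand, with each line in the format timestamp | level | message. Create the tutorial directory first, because the >> redirection does not create missing directories.
mkdir -p tutorial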
echo "2026-01-18 05:42:09 | INFO | system initialized" >> tutorial/myapp.log
echo "2026-01-18 05:42:09 | ERROR | disk space low" >> tutorial/myapp.log
echo "2026-01-18 05:42:09 | INFO | user logged in" >> tutorial/myapp.log
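For logs produced by a program rather than typed by hand, Python's standard logging module can emit the same timestamp | level | message layout. The following is a minimal sketch and not part of the original tutorial: the function name build_logger and the logger name "myapp" are hypothetical, and the format string is an assumption chosen to match the echo lines above.

```python
import logging
import os


def build_logger(log_path):
    """Return a logger that appends 'timestamp | LEVEL | message' lines to log_path."""
    # Create the parent directory if it is missing.
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)

    logger = logging.getLogger("myapp")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger

    # Drop handlers from earlier calls so lines are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter(
            fmt="%(asctime)s | %(levelname)s | %(message)s",  # assumed layout
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    return logger
```

With this in place, log = build_logger("tutorial/myapp.log") followed by log.info("system initialized") and log.error("disk space low") should append lines in the same shape as the echo commands, with the current time as the timestamp.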
Step 2: Raw Log File
The log file now contains one entry per line, with the three fields separated by " | " (space, pipe, space).
File: tutorial/myapp.log
2026-01-18 05:42:09 | INFO | system initialized
2026-01-18 05:42:09 | ERROR | disk space low
2026-01-18 05:42:09 | INFO | user logged in
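Before converting, it can help to confirm that every line really has this shape. The sketch below is one possible check, assuming the timestamp is always YYYY-MM-DD HH:MM:SS and the level is an uppercase word; the names LINE_PATTERN and parse_line are hypothetical and do not appear in the original tutorial.

```python
import re

# Assumed line shape: "YYYY-MM-DD HH:MM:SS | LEVEL | message".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" \| (?P<level>[A-Z]+)"
    r" \| (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of the three fields, or None if the line does not match."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    return match.groupdict() if match else None


if __name__ == "__main__":
    # A well-formed line yields a dict; anything else yields None.
    print(parse_line("2026-01-18 05:42:09 | ERROR | disk space low"))
    print(parse_line("not a log line"))
```

Lines for which parse_line returns None can be reported or skipped before the CSV step, rather than silently producing malformed rows.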
Step 3: Convert to CSV
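Next, convert the raw log to CSV. Shell utilities such as sed or awk can do this by replacing each " | " delimiter with "," and wrapping every line in double quotes. The command below assumes that messages contain neither double quotes nor the " | " sequence.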
Example command:
sed 's/ | /","/g; s/^/"/; s/$/"/' tutorial/myapp.log > tutorial/myapp.csv
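As an alternative to the sed command above, the same step can be sketched with Python's csv module, which also escapes double quotes inside a message. This is an illustrative sketch rather than the tutorial's own method: the function name log_to_csv is hypothetical, and it assumes every line has exactly the timestamp | level | message layout. For the sample log it should produce the same three rows.

```python
import csv


def log_to_csv(log_file, csv_file):
    """Write each 'timestamp | level | message' log line as a fully quoted CSV row."""
    with open(log_file, encoding="utf-8") as src, \
            open(csv_file, mode="w", encoding="utf-8", newline="") as dst:
        # QUOTE_ALL wraps every field in double quotes, like the sed command.
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for line in src:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first two delimiters only, so a message that
            # itself contains " | " stays in a single field.
            writer.writerow(line.split(" | ", 2))
```

Calling log_to_csv("tutorial/myapp.log", "tutorial/myapp.csv") writes the CSV file used in the next step. Unlike the sed version, a message such as disk "sda" full is written with its inner quotes doubled, which is how CSV readers expect them.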
File: tutorial/myapp.csv
"2026-01-18 05:42:09","INFO","system initialized"
"2026-01-18 05:42:09","ERROR","disk space low"
"2026-01-18 05:42:09","INFO","user logged in"
Step 4: Python Conversion Script
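A Python script uses the csv and json modules to transform the flat CSV file into a structured JSON array. Because the CSV file has no header row, the script passes the column names (timestamp, level, message) to csv.DictReader explicitly.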
File: tutorial/csv_to_json.py
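import csv
import json


def convert(csv_file, json_file):
    """Read a headerless CSV file and write its rows as a JSON array."""
    data = []
    # newline='' lets the csv module handle line endings itself.
    with open(csv_file, mode='r', encoding='utf-8', newline='') as f:
        # The CSV has no header row, so the column names are supplied here.
        reader = csv.DictReader(f, fieldnames=["timestamp", "level", "message"])
        for row in reader:
            data.append(row)
    with open(json_file, mode='w', encoding='utf-8') as f:
        json.dump(data, f, indent=4)


if __name__ == "__main__":
    convert('tutorial/myapp.csv', 'tutorial/myapp.json')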
Step 5: Final JSON Output
The resulting JSON file is ready for consumption by web dashboards or other data analysis tools.
File: tutorial/myapp.json
[
    {
        "timestamp": "2026-01-18 05:42:09",
        "level": "INFO",
        "message": "system initialized"
    },
    {
        "timestamp": "2026-01-18 05:42:09",
        "level": "ERROR",
        "message": "disk space low"
    },
    {
        "timestamp": "2026-01-18 05:42:09",
        "level": "INFO",
        "message": "user logged in"
    }
]
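To illustrate how a downstream tool might consume this file, the sketch below loads the JSON array and summarizes it by log level. It is a hedged example only: the function names count_levels and entries_with_level are hypothetical, and the code assumes each entry has the timestamp, level, and message keys shown above.

```python
import json
from collections import Counter


def load_entries(json_file):
    """Load the JSON array of log entries as a list of dicts."""
    with open(json_file, encoding="utf-8") as f:
        return json.load(f)


def count_levels(json_file):
    """Return a Counter mapping each log level to its number of entries."""
    return Counter(entry["level"] for entry in load_entries(json_file))


def entries_with_level(json_file, level):
    """Return only the entries whose 'level' equals the given level."""
    return [e for e in load_entries(json_file) if e["level"] == level]
```

For the file above, count_levels("tutorial/myapp.json") should report two INFO entries and one ERROR entry, and entries_with_level("tutorial/myapp.json", "ERROR") should return the single "disk space low" entry.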