Data warehouse without using SQL

Published: 2 months ago (December 8, 2025 at 02:27 AM EST)

Source: Dev.to

Today the vast majority of data warehouses use SQL to process data. After decades of development, SQL has become the standard language of the database world and has built up a large user base, so it is natural for data warehouses to support it. However, as business complexity keeps growing in the big-data era, SQL looks increasingly inadequate in data-warehouse scenarios where computation is the primary task. A typical sign is that some data warehouses have begun to integrate non-SQL languages such as Python, which shows that the industry itself doubts SQL’s capabilities.

This article introduces esProc, a data warehouse that is not based on SQL. Because esProc uses SPL rather than SQL as its query language, it can be regarded as a new type of data warehouse.

Why doesn’t esProc use SQL?

Lack of procedural and ordered computation

SQL provides poor support for procedural computation. Even with CTE syntax, describing a complex calculation in SQL is cumbersome and often requires deeply nested queries. Moreover, SQL treats datasets as unordered sets, which makes order-dependent calculations difficult and sometimes impossible to express.

Typical data-analysis tasks are much easier to implement in a language that supports stepwise, ordered calculation, such as Python or SPL. Funnel analysis for an e-commerce company is one example: it calculates user churn at each step of the sequence page view → add-to-cart → order → payment.
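To make the funnel example concrete, here is a minimal sketch in Python of the stepwise, ordered style of calculation described above. The event log, the step names, and the helper functions `funnel_counts` and `churn_rates` are all hypothetical and chosen for illustration; this is not esProc or SPL code.

```python
from collections import defaultdict

# Funnel steps in the order a user is expected to pass through them.
STEPS = ["page_view", "add_to_cart", "order", "payment"]

# Hypothetical event log: (user_id, timestamp, event).
EVENTS = [
    ("u1", 1, "page_view"), ("u1", 2, "add_to_cart"),
    ("u1", 3, "order"), ("u1", 4, "payment"),
    ("u2", 1, "page_view"), ("u2", 5, "add_to_cart"),
    ("u3", 2, "page_view"),
    # u4 adds to cart before viewing the page, so only the view counts.
    ("u4", 1, "add_to_cart"), ("u4", 2, "page_view"),
]


def funnel_counts(events, steps):
    """Count the users who reach each step, respecting event order."""
    by_user = defaultdict(list)
    for user, ts, event in events:
        by_user[user].append((ts, event))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()  # order each user's events by timestamp
        next_step = 0
        for _, event in user_events:
            # Advance only when the next expected step occurs.
            if next_step < len(steps) and event == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return reached


def churn_rates(counts):
    """Share of users lost between each pair of consecutive steps."""
    return [1 - b / a if a else 0.0 for a, b in zip(counts, counts[1:])]


counts = funnel_counts(EVENTS, STEPS)
for step, count in zip(STEPS, counts):
    print(step, count)
print(churn_rates(counts))
```

Each user's events are sorted once and then scanned in order, so the logic reads as a sequence of steps rather than as nested subqueries.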

Closedness of relational databases

Relational databases are designed primarily for transaction processing (TP). They enforce strict constraints: only data that meet criteria can be loaded, and only data inside the database can be processed. This “closedness” is valuable for TP but is a disadvantage for analytical processing (AP):

Cross‑source calculations – Data from multiple databases cannot be combined freely, limiting warehouse use cases.
ETL overhead – Modern applications ingest diverse data sources (files, streams, NoSQL stores). Because the database cannot compute data outside its storage, data must be imported first, adding ETL steps, increasing programmer workload, and losing real‑time freshness.
Space-for-time trade-offs – Storing redundant intermediate results can improve query performance, but a relational database can only store them as tables. Maintaining thousands of intermediate tables inflates metadata, raises O&M costs, and creates capacity and scaling challenges.
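As a rough illustration of what an open, cross-source calculation looks like, the sketch below combines a database table with a CSV file that was never loaded into the database. It uses only the Python standard library; the `customers` table, the CSV contents, and the function `revenue_by_region` are hypothetical, and an in-memory SQLite database stands in for a real one. It does not show how esProc does this.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational table (in-memory SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, "east")],
)

# Hypothetical source 2: a CSV file that lives outside the database.
ORDERS_CSV = io.StringIO(
    "customer_id,amount\n"
    "1,100.0\n"
    "2,40.0\n"
    "1,60.0\n"
    "3,25.0\n"
)


def revenue_by_region(conn, orders_file):
    """Join file rows to database rows and total the amounts per region."""
    # Small lookup table pulled from the database.
    region_of = dict(conn.execute("SELECT id, region FROM customers"))
    totals = {}
    # The file is streamed row by row; it is never imported into a table.
    for row in csv.DictReader(orders_file):
        region = region_of.get(int(row["customer_id"]))
        if region is None:
            continue  # order for an unknown customer
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


totals = revenue_by_region(conn, ORDERS_CSV)
print(totals)
```

No import step runs before the calculation: the file is read where it sits, which is the property the list above says a closed database lacks.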

Performance limitations

SQL performance depends heavily on the database’s optimizer. For simple queries the optimizer can choose an efficient execution plan, but once a calculation becomes moderately complex, the optimizer often cannot find a better plan and executes the statement as literally written, causing performance to drop sharply. Funnel analysis is a typical case: such queries frequently run too slowly to be practical, and batch jobs can take hours to complete.

Problems with adding Python to SQL-based warehouses

Introducing a third-party language like Python complicates the technology stack and brings problems of its own:

Higher development and O&M costs – Managing multiple languages with different paradigms increases complexity.
Limited big‑data support – Python lacks native mechanisms (e.g., cursors) for processing data that exceed memory capacity, making large‑scale computation cumbersome.
Ineffective parallelism – The Global Interpreter Lock (GIL) in the standard CPython interpreter prevents threads from executing CPU-bound code in parallel, so multi-threaded code effectively runs serially and can even be slower than single-threaded code.
I/O overhead – Python must read data from the closed database, incurring additional I/O costs and further degrading performance.
Inability to intervene in storage – High-performance algorithms (e.g., ordered merge) require direct control over data layout, which is impossible when data reside in a closed relational store.
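The ordered merge mentioned in the last point can be sketched with the Python standard library. The example assumes two data segments that are already stored sorted by user ID; the segment contents and the function `merged_totals` are hypothetical. It is meant only to show why the algorithm depends on data layout, not to represent esProc's implementation.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical segments of (user_id, amount), each already sorted by user_id.
SEGMENT_A = [(1, 10.0), (3, 5.0), (5, 7.5)]
SEGMENT_B = [(1, 2.0), (2, 8.0), (5, 2.5)]


def merged_totals(*sorted_segments):
    """Merge pre-sorted segments and total the amounts per user in one pass."""
    # heapq.merge is lazy: it holds one pending row per segment in memory.
    merged = heapq.merge(*sorted_segments, key=itemgetter(0))
    # Rows for the same user are adjacent after the merge, so groupby works.
    for user_id, rows in groupby(merged, key=itemgetter(0)):
        yield user_id, sum(amount for _, amount in rows)


result = list(merged_totals(iter(SEGMENT_A), iter(SEGMENT_B)))
print(result)
```

The single pass is correct only because every input is sorted by the merge key. If the store does not guarantee that order, the data must be sorted or hashed first, which is the extra cost the point above refers to.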

Conclusion

SQL‑based data warehouses suffer from three core issues:

Insufficient procedural and ordered computation capabilities
Closedness that hinders flexible data integration and increases ETL/ELT burdens
Performance degradation for moderately complex analytical workloads

Adding Python does not resolve these problems; it introduces additional complexity and performance constraints. esProc addresses these challenges by abandoning SQL in favor of SPL, offering a more flexible, open, and performant environment for modern data‑warehouse analytics.
