Lightweight Big Data Processing Technology
Source: Dev.to
Drawbacks of Traditional Big Data Architecture
- Complex O&M – Larger clusters require more operational effort. Current big‑data technologies often cannot fully utilize hardware resources, leading to oversized clusters and higher maintenance costs.
- Closed system – Data must be loaded into a database before it can be processed. This forces an ETL step, which adds latency and reduces real‑time capabilities, especially when data originates from many sources.
- Tight coupling – Sharing tables and computation logic across multiple applications creates strong dependencies. A change in one application can break others, making the system hard to expand and maintain and putting further pressure on data‑center capacity.
Front‑End Computation Layer
When the central data center is under heavy load, part of the computation can be shifted to the application side. A front‑end computation layer can consist of multiple data marts, each dedicated to a specific type of application. This approach:
- Shares the computational load with the data center.
- Reduces coupling between applications because each data mart serves a single purpose.
However, the layer must be built with technology that is lightweight, easy to operate, and well suited to the relatively small data volume each mart handles.
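The routing idea behind such a layer can be sketched in a few lines. The snippet below is a conceptual illustration in plain Python, not esProc syntax; the class and application names are hypothetical. Each application queries its own single‑purpose data mart, and only requests from applications without a mart fall through to the central data center.

```python
# Conceptual sketch (plain Python, not esProc SPL): each application queries
# its own single-purpose data mart; anything not covered by a mart falls
# through to the central data center. All names here are hypothetical.

class DataMart:
    """A small store holding only the data one application needs."""
    def __init__(self, rows):
        self.rows = rows

    def query(self, predicate):
        return [r for r in self.rows if predicate(r)]

class FrontEndLayer:
    def __init__(self, marts, data_center):
        self.marts = marts            # one mart per application -> low coupling
        self.data_center = data_center

    def query(self, app, predicate):
        mart = self.marts.get(app)
        if mart is not None:
            # Served locally: this share of the load never reaches the center.
            return mart.query(predicate)
        return self.data_center.query(predicate)

# Tiny demo: the "reports" app is served by its mart; an app without a mart
# still reaches the central data center.
center = DataMart([{"v": i} for i in range(10)])
layer = FrontEndLayer({"reports": DataMart([{"v": 1}, {"v": 7}])}, center)
print(layer.query("reports", lambda r: r["v"] > 5))   # → [{'v': 7}]
```

Because each mart serves exactly one application, a schema change in one mart cannot ripple into another application, which is the decoupling benefit described above.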
Why Traditional Databases Are Not Ideal
- Heavy deployment – Most databases require separate physical resources, adding complexity and cost to the overall framework.
- Data range dilemma
- Too small: The data mart cannot satisfy application queries.
- Too large: It becomes a de‑facto data center, defeating the purpose of a lightweight layer.
- SQL limitations
- Requires all data to be loaded into the database before it can be queried, which is inefficient.
- Lacks support for complex calculations (e.g., multi‑step e‑commerce funnels) without resorting to external languages like Python or Java.
- Nested, multi‑thousand‑line SQL scripts are hard to write, read, and maintain.
These issues affect both traditional databases and many big‑data platforms that expose SQL interfaces.
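The funnel example is worth making concrete. A multi‑step e‑commerce funnel (say visit → cart → pay, where each step must occur within a time window of the previous one) needs deeply nested, self‑joining SQL, but reads naturally as a sequence of procedural steps. Below is a minimal sketch in plain Python over hypothetical event data, just to show the step‑by‑step shape of the calculation; esProc SPL is designed to express this kind of ordered, stateful logic directly.

```python
from datetime import datetime, timedelta

# Hypothetical click-stream events: (user, action, time).
events = [
    ("u1", "visit", datetime(2024, 1, 1, 10, 0)),
    ("u1", "cart",  datetime(2024, 1, 1, 10, 5)),
    ("u1", "pay",   datetime(2024, 1, 1, 10, 20)),
    ("u2", "visit", datetime(2024, 1, 1, 11, 0)),
    ("u2", "cart",  datetime(2024, 1, 2, 12, 0)),   # too late: outside the window
    ("u3", "visit", datetime(2024, 1, 1, 9, 0)),
]

def funnel(events, steps, window):
    """Count, per step, the users who reach it within `window` of the prior step."""
    counts = [0] * len(steps)
    by_user = {}
    for user, action, t in events:
        by_user.setdefault(user, []).append((t, action))
    for evts in by_user.values():
        evts.sort()                      # order each user's events by time
        deadline, step_i = None, 0
        for t, action in evts:           # greedy walk through the funnel steps
            if step_i >= len(steps):
                break
            if action == steps[step_i] and (deadline is None or t <= deadline):
                counts[step_i] += 1
                deadline = t + window    # next step must happen before this
                step_i += 1
    return counts

print(funnel(events, ["visit", "cart", "pay"], timedelta(hours=1)))  # → [3, 1, 1]
```

Each stage of the funnel is one explicit, ordered pass over a user's events, state (the running deadline) carries naturally from step to step, and adding a fourth step means appending one name to a list rather than nesting another subquery.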
Requirements for a Lightweight Processing Engine
- Independence from databases – No need for a full RDBMS deployment.
- Integrable & embeddable – Can be packaged within applications.
- Simple and convenient – Minimal operational overhead.
- Open and extensible – Ability to process data from multiple sources.
- Efficient handling of data range – Supports selective data loading without becoming a full data center.
esProc SPL: An Open‑Source Solution
esProc is a structured‑data computing engine designed for big data with the following characteristics:
- Lightweight deployment – Can run independently or be embedded directly in applications, reducing overall system complexity.
- Strong openness – Supports mixed‑source computation, allowing data from various origins to be processed together.
- High‑performance single‑node processing – Maximizes hardware utilization on a single node, achieving cluster‑like performance without the need for multiple machines.
- Data routing – Enables tasks to be executed locally or delegated to a data center as needed.
- Agile SPL syntax – The Structured Process Language (SPL) is concise and well‑suited for complex calculations, avoiding the verbosity of nested SQL.
In short, esProc offers a lightweight solution from deployment through execution.
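To see what "mixed‑source computation" means in practice, consider joining order records arriving as CSV with product reference data sitting in a relational database, with the join and aggregation done in the engine rather than after loading everything into one database. The sketch below illustrates the idea in plain Python with the standard library (the data and field names are made up; it is not esProc API code).

```python
import csv
import io
import sqlite3

# Source 1: order data arriving as CSV text (could be a file or an API feed).
csv_text = "order_id,product_id,amount\n1,p1,250\n2,p2,90\n3,p1,40\n"
orders = list(csv.DictReader(io.StringIO(csv_text)))

# Source 2: product reference data living in a relational database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO product VALUES (?, ?)",
               [("p1", "Widget"), ("p2", "Gadget")])
names = dict(db.execute("SELECT id, name FROM product"))

# The join happens in the compute layer: neither source had to absorb the other.
total_by_product = {}
for o in orders:
    product = names[o["product_id"]]
    total_by_product[product] = total_by_product.get(product, 0) + int(o["amount"])

print(total_by_product)   # → {'Widget': 290, 'Gadget': 90}
```

The key point is that no ETL step loads the CSV into the database first: each source is read in place and combined at computation time, which is the openness property the list above describes.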
Technical Framework of esProc
Integration
- Embedded mode – Integrate esProc into an application to perform calculations alongside business logic, minimizing framework changes and O&M effort.
- Independent service mode – When additional compute power is required, deploy esProc as a standalone service. It supports distributed deployment with load balancing and fault tolerance, providing cluster‑like capabilities without the overhead of traditional big‑data clusters.