gRPC Transmission Optimization: An Efficient Solution Based on Flattening and Bitset
Source: Dev.to
Background and Technical Challenges
When designing APIs for database middleware, we often need to return multi‑row query results. Using traditional gRPC definitions can lead to the following issues:
Payload Redundancy (Key Repetition)
A naïve Protobuf definition maps each row to a map or object:
message Row {
  map<string, string> data = 1; // column name => value, repeated in every row
}
message Response {
  repeated Row rows = 1;
}
Problem:
For a query returning 10 000 records with fields customer_id, created_at, and status, the field names (keys) are transmitted 10 000 times, causing severe payload bloating and unnecessary bandwidth consumption.
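To put a rough number on the repetition, the sketch below counts only the bytes spent on key text in the map-per-row layout. It ignores Protobuf tags, length prefixes, and the values themselves, so it is a lower bound rather than a measurement. The helper name `repeatedKeyBytes` is hypothetical.

```php
<?php
// Hypothetical helper: bytes spent on key text alone when every row repeats
// its field names. Protobuf tag and length-prefix bytes are not counted.
function repeatedKeyBytes(array $fieldNames, int $rowCount): int
{
    $perRow = array_sum(array_map('strlen', $fieldNames));
    return $perRow * $rowCount;
}

$fields = ['customer_id', 'created_at', 'status'];
echo repeatedKeyBytes($fields, 10000), "\n"; // 270000 bytes of key text
echo repeatedKeyBytes($fields, 1), "\n";     // 27 bytes if the names are sent once
```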
Protobuf NULL Value Limitations
Proto3 treats scalar types as non-nullable:
- A string field that is NULL is serialized as an empty string "".
- The client cannot distinguish between an "empty value" and a genuine NULL from the database.
Wrapper types (e.g., google.protobuf.StringValue) can solve this, but they introduce extra nesting and processing overhead.
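The ambiguity is easy to reproduce outside gRPC. The sketch below imitates a non-nullable string field with a PHP function whose return type is `string`. No Protobuf library is involved, and `toWireString` is a hypothetical name used only for illustration.

```php
<?php
// Imitates a non-nullable string field: a database NULL has to be coerced
// to some string before it can be assigned, and "" is the only candidate.
function toWireString(?string $dbValue): string
{
    return $dbValue ?? '';
}

$fromNull  = toWireString(null); // the database column was NULL
$fromEmpty = toWireString('');   // the database column was an empty string
var_dump($fromNull === $fromEmpty); // bool(true): the receiver sees identical values
```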
Solution: Database‑Like Low‑Level Architecture
The AI suggested moving away from the traditional “object per row” mindset and borrowing ideas from columnar storage and ODBC/JDBC driver implementations. The core optimization consists of two parts:
Flattening
Eliminate key repetition by transmitting:
- Header (Metadata): Field definitions (columns) once.
- Body (Values): All data values in a single one-dimensional array (values).
This removes per‑row key transmission, dramatically shrinking the payload.
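As a minimal sketch of the header/body split, the function below turns associative rows into a column list plus one row-major value array. It assumes every row carries the same set of keys. The name `flattenRows` and the returned array shape are illustrative, not part of the original implementation.

```php
<?php
// Hypothetical helper: split associative rows into a header (column names)
// and a body (one flat, row-major list of values).
// Assumes every row has the same set of keys.
function flattenRows(array $rows): array
{
    if ($rows === []) {
        return ['columns' => [], 'values' => [], 'row_count' => 0];
    }
    $columns = array_keys($rows[0]);
    $values = [];
    foreach ($rows as $row) {
        foreach ($columns as $name) {
            $values[] = $row[$name]; // looked up by name, emitted in header order
        }
    }
    return ['columns' => $columns, 'values' => $values, 'row_count' => count($rows)];
}

$flat = flattenRows([
    ['customer_id' => '1', 'status' => 'active'],
    ['customer_id' => '2', 'status' => 'closed'],
]);
print_r($flat['columns']); // customer_id, status
print_r($flat['values']);  // 1, active, 2, closed
```

The value at row r and column c then sits at index r * columnCount + c, which is the only addressing rule the receiver needs.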
Bitset (Bitmap) Mechanism
Handle NULL values without extra wrapper messages:
- Introduce a binary field (bytes) that records the null-state of each value.
- Bitset rules:
  - 1 → value is NULL
  - 0 → value is valid
Space efficiency: Every 8 values require only 1 byte. For 1 000 rows × 8 columns, the overhead is roughly 1 KB.
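The size estimate follows from rounding the value count up to a whole number of bytes. A small sketch of that arithmetic, with `nullBitmapBytes` as a hypothetical helper name:

```php
<?php
// Bytes needed for a bitmap holding one bit per value, rounded up to a whole byte.
function nullBitmapBytes(int $rowCount, int $colCount): int
{
    return intdiv($rowCount * $colCount + 7, 8);
}

echo nullBitmapBytes(1000, 8), "\n"; // 1000 bytes, roughly 1 KB
echo nullBitmapBytes(3, 3), "\n";    // 2 bytes: 9 bits do not fit in one byte
```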
Implementation Essentials
Below are the key code snippets for compression on the server side and restoration on the client side.
Protobuf Definition (proto/query.proto)
message QueryResponse {
  repeated string values = 3;   // Flattened values
  repeated Column columns = 4;  // Field definitions
  bytes null_bitmap = 7;        // NULL marker bitstream
  int32 row_count = 6;
}
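Because the three fields describe the same grid, a receiver can cheaply verify that they agree before decoding. The check below is a sketch over plain PHP values that have already been read out of the message. It does not use the generated Protobuf classes, and `isConsistent` is a hypothetical name.

```php
<?php
// Hypothetical consistency check: the flat value list must hold exactly
// rowCount * colCount entries, and the bitmap must hold one bit per value
// rounded up to a whole byte.
function isConsistent(int $rowCount, int $colCount, array $values, string $nullBitmap): bool
{
    $expectedValues = $rowCount * $colCount;
    $expectedBytes = intdiv($expectedValues + 7, 8);
    return count($values) === $expectedValues
        && strlen($nullBitmap) === $expectedBytes;
}

// 2 rows x 2 columns; bit 3 is set, so the fourth value is NULL.
var_dump(isConsistent(2, 2, ['1', 'a', '2', ''], "\x08")); // bool(true)
var_dump(isConsistent(2, 2, ['1', 'a', '2'], "\x08"));     // bool(false): one value is missing
```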
Server Side – Encoding and Compression
The server iterates through the result set once, building both the flattened value list and the bitmap.
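// $result['rows'] is the 2-D array returned by the database
$values = [];
$packedBytes = "";
$currentByte = 0;
$bitIndex = 0;
foreach ($result['rows'] as $row) {
    foreach ($row as $value) {
        if ($value === null) {
            $values[] = ""; // Empty-string placeholder keeps the flat index aligned
            $currentByte |= (1 << $bitIndex); // Mark this value as NULL (LSB first)
        } else {
            $values[] = (string) $value;
        }
        $bitIndex++;
        if ($bitIndex === 8) { // Byte is full: append it and start a new one
            $packedBytes .= chr($currentByte);
            $currentByte = 0;
            $bitIndex = 0;
        }
    }
}
if ($bitIndex > 0) { // Flush the final, partially filled byte
    $packedBytes .= chr($currentByte);
}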
Client Side – Decoding and Restoration
The client reconstructs rows by slicing the flattened array according to the column count and consulting the bitmap to restore NULLs.
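$fetchedRows = [];
$columns = $response->getColumns();
$colCount = count($columns);
$values = $response->getValues(); // Flattened array
$bitmap = $response->getNullBitmap(); // Bitmap string
$rowCount = $response->getRowCount();
for ($r = 0; $r < $rowCount; $r++) {
    $row = [];
    for ($c = 0; $c < $colCount; $c++) {
        $flatIndex = $r * $colCount + $c;   // Position in the flattened array
        $byteIndex = intdiv($flatIndex, 8); // Which bitmap byte holds this value's bit
        $bitPos = $flatIndex % 8;           // Bit within that byte (LSB first, as on the server)
        $isNull = (ord($bitmap[$byteIndex]) >> $bitPos) & 1;
        if ($isNull) {
            $row[] = null; // Restore NULL
        } else {
            $row[] = $values[$flatIndex]; // Restore actual value
        }
    }
    $fetchedRows[] = $row;
}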
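Encoder and decoder must agree on bit order, which is easy to get wrong. The round-trip sketch below re-implements both steps on plain arrays so they can be run without gRPC. The function names `encodeRows` and `decodeRows` are hypothetical, and the sketch assumes least-significant-bit-first packing with string values.

```php
<?php
// Hypothetical helpers mirroring the encode and decode steps on plain arrays.
function encodeRows(array $rows): array
{
    $values = [];
    $bitmap = '';
    $byte = 0;
    $bit = 0;
    foreach ($rows as $row) {
        foreach ($row as $value) {
            if ($value === null) {
                $values[] = '';
                $byte |= (1 << $bit); // LSB first: value i maps to bit i % 8
            } else {
                $values[] = (string) $value;
            }
            if (++$bit === 8) { // byte is full
                $bitmap .= chr($byte);
                $byte = 0;
                $bit = 0;
            }
        }
    }
    if ($bit > 0) { // flush the final partial byte
        $bitmap .= chr($byte);
    }
    return [$values, $bitmap];
}

function decodeRows(array $values, string $bitmap, int $rowCount, int $colCount): array
{
    $rows = [];
    for ($r = 0; $r < $rowCount; $r++) {
        $row = [];
        for ($c = 0; $c < $colCount; $c++) {
            $i = $r * $colCount + $c;
            $isNull = (ord($bitmap[intdiv($i, 8)]) >> ($i % 8)) & 1;
            $row[] = $isNull ? null : $values[$i];
        }
        $rows[] = $row;
    }
    return $rows;
}

// NULL and '' both appear, so the round trip shows they stay distinct.
$rows = [['1', null, ''], ['2', 'x', null], [null, '', 'z']];
[$values, $bitmap] = encodeRows($rows);
var_dump(decodeRows($values, $bitmap, 3, 3) === $rows); // bool(true)
echo bin2hex($bitmap), "\n"; // 6200: bits 1, 5 and 6 set in the first byte
```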
Takeaways
- Flattening removes repetitive field names, cutting payload size dramatically.
- Bitset provides a compact, constant-time way to represent NULL states (1 bit per value).
- The combined approach yields a lean, high-throughput gRPC payload suitable for large result sets without sacrificing the ability to distinguish NULL from empty strings.
Optimization Benefits Analysis
Adopting this architecture yields the following specific benefits:
Extreme Transmission Efficiency
- With flattening, field names are sent once per response instead of once per row, so payload size grows linearly with the number of values and is unaffected by field-name length. The saving scales with row count, which makes it most noticeable for large result sets.
Precise Type Restoration
- The client can accurately restore database NULL states by reading the null_bitmap, working around the non-nullable scalar types in proto3.
Parsing Performance Improvement
- For PHP and other languages, iterating over a flat one-dimensional (indexed) array can offer better CPU-cache locality and less memory fragmentation than walking a large number of nested objects, although the actual gain depends on the runtime and the data.
Conclusion
This case shows the value of applying a measured amount of low-level system design thinking in modern distributed systems. Working with AI, we moved beyond simple API design and used bitwise operations and data-structure optimization to work around inherent limitations of gRPC/Protobuf in database-application scenarios at very low cost.