DataStreamer Sizing & Capacity Planning
DataStreamer is engineered for high performance and resource efficiency, allowing you to handle massive data volumes without the massive infrastructure costs. This guide provides a comprehensive methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.
The DataStreamer Performance Philosophy
DataStreamer is built in Rust, a modern systems programming language that guarantees memory safety and delivers exceptional performance. Unlike traditional data pipelines that rely on heavy runtimes like the JVM (Java Virtual Machine) or interpreted languages, DataStreamer compiles to a native binary that runs directly on the CPU. This results in:
- Lower CPU Usage: More processing power dedicated to your data, not the runtime.
- Minimal Memory Footprint: Significantly reduced RAM requirements compared to alternatives.
- Predictable Performance: Consistent throughput without the garbage collection pauses common in other systems.
Our core design principle is throughput-based sizing. Instead of counting components or pipelines, we focus on the volume of data you need to process. This provides a more accurate and scalable model for capacity planning.
Sizing Methodology
Sizing a DataStreamer deployment involves two primary considerations: CPU and Memory.
CPU Sizing
DataStreamer is almost always CPU-constrained, meaning CPU is the most critical resource to plan for. Our sizing model is based on the MiB/s (mebibytes per second) of data throughput you expect to process.
Throughput per vCPU
The following table provides baseline throughput estimates for a single virtual CPU (vCPU). Use this as a starting point for your calculations.
| Data Type | Avg. Event Size | Throughput per vCPU |
| --- | --- | --- |
| Structured Logs (JSON, etc.) | ~768 bytes | ~25 MiB/s |
| Unstructured Logs (plain text) | ~256 bytes | ~10 MiB/s |
| Metrics (Prometheus, etc.) | ~256 bytes | ~25 MiB/s |
| Trace Spans (OpenTelemetry) | ~1 KB | ~25 MiB/s |
CPU Calculation Formula
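The calculation proceeds in three steps (the 30% headroom factor matches the worked example at the end of this guide):

Throughput (MiB/s) = EPS * Average Event Size (KB) / 1024
Base vCPU = Throughput (MiB/s) / Per-vCPU Throughput (MiB/s)
Total vCPU = (Base vCPU + Transform Overhead) * 1.3 (30% headroom)

where EPS is events per second and the per-vCPU throughput comes from the table above.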
Add Transform Overhead
Transformations add CPU load. Apply these multipliers to your Base vCPU calculation.
| Transform Type | CPU Overhead |
| --- | --- |
| Light Parsing (JSON, Logfmt) | Base vCPU * 10% * Number of Parsers |
| Heavy Parsing (Grok, Regex) | Base vCPU * 50% * Number of Parsers |
| Enrichment (GeoIP, etc.) | Base vCPU * 20% * Number of Enrichments |
No parser? No problem! DataStreamer’s AI-powered parser can build complex parsers for you in seconds, not weeks, at no extra charge.
Memory Sizing
Memory requirements are influenced by event rate, transformations, and buffering.
Memory Calculation Formula
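The constants below are taken from the worked example later in this guide:

Base Memory (GB) = EPS / 25,000
Heavy Parser Memory = 0.5 GB per heavy parser
Enrichment Memory = 0.3 GB per enrichment
Buffer Memory (GB) = Throughput (MiB/s) * 0.1
Total Memory = (sum of the above) * 1.4 (40% headroom)

As with the CPU sketch above, here is a minimal, illustrative Python version; the function name and signature are ours, not part of DataStreamer:

```python
# Illustrative memory sizing sketch; constants come from the
# formula above, not from a DataStreamer API.
MEM_HEADROOM = 1.4  # 40% headroom

def size_memory_gb(eps: float, throughput_mibs: float,
                   heavy_parsers: int = 0, enrichments: int = 0) -> float:
    """Return recommended memory in GB for a workload."""
    base = eps / 25_000              # 1 GB per 25,000 EPS
    parsers = 0.5 * heavy_parsers    # 0.5 GB per heavy parser
    enrich = 0.3 * enrichments       # 0.3 GB per enrichment
    buffers = throughput_mibs * 0.1  # ~0.1 GB of buffer per MiB/s
    return (base + parsers + enrich + buffers) * MEM_HEADROOM
```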
Simple Example Sizing Scenario
Let’s calculate the requirements for a workload of 50,000 EPS with an average event size of 2 KB.
The pipeline includes:
- 2 Heavy Regex Parsers
- 1 Enrichment transform
CPU Calculation:
- Throughput: 50,000 EPS * 2 KB / 1024 = 97.66 MiB/s
- Base vCPU: 97.66 MiB/s / 25 MiB/s = 3.91 vCPU
- Heavy Parser Overhead: 3.91 * 50% * 2 = 3.91 vCPU
- Enrichment Overhead: 3.91 * 20% * 1 = 0.78 vCPU
- Subtotal vCPU: 3.91 + 3.91 + 0.78 = 8.60 vCPU
- Total vCPU: 8.60 * 1.3 = 11.18 vCPU (Recommended: 12 vCPU)
Memory Calculation:
- Base Memory: 50,000 EPS / 25,000 = 2.0 GB
- Heavy Parser Memory: 0.5 GB * 2 = 1.0 GB
- Enrichment Memory: 0.3 GB * 1 = 0.3 GB
- Buffer Memory: 97.66 MiB/s * 0.1 = 9.77 GB (approx.)
- Subtotal Memory: 2.0 + 1.0 + 0.3 + 9.77 = 13.07 GB
- Total Memory: 13.07 * 1.4 = 18.3 GB (Recommended: 20 GB RAM)
For this workload, you should provision a system or set of containers with a total of 12 vCPU and 20 GB of RAM.
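To double-check a sizing exercise like this one, the two illustrative sketches above reproduce the same numbers:

```python
# Reproduce the worked example using the illustrative helpers above.
throughput, base_vcpu, total_vcpu = size_cpu(
    50_000, 2, "structured_logs", heavy_parsers=2, enrichments=1
)
total_mem = size_memory_gb(50_000, throughput, heavy_parsers=2, enrichments=1)

print(f"Throughput:   {throughput:.2f} MiB/s")  # ~97.66 MiB/s
print(f"Total vCPU:   {total_vcpu:.2f}")        # ~11.2 -> provision 12 vCPU
print(f"Total memory: {total_mem:.2f} GB")      # ~18.3 -> provision 20 GB
```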