DataStreamer Sizing & Capacity Planning
DataStreamer is engineered for high performance and resource efficiency, allowing you to handle massive data volumes without the massive infrastructure costs. This guide provides a comprehensive methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.
The DataStreamer Performance Philosophy
DataStreamer is built in Rust, a modern systems programming language that guarantees memory safety and delivers exceptional performance. Unlike traditional data pipelines that rely on heavy runtimes like the JVM (Java Virtual Machine) or interpreted languages, DataStreamer compiles to a native binary that runs directly on the CPU. This results in:
- Lower CPU Usage: More processing power dedicated to your data, not the runtime.
- Minimal Memory Footprint: Significantly reduced RAM requirements compared to alternatives.
- Predictable Performance: Consistent throughput without the garbage collection pauses common in other systems.
Our core design principle is throughput-based sizing. Instead of counting components or pipelines, we focus on the volume of data you need to process. This provides a more accurate and scalable model for capacity planning.
Sizing Methodology
Sizing a DataStreamer deployment involves two primary considerations: CPU and Memory.
CPU Sizing
DataStreamer is almost always CPU-constrained, meaning CPU is the most critical resource to plan for. Our sizing model is based on the MiB/s (mebibytes per second) of data throughput you expect to process.
Throughput per vCPU
The following table provides baseline throughput estimates for a single virtual CPU (vCPU). Use this as a starting point for your calculations.
| Data Type | Avg. Event Size | Throughput per vCPU |
| --- | --- | --- |
| Structured Logs (JSON, etc.) | ~768 bytes | ~25 MiB/s |
| Unstructured Logs (plain text) | ~256 bytes | ~10 MiB/s |
| Metrics (Prometheus, etc.) | ~256 bytes | ~25 MiB/s |
| Trace Spans (OpenTelemetry) | ~1 KB | ~25 MiB/s |
CPU Calculation Formula
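The calculation proceeds in three steps (the 30% headroom factor matches the worked example at the end of this guide):

Throughput (MiB/s) = EPS * Average Event Size (KB) / 1024
Base vCPU = Throughput (MiB/s) / Per-vCPU Throughput (MiB/s)
Total vCPU = (Base vCPU + Transform Overhead) * 1.3 (30% headroom)

where EPS is events per second and the per-vCPU throughput comes from the table above.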
Add Transform Overhead
Transformations add CPU load. Apply these multipliers to your Base vCPU calculation.
| Transform Type | CPU Overhead |
| --- | --- |
| Light Parsing (JSON, Logfmt) | Base vCPU * 10% * Number of Parsers |
| Heavy Parsing (Grok, Regex) | Base vCPU * 50% * Number of Parsers |
| Enrichment (GeoIP, etc.) | Base vCPU * 20% * Number of Enrichments |
No parser? No problem! DataStreamer’s AI-powered parser can build complex parsers for you in seconds, not weeks, at no extra charge.
Memory Sizing
Memory requirements are influenced by event rate, transformations, and buffering.
Memory Calculation Formula
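The constants below are taken from the worked example later in this guide:

Base Memory (GB) = EPS / 25,000
Heavy Parser Memory = 0.5 GB per heavy parser
Enrichment Memory = 0.3 GB per enrichment
Buffer Memory (GB) = Throughput (MiB/s) * 0.1
Total Memory = (sum of the above) * 1.4 (40% headroom)

As with the CPU sketch above, here is a minimal, illustrative Python version; the function name and signature are ours, not part of DataStreamer:

```python
# Illustrative memory sizing sketch; constants come from the
# formula above, not from a DataStreamer API.
MEM_HEADROOM = 1.4  # 40% headroom

def size_memory_gb(eps: float, throughput_mibs: float,
                   heavy_parsers: int = 0, enrichments: int = 0) -> float:
    """Return recommended memory in GB for a workload."""
    base = eps / 25_000              # 1 GB per 25,000 EPS
    parsers = 0.5 * heavy_parsers    # 0.5 GB per heavy parser
    enrich = 0.3 * enrichments       # 0.3 GB per enrichment
    buffers = throughput_mibs * 0.1  # ~0.1 GB of buffer per MiB/s
    return (base + parsers + enrich + buffers) * MEM_HEADROOM
```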
Simple Example Sizing Scenario
Let’s calculate the requirements for a workload of 50,000 EPS with an average event size of 2 KB.
The pipeline includes:
- 2 Heavy Regex Parsers
- 1 Enrichment transform
CPU Calculation:
- Throughput: 50,000 EPS * 2 KB / 1024 = 97.66 MiB/s
- Base vCPU: 97.66 MiB/s / 25 MiB/s = 3.91 vCPU
- Heavy Parser Overhead: 3.91 * 50% * 2 = 3.91 vCPU
- Enrichment Overhead: 3.91 * 20% * 1 = 0.78 vCPU
- Subtotal vCPU: 3.91 + 3.91 + 0.78 = 8.60 vCPU
- Total vCPU: 8.60 * 1.3 = 11.18 vCPU (Recommended: 12 vCPU)
Memory Calculation:
- Base Memory: 50,000 EPS / 25,000 = 2.0 GB
- Heavy Parser Memory: 0.5 GB * 2 = 1.0 GB
- Enrichment Memory: 0.3 GB * 1 = 0.3 GB
- Buffer Memory: 97.66 MiB/s * 0.1 = 9.77 GB (approx.)
- Subtotal Memory: 2.0 + 1.0 + 0.3 + 9.77 = 13.07 GB
- Total Memory: 13.07 * 1.4 = 18.3 GB (Recommended: 20 GB RAM)
For this workload, you should provision a system or set of containers with a total of 12 vCPU and 20 GB of RAM.
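To double-check a sizing exercise like this one, the two illustrative sketches above reproduce the same numbers:

```python
# Reproduce the worked example using the illustrative helpers above.
throughput, base_vcpu, total_vcpu = size_cpu(
    50_000, 2, "structured_logs", heavy_parsers=2, enrichments=1
)
total_mem = size_memory_gb(50_000, throughput, heavy_parsers=2, enrichments=1)

print(f"Throughput:   {throughput:.2f} MiB/s")  # ~97.66 MiB/s
print(f"Total vCPU:   {total_vcpu:.2f}")        # ~11.2 -> provision 12 vCPU
print(f"Total memory: {total_mem:.2f} GB")      # ~18.3 -> provision 20 GB
```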