DataStreamer Sizing & Capacity Planning

Enterprise Capability, Startup Agility.

DataStreamer is engineered for high performance and resource efficiency, allowing you to handle massive data volumes without correspondingly massive infrastructure costs. This guide provides a comprehensive methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.

The DataStreamer Performance Philosophy

DataStreamer is built in Rust, a modern systems programming language that guarantees memory safety while delivering exceptional performance. Unlike traditional data pipelines that rely on heavy runtimes such as the JVM (Java Virtual Machine) or on interpreted languages, DataStreamer compiles to a native binary that runs directly on the CPU. This results in:

  • Lower CPU Usage: More processing power dedicated to your data, not the runtime.

  • Minimal Memory Footprint: Significantly reduced RAM requirements compared to alternatives.

  • Predictable Performance: Consistent throughput without the garbage collection pauses common in other systems.

Our core design principle is throughput-based sizing. Instead of counting components or pipelines, we focus on the volume of data you need to process. This provides a more accurate and scalable model for capacity planning.

Sizing Methodology

Sizing a DataStreamer deployment involves two primary considerations: CPU and Memory.

CPU Sizing

DataStreamer is almost always CPU-constrained, meaning CPU is the most critical resource to plan for. Our sizing model is based on the MiB/s (mebibytes per second) of data throughput you expect to process.

Throughput per vCPU

The following table provides baseline throughput estimates for a single virtual CPU (vCPU). Use this as a starting point for your calculations.

| Event Type | Average Size | Throughput per vCPU |
| --- | --- | --- |
| Structured Logs (JSON, etc.) | ~768 bytes | ~25 MiB/s |
| Unstructured Logs (plain text) | ~256 bytes | ~10 MiB/s |
| Metrics (Prometheus, etc.) | ~256 bytes | ~25 MiB/s |
| Trace Spans (OpenTelemetry) | ~1 KiB | ~25 MiB/s |
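These baselines are easy to capture in code for quick estimates. Below is a minimal Python sketch; the dictionary name and event-type keys are illustrative, not DataStreamer configuration values:

```python
# Baseline throughput per vCPU (MiB/s), mirroring the table above.
# Keys are illustrative labels, not DataStreamer configuration values.
THROUGHPUT_PER_VCPU_MIBS = {
    "structured_logs": 25.0,    # JSON, etc. (~768 bytes/event)
    "unstructured_logs": 10.0,  # plain text (~256 bytes/event)
    "metrics": 25.0,            # Prometheus, etc. (~256 bytes/event)
    "trace_spans": 25.0,        # OpenTelemetry (~1 KiB/event)
}
```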

CPU Calculation Formula

1. Calculate Base Throughput

  • Formula:

    • Throughput (MiB/s) = Total Events per Second (EPS) * Average Event Size (KiB) / 1024

2. Calculate Base vCPU

  • Formula:

    • Base vCPU = Throughput (MiB/s) / Throughput per vCPU (from table)

3. Add Transform Overhead

Transformations add CPU load. Apply these multipliers to your Base vCPU calculation.

| Transformation | CPU Overhead Multiplier |
| --- | --- |
| Light Parsing (JSON, Logfmt) | Base vCPU * 10% * Number of Parsers |
| Heavy Parsing (Grok, Regex) | Base vCPU * 50% * Number of Parsers |
| Enrichment (GeoIP, etc.) | Base vCPU * 20% * Number of Enrichments |

No parser, no problem! DataStreamer’s AI-powered parser can build complex parsers for you in seconds, not weeks, at no extra charge.

4. Calculate Total vCPU

  • Steps:

    • Subtotal vCPU = Base vCPU + All Transform Overheads

    • Total vCPU = Subtotal vCPU * 1.3 (adds a 30% safety buffer)

  • Notes:

    • We recommend a minimum of 2 vCPU for any production agent deployment.
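Putting the four steps together: the sketch below implements the CPU formula in Python. The function name and parameters are illustrative (they are not part of any DataStreamer tooling), and the overhead multipliers come straight from the table above:

```python
import math

# Per-transform CPU overhead multipliers from the table above.
LIGHT_PARSE, HEAVY_PARSE, ENRICHMENT = 0.10, 0.50, 0.20

def total_vcpu(eps: float, event_size_kib: float, per_vcpu_mibs: float,
               light_parsers: int = 0, heavy_parsers: int = 0,
               enrichments: int = 0) -> int:
    """Estimate total vCPU with the four-step formula above."""
    throughput = eps * event_size_kib / 1024          # Step 1: MiB/s
    base = throughput / per_vcpu_mibs                 # Step 2: base vCPU
    overhead = base * (LIGHT_PARSE * light_parsers    # Step 3: transforms
                       + HEAVY_PARSE * heavy_parsers
                       + ENRICHMENT * enrichments)
    total = (base + overhead) * 1.3                   # Step 4: 30% safety buffer
    return max(2, math.ceil(total))                   # 2 vCPU production minimum

# Example: 50,000 EPS of ~2 KiB events against the 25 MiB/s baseline.
print(total_vcpu(50_000, 2, 25, heavy_parsers=2, enrichments=1))  # -> 12
```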

Memory Sizing

Memory requirements are influenced by event rate, transformations, and buffering.

Memory Calculation Formula

1. Base Memory

  • Start with a baseline based on your event rate.

    • Base Memory (GB) = Events Per Second (EPS) / 25,000 (approx. 1 GB per 25,000 EPS)

2. Add Component Memory

Add memory for each stateful component.

| Component | Memory Allocation |
| --- | --- |
| Light Parser | 0.2 GB per parser |
| Heavy Parser | 0.5 GB per parser |
| Enrichment | 0.3 GB per enrichment |

3. Add Buffer Memory

DataStreamer buffers data to handle backpressure and ensure delivery guarantees.

  • Buffer Memory (GB) = Throughput (MiB/s) * 0.1 (allocates roughly 100 seconds of buffering at your target throughput)

4. Calculate Total Memory

  • Subtotal Memory = Base Memory + All Component Memory + Buffer Memory

  • Total Memory (GB) = Subtotal Memory * 1.4 (adds a 40% safety buffer)
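As with CPU, the memory formula is straightforward to script. Here is a minimal Python sketch, with the same caveat that the function name and parameters are illustrative:

```python
# Per-component memory allocations (GB) from the table above.
LIGHT_PARSER_GB, HEAVY_PARSER_GB, ENRICHMENT_GB = 0.2, 0.5, 0.3

def total_memory_gb(eps: float, throughput_mibs: float,
                    light_parsers: int = 0, heavy_parsers: int = 0,
                    enrichments: int = 0) -> float:
    """Estimate total memory (GB) with the four-step formula above."""
    base = eps / 25_000                               # Step 1: ~1 GB per 25,000 EPS
    components = (LIGHT_PARSER_GB * light_parsers     # Step 2: stateful components
                  + HEAVY_PARSER_GB * heavy_parsers
                  + ENRICHMENT_GB * enrichments)
    buffer = throughput_mibs * 0.1                    # Step 3: buffer memory
    return (base + components + buffer) * 1.4         # Step 4: 40% safety buffer

# Example: 50,000 EPS at ~97.66 MiB/s with 2 heavy parsers and 1 enrichment.
print(round(total_memory_gb(50_000, 97.66, heavy_parsers=2, enrichments=1), 1))  # -> 18.3
```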


Simple Example Sizing Scenario

Let’s calculate the requirements for a workload of 50,000 EPS with an average event size of 2 KiB.

The pipeline includes:

  • 2 Heavy Regex Parsers

  • 1 Enrichment transform

  1. CPU Calculation:

  • Throughput: 50,000 EPS * 2 KiB / 1024 = 97.66 MiB/s

  • Base vCPU: 97.66 MiB/s / 25 MiB/s = 3.91 vCPU

  • Heavy Parser Overhead: 3.91 * 50% * 2 = 3.91 vCPU

  • Enrichment Overhead: 3.91 * 20% * 1 = 0.78 vCPU

  • Subtotal vCPU: 3.91 + 3.91 + 0.78 = 8.6 vCPU

  • Total vCPU: 8.6 * 1.3 = 11.18 vCPU (Recommended: 12 vCPU)

  2. Memory Calculation:

  • Base Memory: 50,000 EPS / 25,000 = 2.0 GB

  • Heavy Parser Memory: 0.5 GB * 2 = 1.0 GB

  • Enrichment Memory: 0.3 GB * 1 = 0.3 GB

  • Buffer Memory: 97.66 MiB/s * 0.1 = 9.77 GB (approx)

  • Subtotal Memory: 2.0 + 1.0 + 0.3 + 9.77 = 13.07 GB

  • Total Memory: 13.07 * 1.4 = 18.3 GB (Recommended: 20 GB RAM)

For this workload, you should provision a system or set of containers with a total of 12 vCPU and 20 GB of RAM.
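If you want to sanity-check these numbers yourself, the whole scenario fits in a few lines of standalone Python (the variable names are ours, and small rounding differences from the hand calculation above are expected):

```python
import math

eps, size_kib = 50_000, 2                # 50,000 EPS at ~2 KiB per event
throughput = eps * size_kib / 1024       # 97.66 MiB/s

base = throughput / 25                   # 3.91 vCPU (25 MiB/s baseline)
overhead = base * (0.50 * 2 + 0.20 * 1)  # 2 heavy parsers + 1 enrichment
vcpu = (base + overhead) * 1.3           # ~11.2 vCPU with the 30% buffer
print(f"vCPU: {vcpu:.2f} -> provision {math.ceil(vcpu)}")  # -> 12

memory = (eps / 25_000                   # 2.0 GB base
          + 0.5 * 2 + 0.3 * 1            # 1.3 GB components
          + throughput * 0.1) * 1.4      # 9.77 GB buffer, then 40% buffer
print(f"Memory: {memory:.1f} GB")        # -> 18.3 GB, provision 20 GB
```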
