Sizing examples

Enterprise Capability, Startup Agility. DataStreamer is engineered for high performance and resource efficiency, allowing you to handle massive data volumes without the massive infrastructure costs. This guide provides a comprehensive methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.

Advanced Scenarios: Impact of Data Structure

The structure of your data has a significant impact on resource requirements. Unstructured data requires more CPU-intensive parsing (typically using regex) than structured data like JSON. The following scenarios illustrate how CPU and memory needs change based on your data mix, even when throughput remains constant.

Common Workload Parameters

  • Total EPS: 6,000

  • Average Event Size: 2 KB

  • Total Throughput: 11.72 MiB/s

  • Transforms: 6 Regex Parsers, 3 Enrichments

Scenario 1: 100% Unstructured Data

This scenario represents the heaviest processing load, as all incoming data must be parsed using complex regex.

  • Throughput per vCPU: 10 MiB/s (baseline for unstructured data)

CPU

  • Base vCPU: 11.72 MiB/s / 10 MiB/s = 1.17 vCPU

  • Heavy Parser Overhead: 1.17 * 50% * 6 = 3.51 vCPU

  • Enrichment Overhead: 1.17 * 20% * 3 = 0.70 vCPU

  • Subtotal vCPU: 1.17 + 3.51 + 0.70 = 5.38 vCPU

  • Total vCPU: 5.38 * 1.3 = 7.0 vCPU (Recommended: 8 vCPU)

Memory

  • Base Memory: 6000 / 25000 = 0.24 GB

  • Heavy Parser Memory: 0.5 GB * 6 = 3.0 GB

  • Enrichment Memory: 0.3 GB * 3 = 0.9 GB

  • Buffer Memory: 11.72 * 0.1 = 1.17 GB

  • Subtotal Memory: 0.24 + 3.0 + 0.9 + 1.17 = 5.31 GB

  • Total Memory: 5.31 * 1.4 = 7.43 GB (Recommended: 8 GB RAM)

Scenario 2: 70% Unstructured, 30% JSON

  • Weighted Throughput/vCPU: (0.7 * 10) + (0.3 * 25) = 14.5 MiB/s

CPU

  • Base vCPU: 11.72 MiB/s / 14.5 MiB/s = 0.81 vCPU

  • Heavy Parser Overhead: 0.81 * 50% * 6 = 2.43 vCPU

  • Light Parser Overhead: 0.81 * 10% * 1 = 0.08 vCPU

  • Enrichment Overhead: 0.81 * 20% * 3 = 0.49 vCPU

  • Subtotal vCPU: 0.81 + 2.43 + 0.08 + 0.49 = 3.81 vCPU

  • Total vCPU: 3.81 * 1.3 = 4.95 vCPU (Recommended: 6 vCPU)

Memory

  • Base Memory: 0.24 GB

  • Parser Memory: (0.5 GB * 6) + (0.2 GB * 1) = 3.2 GB

  • Enrichment Memory: 0.9 GB

  • Buffer Memory: 1.17 GB

  • Subtotal Memory: 0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB

  • Total Memory: 5.51 * 1.4 = 7.71 GB (Recommended: 8 GB RAM)

Scenario 3: 50% Unstructured, 50% JSON

  • Weighted Throughput/vCPU: (0.5 * 10) + (0.5 * 25) = 17.5 MiB/s

CPU

  • Base vCPU: 11.72 MiB/s / 17.5 MiB/s = 0.67 vCPU

  • Heavy Parser Overhead: 0.67 * 50% * 6 = 2.01 vCPU

  • Light Parser Overhead: 0.67 * 10% * 1 = 0.07 vCPU

  • Enrichment Overhead: 0.67 * 20% * 3 = 0.40 vCPU

  • Subtotal vCPU: 0.67 + 2.01 + 0.07 + 0.40 = 3.15 vCPU

  • Total vCPU: 3.15 * 1.3 = 4.1 vCPU (Recommended: 5 vCPU)

Memory

  • Base Memory: 0.24 GB

  • Parser Memory: 3.2 GB

  • Enrichment Memory: 0.9 GB

  • Buffer Memory: 1.17 GB

  • Subtotal Memory: 0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB

  • Total Memory: 5.51 * 1.4 = 7.71 GB (Recommended: 8 GB RAM)

Scenario 4: 10% Unstructured, 90% JSON

This scenario represents the lightest processing load, as most data is pre-structured.

  • Weighted Throughput/vCPU: (0.1 * 10) + (0.9 * 25) = 23.5 MiB/s

CPU

  • Base vCPU: 11.72 MiB/s / 23.5 MiB/s = 0.50 vCPU

  • Heavy Parser Overhead: 0.50 * 50% * 6 = 1.5 vCPU

  • Light Parser Overhead: 0.50 * 10% * 1 = 0.05 vCPU

  • Enrichment Overhead: 0.50 * 20% * 3 = 0.30 vCPU

  • Subtotal vCPU: 0.50 + 1.5 + 0.05 + 0.30 = 2.35 vCPU

  • Total vCPU: 2.35 * 1.3 = 3.06 vCPU (Recommended: 4 vCPU)

Memory

  • Base Memory: 0.24 GB

  • Parser Memory: 3.2 GB

  • Enrichment Memory: 0.9 GB

  • Buffer Memory: 1.17 GB

  • Subtotal Memory: 0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB

  • Total Memory: 5.51 * 1.4 = 7.71 GB (Recommended: 8 GB RAM)


Comparison and Conclusion

Resource Comparison Table

Scenario
Data Mix (Unstructured/JSON)
Recommended vCPU
Recommended RAM

1

100% / 0%

8 vCPU

8 GB

2

70% / 30%

6 vCPU

8 GB

3

50% / 50%

5 vCPU

8 GB

4

10% / 90%

4 vCPU

8 GB

Conclusion: Data Transformation is CPU-Intensive

As the comparison table clearly shows, DataStreamer is almost always CPU-intensive, and the primary driver of CPU consumption is the nature of the data transformation taking place.

  • CPU scales with complexity: As the percentage of unstructured data requiring heavy regex parsing decreases from 100% to 10%, the required vCPU is reduced by 50% (from 8 to 4 vCPU).

  • Memory remains stable: Memory requirements are primarily driven by the number of components (parsers, enrichments) and overall throughput, which remain constant across these scenarios. As a result, the recommended RAM stays at 8 GB for all four scenarios.

This demonstrates that the most significant factor in sizing your DataStreamer deployment is the complexity of your data and the transformations you apply. The more parsing and manipulation required, the more CPU resources you should allocate. DataStreamer's efficient, Rust-based architecture ensures that these CPU resources are used as effectively as possible to turn your raw data into actionable insights.

Last updated