Sizing examples

Enterprise Capability, Startup Agility. DataStreamer is engineered for high performance and resource efficiency, allowing you to handle massive data volumes without the massive infrastructure costs. This guide provides a comprehensive methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.

Advanced Scenarios: Impact of Data Structure

The structure of your data has a significant impact on resource requirements. Unstructured data requires more CPU-intensive parsing (typically using regex) than structured data like JSON. The following scenarios illustrate how CPU and memory needs change based on your data mix, even when throughput remains constant.

Common Workload Parameters

Total EPS: 6,000
Average Event Size: 2 KB
Total Throughput: 11.72 MiB/s
Transforms: 6 Regex Parsers, 3 Enrichments

Scenario 1: 100% Unstructured Data

This scenario represents the heaviest processing load, as all incoming data must be parsed using complex regex.

Throughput per vCPU: 10 MiB/s (baseline for unstructured data)

CPU

Base vCPU: 11.72 MiB/s / 10 MiB/s = 1.17 vCPU
Heavy Parser Overhead: 1.17 * 50% * 6 = 3.51 vCPU
Enrichment Overhead: 1.17 * 20% * 3 = 0.70 vCPU
Subtotal vCPU: 1.17 + 3.51 + 0.70 = 5.38 vCPU
Total vCPU: 5.38 * 1.3 = 7.0 vCPU (Recommended: 8 vCPU)

Memory

Base Memory: 6000 / 25000 = 0.24 GB
Heavy Parser Memory: 0.5 GB * 6 = 3.0 GB
Enrichment Memory: 0.3 GB * 3 = 0.9 GB
Buffer Memory: 11.72 * 0.1 = 1.17 GB
Subtotal Memory: 0.24 + 3.0 + 0.9 + 1.17 = 5.31 GB
Total Memory: 5.31 * 1.4 = 7.43 GB (Recommended: 8 GB RAM)

Scenario 2: 70% Unstructured, 30% JSON

Weighted Throughput/vCPU: (0.7 * 10) + (0.3 * 25) = 14.5 MiB/s

CPU

Base vCPU: 11.72 MiB/s / 14.5 MiB/s = 0.81 vCPU
Heavy Parser Overhead: 0.81 * 50% * 6 = 2.43 vCPU
Light Parser Overhead: 0.81 * 10% * 1 = 0.08 vCPU
Enrichment Overhead: 0.81 * 20% * 3 = 0.49 vCPU
Subtotal vCPU: 0.81 + 2.43 + 0.08 + 0.49 = 3.81 vCPU
Total vCPU: 3.81 * 1.3 = 4.95 vCPU (Recommended: 6 vCPU)

Memory

Base Memory: 0.24 GB
Parser Memory: (0.5 GB * 6) + (0.2 GB * 1) = 3.2 GB
Enrichment Memory: 0.9 GB
Buffer Memory: 1.17 GB
Subtotal Memory: 0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB
Total Memory: 5.51 * 1.4 = 7.71 GB (Recommended: 8 GB RAM)

Scenario 3: 50% Unstructured, 50% JSON

Weighted Throughput/vCPU: (0.5 * 10) + (0.5 * 25) = 17.5 MiB/s

CPU

Base vCPU: 11.72 MiB/s / 17.5 MiB/s = 0.67 vCPU
Heavy Parser Overhead: 0.67 * 50% * 6 = 2.01 vCPU
Light Parser Overhead: 0.67 * 10% * 1 = 0.07 vCPU
Enrichment Overhead: 0.67 * 20% * 3 = 0.40 vCPU
Subtotal vCPU: 0.67 + 2.01 + 0.07 + 0.40 = 3.15 vCPU
Total vCPU: 3.15 * 1.3 = 4.1 vCPU (Recommended: 5 vCPU)

Memory

Base Memory: 0.24 GB
Parser Memory: 3.2 GB
Enrichment Memory: 0.9 GB
Buffer Memory: 1.17 GB
Subtotal Memory: 0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB
Total Memory: 5.51 * 1.4 = 7.71 GB (Recommended: 8 GB RAM)

Scenario 4: 10% Unstructured, 90% JSON

This scenario represents the lightest processing load, as most data is pre-structured.

Weighted Throughput/vCPU: (0.1 * 10) + (0.9 * 25) = 23.5 MiB/s

CPU

Base vCPU: 11.72 MiB/s / 23.5 MiB/s = 0.50 vCPU
Heavy Parser Overhead: 0.50 * 50% * 6 = 1.5 vCPU
Light Parser Overhead: 0.50 * 10% * 1 = 0.05 vCPU
Enrichment Overhead: 0.50 * 20% * 3 = 0.30 vCPU
Subtotal vCPU: 0.50 + 1.5 + 0.05 + 0.30 = 2.35 vCPU
Total vCPU: 2.35 * 1.3 = 3.06 vCPU (Recommended: 4 vCPU)

Memory

Base Memory: 0.24 GB
Parser Memory: 3.2 GB
Enrichment Memory: 0.9 GB
Buffer Memory: 1.17 GB
Subtotal Memory: 0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB
Total Memory: 5.51 * 1.4 = 7.71 GB (Recommended: 8 GB RAM)

Comparison and Conclusion

Resource Comparison Table

Scenario

Data Mix (Unstructured/JSON)

Recommended vCPU

Recommended RAM

100% / 0%

8 vCPU

8 GB

70% / 30%

6 vCPU

8 GB

50% / 50%

5 vCPU

8 GB

10% / 90%

4 vCPU

8 GB

Conclusion: Data Transformation is CPU-Intensive

As the comparison table clearly shows, DataStreamer is almost always CPU-intensive, and the primary driver of CPU consumption is the nature of the data transformation taking place.

CPU scales with complexity: As the percentage of unstructured data requiring heavy regex parsing decreases from 100% to 10%, the required vCPU is reduced by 50% (from 8 to 4 vCPU).
Memory remains stable: Memory requirements are primarily driven by the number of components (parsers, enrichments) and overall throughput, which remain constant across these scenarios. As a result, the recommended RAM stays at 8 GB for all four scenarios.

This demonstrates that the most significant factor in sizing your DataStreamer deployment is the complexity of your data and the transformations you apply. The more parsing and manipulation required, the more CPU resources you should allocate. DataStreamer's efficient, Rust-based architecture ensures that these CPU resources are used as effectively as possible to turn your raw data into actionable insights.

PreviousSizing & Capacity Planning NextPerformance Benchmark

Last updated 1 month ago

hashtagAdvanced Scenarios: Impact of Data Structure

hashtagCommon Workload Parameters

hashtagScenario 1: 100% Unstructured Data

hashtagScenario 2: 70% Unstructured, 30% JSON

hashtagScenario 3: 50% Unstructured, 50% JSON

hashtagScenario 4: 10% Unstructured, 90% JSON

hashtagComparison and Conclusion

hashtagResource Comparison Table

hashtagConclusion: Data Transformation is CPU-Intensive

Advanced Scenarios: Impact of Data Structure

Common Workload Parameters

Scenario 1: 100% Unstructured Data

Scenario 2: 70% Unstructured, 30% JSON

Scenario 3: 50% Unstructured, 50% JSON

Scenario 4: 10% Unstructured, 90% JSON

Comparison and Conclusion

Resource Comparison Table

Conclusion: Data Transformation is CPU-Intensive