Sizing examples
Enterprise Capability, Startup Agility. DataStreamer is engineered for high performance and resource efficiency, allowing you to handle massive data volumes without the massive infrastructure costs. This guide provides a comprehensive methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.
Advanced Scenarios: Impact of Data Structure
The structure of your data has a significant impact on resource requirements. Unstructured data requires more CPU-intensive parsing (typically using regex) than structured data like JSON. The following scenarios illustrate how CPU and memory needs change based on your data mix, even when throughput remains constant.
Common Workload Parameters
Total EPS: 6,000
Average Event Size: 2 KB
Total Throughput: 11.72 MiB/s
Transforms: 6 Regex Parsers, 3 Enrichments
Scenario 1: 100% Unstructured Data
This scenario represents the heaviest processing load, as all incoming data must be parsed using complex regex.
Throughput per vCPU: 10 MiB/s (baseline for unstructured data)
CPU
Base vCPU:
11.72 MiB/s / 10 MiB/s = 1.17 vCPUHeavy Parser Overhead:
1.17 * 50% * 6 = 3.51 vCPUEnrichment Overhead:
1.17 * 20% * 3 = 0.70 vCPUSubtotal vCPU:
1.17 + 3.51 + 0.70 = 5.38 vCPUTotal vCPU:
5.38 * 1.3 = 7.0 vCPU(Recommended: 8 vCPU)
Memory
Base Memory:
6000 / 25000 = 0.24 GBHeavy Parser Memory:
0.5 GB * 6 = 3.0 GBEnrichment Memory:
0.3 GB * 3 = 0.9 GBBuffer Memory:
11.72 * 0.1 = 1.17 GBSubtotal Memory:
0.24 + 3.0 + 0.9 + 1.17 = 5.31 GBTotal Memory:
5.31 * 1.4 = 7.43 GB(Recommended: 8 GB RAM)
Scenario 2: 70% Unstructured, 30% JSON
Weighted Throughput/vCPU:
(0.7 * 10) + (0.3 * 25) = 14.5 MiB/s
CPU
Base vCPU:
11.72 MiB/s / 14.5 MiB/s = 0.81 vCPUHeavy Parser Overhead:
0.81 * 50% * 6 = 2.43 vCPULight Parser Overhead:
0.81 * 10% * 1 = 0.08 vCPUEnrichment Overhead:
0.81 * 20% * 3 = 0.49 vCPUSubtotal vCPU:
0.81 + 2.43 + 0.08 + 0.49 = 3.81 vCPUTotal vCPU:
3.81 * 1.3 = 4.95 vCPU(Recommended: 6 vCPU)
Memory
Base Memory: 0.24 GB
Parser Memory:
(0.5 GB * 6) + (0.2 GB * 1) = 3.2 GBEnrichment Memory: 0.9 GB
Buffer Memory: 1.17 GB
Subtotal Memory:
0.24 + 3.2 + 0.9 + 1.17 = 5.51 GBTotal Memory:
5.51 * 1.4 = 7.71 GB(Recommended: 8 GB RAM)
Scenario 3: 50% Unstructured, 50% JSON
Weighted Throughput/vCPU:
(0.5 * 10) + (0.5 * 25) = 17.5 MiB/s
CPU
Base vCPU:
11.72 MiB/s / 17.5 MiB/s = 0.67 vCPUHeavy Parser Overhead:
0.67 * 50% * 6 = 2.01 vCPULight Parser Overhead:
0.67 * 10% * 1 = 0.07 vCPUEnrichment Overhead:
0.67 * 20% * 3 = 0.40 vCPUSubtotal vCPU:
0.67 + 2.01 + 0.07 + 0.40 = 3.15 vCPUTotal vCPU:
3.15 * 1.3 = 4.1 vCPU(Recommended: 5 vCPU)
Memory
Base Memory: 0.24 GB
Parser Memory: 3.2 GB
Enrichment Memory: 0.9 GB
Buffer Memory: 1.17 GB
Subtotal Memory:
0.24 + 3.2 + 0.9 + 1.17 = 5.51 GBTotal Memory:
5.51 * 1.4 = 7.71 GB(Recommended: 8 GB RAM)
Scenario 4: 10% Unstructured, 90% JSON
This scenario represents the lightest processing load, as most data is pre-structured.
Weighted Throughput/vCPU:
(0.1 * 10) + (0.9 * 25) = 23.5 MiB/s
CPU
Base vCPU:
11.72 MiB/s / 23.5 MiB/s = 0.50 vCPUHeavy Parser Overhead:
0.50 * 50% * 6 = 1.5 vCPULight Parser Overhead:
0.50 * 10% * 1 = 0.05 vCPUEnrichment Overhead:
0.50 * 20% * 3 = 0.30 vCPUSubtotal vCPU:
0.50 + 1.5 + 0.05 + 0.30 = 2.35 vCPUTotal vCPU:
2.35 * 1.3 = 3.06 vCPU(Recommended: 4 vCPU)
Memory
Base Memory: 0.24 GB
Parser Memory: 3.2 GB
Enrichment Memory: 0.9 GB
Buffer Memory: 1.17 GB
Subtotal Memory:
0.24 + 3.2 + 0.9 + 1.17 = 5.51 GBTotal Memory:
5.51 * 1.4 = 7.71 GB(Recommended: 8 GB RAM)
Comparison and Conclusion
Resource Comparison Table
1
100% / 0%
8 vCPU
8 GB
2
70% / 30%
6 vCPU
8 GB
3
50% / 50%
5 vCPU
8 GB
4
10% / 90%
4 vCPU
8 GB
Conclusion: Data Transformation is CPU-Intensive
As the comparison table clearly shows, DataStreamer is almost always CPU-intensive, and the primary driver of CPU consumption is the nature of the data transformation taking place.
CPU scales with complexity: As the percentage of unstructured data requiring heavy regex parsing decreases from 100% to 10%, the required vCPU is reduced by 50% (from 8 to 4 vCPU).
Memory remains stable: Memory requirements are primarily driven by the number of components (parsers, enrichments) and overall throughput, which remain constant across these scenarios. As a result, the recommended RAM stays at 8 GB for all four scenarios.
This demonstrates that the most significant factor in sizing your DataStreamer deployment is the complexity of your data and the transformations you apply. The more parsing and manipulation required, the more CPU resources you should allocate. DataStreamer's efficient, Rust-based architecture ensures that these CPU resources are used as effectively as possible to turn your raw data into actionable insights.
Last updated