# Sizing examples

**Enterprise Capability, Startup Agility.** DataStreamer is engineered for high performance and resource efficiency, letting you handle massive data volumes without correspondingly massive infrastructure costs. This guide provides a methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.

## Advanced Scenarios: Impact of Data Structure

The structure of your data has a significant impact on resource requirements. Unstructured data requires more CPU-intensive parsing (typically using regex) than structured data like JSON. The following scenarios illustrate how CPU and memory needs change based on your data mix, even when throughput remains constant.

### Common Workload Parameters

* **Total EPS**: 6,000
* **Average Event Size**: 2 KB
* **Total Throughput**: 11.72 MiB/s
* **Transforms**: 6 Regex Parsers, 3 Enrichments (scenarios that include JSON data also add 1 lightweight JSON parser)
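The total throughput figure follows directly from the event rate and event size. As a quick sanity check (assuming "2 KB" means 2 KiB, which is what makes the 11.72 MiB/s figure come out):

```python
# Derive total throughput from the workload parameters above.
eps = 6_000          # events per second
event_size_kib = 2   # average event size, assumed to be KiB (1024 bytes)

throughput_mib_s = eps * event_size_kib / 1024  # KiB/s -> MiB/s
print(f"{throughput_mib_s:.2f} MiB/s")          # 11.72 MiB/s
```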

### Scenario 1: 100% Unstructured Data

This scenario represents the heaviest processing load, as all incoming data must be parsed using complex regex.

* **Throughput per vCPU**: 10 MiB/s (baseline for unstructured data)

**CPU**

* **Base vCPU**: `11.72 MiB/s / 10 MiB/s = 1.17 vCPU`
* **Heavy Parser Overhead**: `1.17 * 50% * 6 = 3.51 vCPU`
* **Enrichment Overhead**: `1.17 * 20% * 3 = 0.70 vCPU`
* **Subtotal vCPU**: `1.17 + 3.51 + 0.70 = 5.38 vCPU`
* **Total vCPU**: `5.38 * 1.3 = 7.0 vCPU` (Recommended: **8 vCPU**)

**Memory**

* **Base Memory**: `6,000 EPS / 25,000 EPS per GB = 0.24 GB`
* **Heavy Parser Memory**: `0.5 GB * 6 = 3.0 GB`
* **Enrichment Memory**: `0.3 GB * 3 = 0.9 GB`
* **Buffer Memory**: `11.72 MiB/s * 0.1 GB/(MiB/s) = 1.17 GB`
* **Subtotal Memory**: `0.24 + 3.0 + 0.9 + 1.17 = 5.31 GB`
* **Total Memory**: `5.31 * 1.4 = 7.43 GB` (Recommended: **8 GB RAM**)
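The Scenario 1 arithmetic can be sketched as a short script. This is an illustrative reproduction of the worked example above, with the coefficients taken from it (the 1.3 and 1.4 multipliers are the final factors applied in the Total lines):

```python
# Reproduce the Scenario 1 sizing arithmetic (coefficients from the example above).
throughput = 11.72                       # MiB/s
base_vcpu = round(throughput / 10, 2)    # 10 MiB/s per vCPU for unstructured data
heavy = base_vcpu * 0.50 * 6             # 50% overhead per regex parser, 6 parsers
enrich = base_vcpu * 0.20 * 3            # 20% overhead per enrichment, 3 enrichments
total_vcpu = (base_vcpu + heavy + enrich) * 1.3

base_mem = 6_000 / 25_000                # GB; 25,000 EPS per GB baseline
parser_mem = 0.5 * 6                     # 0.5 GB per heavy parser
enrich_mem = 0.3 * 3                     # 0.3 GB per enrichment
buffer_mem = round(throughput * 0.1, 2)  # 0.1 GB per MiB/s of throughput
total_mem = (base_mem + parser_mem + enrich_mem + buffer_mem) * 1.4

print(f"{total_vcpu:.1f} vCPU, {total_mem:.2f} GB")  # 7.0 vCPU, 7.43 GB
```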

### Scenario 2: 70% Unstructured, 30% JSON

* **Weighted Throughput/vCPU**: `(0.7 * 10) + (0.3 * 25) = 14.5 MiB/s` (25 MiB/s is the per-vCPU baseline for structured JSON data)

**CPU**

* **Base vCPU**: `11.72 MiB/s / 14.5 MiB/s = 0.81 vCPU`
* **Heavy Parser Overhead**: `0.81 * 50% * 6 = 2.43 vCPU`
* **Light Parser Overhead**: `0.81 * 10% * 1 = 0.08 vCPU`
* **Enrichment Overhead**: `0.81 * 20% * 3 = 0.49 vCPU`
* **Subtotal vCPU**: `0.81 + 2.43 + 0.08 + 0.49 = 3.81 vCPU`
* **Total vCPU**: `3.81 * 1.3 = 4.95 vCPU` (Recommended: **6 vCPU**)

**Memory**

* **Base Memory**: 0.24 GB
* **Parser Memory**: `(0.5 GB * 6) + (0.2 GB * 1) = 3.2 GB`
* **Enrichment Memory**: 0.9 GB
* **Buffer Memory**: 1.17 GB
* **Subtotal Memory**: `0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB`
* **Total Memory**: `5.51 * 1.4 = 7.71 GB` (Recommended: **8 GB RAM**)

### Scenario 3: 50% Unstructured, 50% JSON

* **Weighted Throughput/vCPU**: `(0.5 * 10) + (0.5 * 25) = 17.5 MiB/s`

**CPU**

* **Base vCPU**: `11.72 MiB/s / 17.5 MiB/s = 0.67 vCPU`
* **Heavy Parser Overhead**: `0.67 * 50% * 6 = 2.01 vCPU`
* **Light Parser Overhead**: `0.67 * 10% * 1 = 0.07 vCPU`
* **Enrichment Overhead**: `0.67 * 20% * 3 = 0.40 vCPU`
* **Subtotal vCPU**: `0.67 + 2.01 + 0.07 + 0.40 = 3.15 vCPU`
* **Total vCPU**: `3.15 * 1.3 = 4.1 vCPU` (Recommended: **5 vCPU**)

**Memory**

* **Base Memory**: 0.24 GB
* **Parser Memory**: 3.2 GB
* **Enrichment Memory**: 0.9 GB
* **Buffer Memory**: 1.17 GB
* **Subtotal Memory**: `0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB`
* **Total Memory**: `5.51 * 1.4 = 7.71 GB` (Recommended: **8 GB RAM**)

### Scenario 4: 10% Unstructured, 90% JSON

This scenario represents the lightest processing load, as most data is pre-structured.

* **Weighted Throughput/vCPU**: `(0.1 * 10) + (0.9 * 25) = 23.5 MiB/s`

**CPU**

* **Base vCPU**: `11.72 MiB/s / 23.5 MiB/s = 0.50 vCPU`
* **Heavy Parser Overhead**: `0.50 * 50% * 6 = 1.5 vCPU`
* **Light Parser Overhead**: `0.50 * 10% * 1 = 0.05 vCPU`
* **Enrichment Overhead**: `0.50 * 20% * 3 = 0.30 vCPU`
* **Subtotal vCPU**: `0.50 + 1.5 + 0.05 + 0.30 = 2.35 vCPU`
* **Total vCPU**: `2.35 * 1.3 = 3.06 vCPU` (Recommended: **4 vCPU**)

**Memory**

* **Base Memory**: 0.24 GB
* **Parser Memory**: 3.2 GB
* **Enrichment Memory**: 0.9 GB
* **Buffer Memory**: 1.17 GB
* **Subtotal Memory**: `0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB`
* **Total Memory**: `5.51 * 1.4 = 7.71 GB` (Recommended: **8 GB RAM**)
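Scenarios 2 through 4 repeat the same arithmetic with different data mixes, so they can be folded into a single parameterized sketch. This is a hypothetical helper, assuming the per-vCPU baselines of 10 MiB/s (unstructured) and 25 MiB/s (JSON), plus one light JSON parser at 10% CPU overhead and 0.2 GB:

```python
# Hypothetical helper folding the Scenario 2-4 arithmetic into one function.
def size_mixed(unstructured_frac: float, throughput: float = 11.72):
    """Return (vCPU, GB RAM) before rounding up to a recommendation."""
    # Weighted per-vCPU throughput: 10 MiB/s unstructured, 25 MiB/s JSON
    weighted = unstructured_frac * 10 + (1 - unstructured_frac) * 25
    base = round(throughput / weighted, 2)
    # base + heavy parsers + light parser + enrichments, then the 1.3x final factor
    vcpu = (base + base * 0.50 * 6 + base * 0.10 * 1 + base * 0.20 * 3) * 1.3
    # base + parser memory + enrichment memory + buffer, then the 1.4x final factor
    mem = (0.24 + (0.5 * 6 + 0.2 * 1) + 0.3 * 3 + round(throughput * 0.1, 2)) * 1.4
    return vcpu, mem

for frac in (0.7, 0.5, 0.1):
    vcpu, mem = size_mixed(frac)
    print(f"{frac:.0%} unstructured: {vcpu:.2f} vCPU, {mem:.2f} GB")
```

Running it for the 70/30, 50/50, and 10/90 mixes reproduces the totals above to within rounding.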

***

## Comparison and Conclusion

### Resource Comparison Table

| Scenario | Data Mix (Unstructured/JSON) | Recommended vCPU | Recommended RAM |
| -------- | ---------------------------: | ---------------: | --------------: |
| 1        |                    100% / 0% |       **8 vCPU** |            8 GB |
| 2        |                    70% / 30% |       **6 vCPU** |            8 GB |
| 3        |                    50% / 50% |       **5 vCPU** |            8 GB |
| 4        |                    10% / 90% |       **4 vCPU** |            8 GB |

### Conclusion: Data Transformation is CPU-Intensive

As the comparison table shows, **DataStreamer workloads are almost always CPU-bound**, and the primary driver of CPU consumption is the complexity of the data transformations being applied.

* **CPU scales with complexity**: As the percentage of unstructured data requiring heavy regex parsing decreases from 100% to 10%, the required vCPU is **reduced by 50%** (from 8 to 4 vCPU).
* **Memory remains stable**: Memory requirements are primarily driven by the number of components (parsers, enrichments) and overall throughput, which remain constant across these scenarios. As a result, the recommended RAM stays at 8 GB for all four scenarios.

This demonstrates that the most significant factor in sizing your DataStreamer deployment is the **complexity of your data and the transformations** you apply. The more parsing and manipulation required, the more CPU resources you should allocate. DataStreamer's efficient, Rust-based architecture ensures that these CPU resources are used as effectively as possible to turn your raw data into actionable insights.
