# Sizing examples

**Enterprise Capability, Startup Agility.** DataStreamer is engineered for high performance and resource efficiency, allowing you to handle massive data volumes without the massive infrastructure costs. This guide provides a comprehensive methodology for sizing your DataStreamer deployment to meet your specific throughput and processing needs.

## Advanced Scenarios: Impact of Data Structure

The structure of your data has a significant impact on resource requirements. Unstructured data requires more CPU-intensive parsing (typically using regex) than structured data like JSON. The following scenarios illustrate how CPU and memory needs change based on your data mix, even when throughput remains constant.

### Common Workload Parameters

* **Total EPS**: 6,000
* **Average Event Size**: 2 KB
* **Total Throughput**: 11.72 MiB/s
* **Transforms**: 6 Regex Parsers, 3 Enrichments

### Scenario 1: 100% Unstructured Data

This scenario represents the heaviest processing load, as all incoming data must be parsed using complex regex.

* **Throughput per vCPU**: 10 MiB/s (baseline for unstructured data)

CPU

* **Base vCPU**: `11.72 MiB/s / 10 MiB/s = 1.17 vCPU`
* **Heavy Parser Overhead**: `1.17 * 50% * 6 = 3.51 vCPU`
* **Enrichment Overhead**: `1.17 * 20% * 3 = 0.70 vCPU`
* **Subtotal vCPU**: `1.17 + 3.51 + 0.70 = 5.38 vCPU`
* **Total vCPU**: `5.38 * 1.3 = 7.0 vCPU` (Recommended: **8 vCPU**)

Memory

* **Base Memory**: `6000 / 25000 = 0.24 GB`
* **Heavy Parser Memory**: `0.5 GB * 6 = 3.0 GB`
* **Enrichment Memory**: `0.3 GB * 3 = 0.9 GB`
* **Buffer Memory**: `11.72 * 0.1 = 1.17 GB`
* **Subtotal Memory**: `0.24 + 3.0 + 0.9 + 1.17 = 5.31 GB`
* **Total Memory**: `5.31 * 1.4 = 7.43 GB` (Recommended: **8 GB RAM**)

### Scenario 2: 70% Unstructured, 30% JSON

* **Weighted Throughput/vCPU**: `(0.7 * 10) + (0.3 * 25) = 14.5 MiB/s`

CPU

* **Base vCPU**: `11.72 MiB/s / 14.5 MiB/s = 0.81 vCPU`
* **Heavy Parser Overhead**: `0.81 * 50% * 6 = 2.43 vCPU`
* **Light Parser Overhead**: `0.81 * 10% * 1 = 0.08 vCPU`
* **Enrichment Overhead**: `0.81 * 20% * 3 = 0.49 vCPU`
* **Subtotal vCPU**: `0.81 + 2.43 + 0.08 + 0.49 = 3.81 vCPU`
* **Total vCPU**: `3.81 * 1.3 = 4.95 vCPU` (Recommended: **6 vCPU**)

Memory

* **Base Memory**: 0.24 GB
* **Parser Memory**: `(0.5 GB * 6) + (0.2 GB * 1) = 3.2 GB`
* **Enrichment Memory**: 0.9 GB
* **Buffer Memory**: 1.17 GB
* **Subtotal Memory**: `0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB`
* **Total Memory**: `5.51 * 1.4 = 7.71 GB` (Recommended: **8 GB RAM**)

### Scenario 3: 50% Unstructured, 50% JSON

* **Weighted Throughput/vCPU**: `(0.5 * 10) + (0.5 * 25) = 17.5 MiB/s`

CPU

* **Base vCPU**: `11.72 MiB/s / 17.5 MiB/s = 0.67 vCPU`
* **Heavy Parser Overhead**: `0.67 * 50% * 6 = 2.01 vCPU`
* **Light Parser Overhead**: `0.67 * 10% * 1 = 0.07 vCPU`
* **Enrichment Overhead**: `0.67 * 20% * 3 = 0.40 vCPU`
* **Subtotal vCPU**: `0.67 + 2.01 + 0.07 + 0.40 = 3.15 vCPU`
* **Total vCPU**: `3.15 * 1.3 = 4.1 vCPU` (Recommended: **5 vCPU**)

Memory

* **Base Memory**: 0.24 GB
* **Parser Memory**: 3.2 GB
* **Enrichment Memory**: 0.9 GB
* **Buffer Memory**: 1.17 GB
* **Subtotal Memory**: `0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB`
* **Total Memory**: `5.51 * 1.4 = 7.71 GB` (Recommended: **8 GB RAM**)

### Scenario 4: 10% Unstructured, 90% JSON

This scenario represents the lightest processing load, as most data is pre-structured.

* **Weighted Throughput/vCPU**: `(0.1 * 10) + (0.9 * 25) = 23.5 MiB/s`

CPU

* **Base vCPU**: `11.72 MiB/s / 23.5 MiB/s = 0.50 vCPU`
* **Heavy Parser Overhead**: `0.50 * 50% * 6 = 1.5 vCPU`
* **Light Parser Overhead**: `0.50 * 10% * 1 = 0.05 vCPU`
* **Enrichment Overhead**: `0.50 * 20% * 3 = 0.30 vCPU`
* **Subtotal vCPU**: `0.50 + 1.5 + 0.05 + 0.30 = 2.35 vCPU`
* **Total vCPU**: `2.35 * 1.3 = 3.06 vCPU` (Recommended: **4 vCPU**)

Memory

* **Base Memory**: 0.24 GB
* **Parser Memory**: 3.2 GB
* **Enrichment Memory**: 0.9 GB
* **Buffer Memory**: 1.17 GB
* **Subtotal Memory**: `0.24 + 3.2 + 0.9 + 1.17 = 5.51 GB`
* **Total Memory**: `5.51 * 1.4 = 7.71 GB` (Recommended: **8 GB RAM**)

***

## Comparison and Conclusion

### Resource Comparison Table

| Scenario | Data Mix (Unstructured/JSON) | Recommended vCPU | Recommended RAM |
| -------- | ---------------------------: | ---------------: | --------------: |
| 1        |                    100% / 0% |       **8 vCPU** |            8 GB |
| 2        |                    70% / 30% |       **6 vCPU** |            8 GB |
| 3        |                    50% / 50% |       **5 vCPU** |            8 GB |
| 4        |                    10% / 90% |       **4 vCPU** |            8 GB |

### Conclusion: Data Transformation is CPU-Intensive

As the comparison table clearly shows, **DataStreamer is almost always CPU-intensive**, and the primary driver of CPU consumption is the nature of the data transformation taking place.

* **CPU scales with complexity**: As the percentage of unstructured data requiring heavy regex parsing decreases from 100% to 10%, the required vCPU is **reduced by 50%** (from 8 to 4 vCPU).
* **Memory remains stable**: Memory requirements are primarily driven by the number of components (parsers, enrichments) and overall throughput, which remain constant across these scenarios. As a result, the recommended RAM stays at 8 GB for all four scenarios.

This demonstrates that the most significant factor in sizing your DataStreamer deployment is the **complexity of your data and the transformations** you apply. The more parsing and manipulation required, the more CPU resources you should allocate. DataStreamer's efficient, Rust-based architecture ensures that these CPU resources are used as effectively as possible to turn your raw data into actionable insights.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.blusapphire.io/release-6.0/03_datastreamer/sizing-examples.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
