Basic Concepts

To understand the Data Pipeline Manager (DPM), it helps to visualize it as a system that receives events from multiple sources, processes them, and transmits them to one or more destinations. The primary components of the interface are:

  • Sources: Collect data from various input points.

  • Pipelines: Define and apply the transformations that process the data.

  • Enrichments: Enhance events with additional context, such as GeoIP information.

  • Destinations: Route the processed data to its final location.

These elements work together to efficiently handle and process your data within the DPM system.
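
The end-to-end flow can be pictured roughly as follows. This is a conceptual sketch in Python, not the DPM's actual API; all class and function names here are hypothetical.

```python
# Conceptual model of the DPM event flow. All names are hypothetical
# illustrations, not part of the real DPM interface.

def run_dpm(sources, pipeline, enrichments, destinations):
    """Ingest events from every source, process them, and fan them out."""
    for source in sources:
        for event in source.read():          # Sources: collect incoming data
            event = pipeline.process(event)  # Pipelines: transform the event
            for enrichment in enrichments:
                event = enrichment.apply(event)  # Enrichments: add context
            for destination in destinations:
                destination.send(event)      # Destinations: deliver output
```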

Sources:

Sources are configurations that specify where the system should pull data from, or how to receive data that is pushed to it. They enable the DPM to consume data from various origins. Sources include:

  • Push Inputs: Agents such as Filebeat and Winlogbeat, protocols such as Syslog and TCP, and other external systems that push data to the DPM.

  • File Stores: Remote file storage solutions from which the DPM can pull data.

Sources are essential for integrating and onboarding data into the DPM, enabling ingestion from a wide range of inputs.
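
As an illustration, a push-style source might look conceptually like the TCP listener below, which treats each received line as one raw event. The port number and newline framing are assumptions made for this sketch, not the DPM's actual source configuration.

```python
# Sketch of a push-style TCP source: accepts newline-delimited events.
import socketserver

class TCPSourceHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Each line received on the socket is treated as one raw event.
        for line in self.rfile:
            event = line.decode("utf-8", errors="replace").rstrip("\n")
            print(f"ingested event: {event}")  # hand off to a pipeline here

if __name__ == "__main__":
    # Port 5140 is an arbitrary choice for this example.
    with socketserver.TCPServer(("0.0.0.0", 5140), TCPSourceHandler) as server:
        server.serve_forever()
```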

Pipelines:

Pipelines in the Data Pipeline Manager (DPM) are sequences of transformations that process and modify data as it is ingested. These transformations can include:

  • Parsing: Extracting relevant information from raw data.

  • Filtering: Removing or including specific data based on set criteria.

When data enters the DPM, events are initially routed and pre-processed based on key fields such as observer.type, observer.product, observer.vendor, and source_type. They are then directed to the start of a pipeline.

As the events move through the pipeline, each transformation applies its processing rules. The output from one transformation is passed to the next, ensuring that data is processed in a structured and sequential manner until it reaches the end of the pipeline.
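
The sketch below illustrates this idea: events are routed to a pipeline based on key fields, and each pipeline is an ordered list of transformations applied one after another. The routing key and the transform functions are simplified examples, not the DPM's internal implementation.

```python
# Illustrative routing and sequential transformation. Field names follow
# the key fields mentioned above; the transforms are hypothetical examples.

def parse(event):
    # Parsing: extract structured fields from the raw message.
    event["tokens"] = event.get("message", "").split()
    return event

def filter_debug(event):
    # Filtering: drop events that match exclusion criteria.
    return None if event.get("level") == "debug" else event

# A pipeline is an ordered sequence of transformations; routing selects a
# pipeline based on key fields of the incoming event.
PIPELINES = {
    ("acme", "firewall"): [parse, filter_debug],
}

def process(event):
    key = (event.get("observer.vendor"), event.get("observer.product"))
    for transform in PIPELINES.get(key, []):
        event = transform(event)
        if event is None:   # a filter removed the event
            return None     # stop processing early
    return event
```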

Enrichments:

Depending on the configured enrichment type and specific conditions, events are enhanced with additional information, such as GeoIP data. This enrichment process adds valuable context to each event, enabling deeper analysis and insights.
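
Conceptually, a conditional GeoIP enrichment might look like the sketch below. A real deployment would query a GeoIP database; the in-memory lookup table here is a stand-in so the example stays self-contained.

```python
# Sketch of a conditional GeoIP enrichment using a stand-in lookup table.
GEOIP_TABLE = {
    "203.0.113.7": {"country": "US", "city": "Example City"},  # documentation IP
}

def enrich_geoip(event):
    # Enrich only when the event carries a source IP we can resolve.
    ip = event.get("source.ip")
    if ip in GEOIP_TABLE:
        event["source.geo"] = GEOIP_TABLE[ip]
    return event

print(enrich_geoip({"source.ip": "203.0.113.7"}))
# {'source.ip': '203.0.113.7', 'source.geo': {'country': 'US', 'city': 'Example City'}}
```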

Destinations:

Destinations in the Data Pipeline Manager (DPM) are the endpoints where processed events are sent. You can configure the DPM to send events to one or multiple destinations, such as AWS S3, Kafka, Vector, and others.

The method of processing and transmitting events to each destination varies based on the downstream service:

  • AWS S3 Destination: Buffers all processed events and flushes them in batches, which helps optimize data storage and transfer efficiency.

  • Socket Destination: Streams individual events in real-time, enabling immediate processing.

Understanding the characteristics of each destination allows you to configure your pipelines effectively, ensuring that data is managed and transferred according to your specific needs.
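
The contrast between batched and streaming delivery can be sketched as follows. Both classes are illustrative models of the two behaviors described above, not the DPM's actual destination implementations.

```python
# Two delivery styles: buffer-and-flush (S3-like) vs. per-event streaming
# (socket-like). Both classes are conceptual sketches.

class BufferedDestination:
    """Accumulates events and flushes them in batches, as an S3 destination does."""
    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.buffer = []

    def send(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # A real S3 destination would upload one object per batch here.
        print(f"flushing batch of {len(self.buffer)} events")
        self.buffer.clear()

class StreamingDestination:
    """Forwards each event immediately, as a socket destination does."""
    def send(self, event):
        # A real socket destination would write to the open connection here.
        print(f"streaming event: {event}")
```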
