To address Reveer's operational challenges, we designed and implemented an extensible, modular pipeline system. Its purpose is to power the key stages of Reveer's commission management workflows: reading data from external sources, mapping incoming data onto Reveer's internal data structures, and matching records across sources so that they can be consolidated. By automating these steps, the pipeline is intended to reduce manual intervention and human error, speed up processing, and improve overall data accuracy. The system's modular design also makes it scalable and easy to reconfigure as business needs evolve, data volumes grow, and commission structures become more complex.
The pipeline system is built around a staged and parallelizable architecture, designed to maximize flexibility, performance, and modularity.
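Each pipeline is composed of multiple stages, where stages run sequentially or in parallel depending on their dependencies, and each stage groups processors that run in parallel.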
Processors can communicate and coordinate with each other through three main mechanisms:
Also called SurehubPipeline or SurehubJob, a pipeline is the highest-level structure and defines the overall workflow. It organizes the execution of multiple stages and coordinates the flow of data and control between them. Each pipeline is configured by a JSON file, so its processing logic can be adjusted without modifying the code.
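To make the idea of a JSON-configured pipeline concrete, the following is a minimal sketch of how such a file might be loaded and validated. The actual configuration schema is not described here, so every key name (`name`, `stages`, `depends_on`, `processors`), every processor name, and the helper `load_pipeline_config` are assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Hypothetical configuration. The key names and processor names are
# illustrative assumptions, not the real SurehubPipeline schema.
EXAMPLE_CONFIG = """
{
  "name": "commission-import",
  "stages": [
    {"name": "read",  "depends_on": [],       "processors": ["fetch_source_file"]},
    {"name": "map",   "depends_on": ["read"], "processors": ["map_policies", "map_agents"]},
    {"name": "match", "depends_on": ["map"],  "processors": ["match_records"]}
  ]
}
"""


@dataclass
class StageConfig:
    name: str
    processors: List[str]
    depends_on: List[str] = field(default_factory=list)


@dataclass
class PipelineConfig:
    name: str
    stages: List[StageConfig]


def load_pipeline_config(raw: str) -> PipelineConfig:
    """Parse a JSON pipeline definition and check that dependencies exist."""
    data = json.loads(raw)
    stages = [
        StageConfig(
            name=s["name"],
            processors=list(s["processors"]),
            depends_on=list(s.get("depends_on", [])),
        )
        for s in data["stages"]
    ]
    known = {s.name for s in stages}
    for s in stages:
        missing = set(s.depends_on) - known
        if missing:
            raise ValueError(
                f"stage {s.name!r} depends on unknown stages: {sorted(missing)}"
            )
    return PipelineConfig(name=data["name"], stages=stages)


config = load_pipeline_config(EXAMPLE_CONFIG)
```

Under these assumptions, adding a stage or reordering dependencies only requires editing the JSON text, and the loader rejects a definition that refers to a stage that does not exist.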
A stage is a logical component within a pipeline. It represents a group of processors that perform related operations. Stages can be executed either sequentially or in parallel, depending on their dependencies. Each stage serves as a checkpoint in the pipeline, controlling when and how processors are triggered.
A processor is the smallest executable unit within the pipeline architecture. Each processor is responsible for a specific task, such as fetching data, transforming records, applying business rules, or performing matching operations. Processors within a stage run in parallel to maximize performance and throughput.
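The relationship between pipelines, stages, and processors can be sketched as follows. This is not the actual implementation: it assumes processors are plain callables, uses Python's standard `concurrent.futures` thread pools to stand in for whatever execution mechanism the real system uses, and the names `run_stage` and `run_pipeline` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumption: a processor is a zero-argument callable returning a status string.
Processor = Callable[[], str]


def run_stage(processors: List[Processor]) -> List[str]:
    """Run all processors of one stage in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(processors))) as pool:
        return list(pool.map(lambda p: p(), processors))


def run_pipeline(
    stages: Dict[str, List[Processor]],
    depends_on: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Run stages in dependency order; independent stages run in parallel."""
    results: Dict[str, List[str]] = {}
    pending = set(stages)
    while pending:
        # A stage is ready once every stage it depends on has finished.
        ready = sorted(
            name
            for name in pending
            if all(dep in results for dep in depends_on.get(name, []))
        )
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        with ThreadPoolExecutor(max_workers=len(ready)) as pool:
            futures = {name: pool.submit(run_stage, stages[name]) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
        pending -= set(ready)
    return results


# Hypothetical three-stage pipeline: read -> map -> match.
example_stages = {
    "read": [lambda: "fetched"],
    "map": [lambda: "policies mapped", lambda: "agents mapped"],
    "match": [lambda: "matched"],
}
example_dependencies = {"map": ["read"], "match": ["map"]}
example_results = run_pipeline(example_stages, example_dependencies)
```

In this sketch each stage acts as a checkpoint: no processor of a later stage starts until every processor of the stages it depends on has returned.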