Technologies used (languages, frameworks, databases, servers)
Architecture diagrams (pipelines, data flow, component structure)
Description of modules:
- Fetching module
- Mapping module
- Matching module
- Error handling, logging, etc.

Technologies Used

Programming Languages

Java

Java was selected as the core development language for its maturity, strong performance in enterprise-scale applications, rich ecosystem, and built-in multithreading support, which the parallelizable pipeline architecture relies on. Its portability across environments and excellent tooling also made it a solid choice for both the backend logic and the integration work.

JavaScript

JavaScript was used inside the pipeline through the embedded Nashorn engine to provide dynamic scripting. This makes it possible to execute custom business logic at runtime, which is particularly useful for dynamic data transformations or conditional rule execution that should change without modifying the Java code.
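A minimal sketch of how a conditional rule could be evaluated through the standard `javax.script` API follows. The class name `RuleEvaluator`, the rule text, and the record fields `amount` and `region` are hypothetical. The engine is looked up by the name "nashorn", which resolves to the built-in engine on JDK 8 to 14 and, on JDK 15 and later, only when the standalone OpenJDK Nashorn library is on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

/** Hypothetical helper that evaluates JavaScript rule expressions against a record. */
public class RuleEvaluator {

    private final ScriptEngine engine;

    public RuleEvaluator() {
        // Built-in Nashorn on JDK 8-14, standalone OpenJDK Nashorn if on the classpath; null otherwise.
        ScriptEngine found = new ScriptEngineManager().getEngineByName("nashorn");
        if (found == null) {
            throw new IllegalStateException("Nashorn script engine is not available");
        }
        this.engine = found;
    }

    /** Returns true only if the rule expression evaluates to JavaScript true. */
    public boolean matches(String ruleExpression, Map<String, Object> record) throws ScriptException {
        Bindings bindings = engine.createBindings();
        bindings.putAll(record); // each record field becomes a script variable
        Object result = engine.eval(ruleExpression, bindings);
        return Boolean.TRUE.equals(result);
    }

    public static void main(String[] args) throws ScriptException {
        // Hypothetical rule and record; in practice the rule text would come from configuration.
        String rule = "amount > 1000 && region == 'EU'";
        Map<String, Object> record = new HashMap<>();
        record.put("amount", 1500);
        record.put("region", "EU");
        System.out.println(new RuleEvaluator().matches(rule, record));
    }
}
```

Because the rule is plain text, it can be edited without rebuilding the application. If the same rule runs against many records, compiling it once via `javax.script.Compilable` would avoid re-parsing it each time; that optimization is left out of this sketch.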

Shell Scripting

Shell scripts were developed to automate deployment tasks, file management, log rotation, and other DevOps operations. They streamline routine work and are especially useful on the AWS EC2 instances for managing services and cron jobs.

Libraries and Frameworks

Jackson

A high-performance JSON processor used to handle JSON configuration files, API responses, and intermediate data formats. Jackson simplifies serialization and deserialization by mapping JSON structures directly to Java objects and back.

AWS SDK for Java

Used to interact programmatically with AWS services, including:
- EC2: For managing the compute instances that host the pipeline.
- SQS: For asynchronous, decoupled communication between processors.
- Secrets Manager: For securely storing and retrieving credentials and other sensitive configuration values.

FuzzyWuzzy

A Java port of the Python FuzzyWuzzy library, used for fuzzy string matching. It plays a central role in the matching phase, linking similar but non-identical strings (e.g., customer names with minor spelling differences).
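As a rough illustration of what a fuzzy similarity score measures, the sketch below computes a normalized Levenshtein similarity in plain Java. It is not FuzzyWuzzy's implementation: the library's simple ratio (exposed in the `me.xdrop:fuzzywuzzy` Java port as `FuzzySearch.ratio`) uses a different formula, and its token-based variants also tolerate word reordering, so scores will not match exactly. The class name `NameSimilarity` and any threshold used with it are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: normalized Levenshtein similarity on a 0-100 scale. */
public final class NameSimilarity {

    private NameSimilarity() {}

    /** Edit distance where each insertion, deletion, or substitution costs 1. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Lower-cases, trims, and collapses whitespace so formatting differences are ignored. */
    public static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Similarity in [0, 100]; 100 means identical after normalization. */
    public static int score(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int maxLen = Math.max(x.length(), y.length());
        if (maxLen == 0) {
            return 100; // two empty strings count as identical
        }
        int distance = levenshtein(x, y);
        return (int) Math.round(100.0 * (maxLen - distance) / maxLen);
    }

    /** Links two names when their score reaches the given threshold. */
    public static boolean isMatch(String a, String b, int threshold) {
        return score(a, b) >= threshold;
    }
}
```

For example, `NameSimilarity.score("Jon Smith", "John Smith")` is 90 (one inserted character over ten), so a hypothetical threshold of 85 would link the two names while rejecting unrelated ones. The right threshold depends on the data and would need tuning against known matches.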

OpenJDK Nashorn

A JavaScript engine that executes JavaScript code inside a Java application. Nashorn was bundled with the JDK until it was deprecated in JDK 11 and removed in JDK 15; it is now maintained as the standalone OpenJDK Nashorn library. It is particularly useful for processing logic that must be customizable through external configuration without recompiling the system.

SLF4J (Simple Logging Facade for Java)

A logging abstraction that provides one consistent logging API while allowing the actual logging backend (e.g., Logback or Log4j) to be swapped without changing the code. Uniform logging across all modules supports debugging and monitoring of the pipeline.

Databases

PostgreSQL

Used as the primary relational database for storing system metadata, pipeline configuration, processing logs, and intermediate results. PostgreSQL was selected for its reliability, open-source nature, and powerful querying capabilities.

Microsoft SQL Server (MSSQL)

One of the external data sources from which commission-related data is fetched. The pipeline connects via JDBC to extract data, which is then mapped and matched against Reveer’s internal representations.
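The JDBC extraction step might look roughly like the following sketch. The class `CommissionFetcher`, the table `commissions`, its columns, and the host and database names are hypothetical; the Microsoft JDBC driver (`com.microsoft.sqlserver:mssql-jdbc`) must be on the classpath for a `jdbc:sqlserver://` URL to resolve. Credentials are passed in as parameters here, whereas in the setup described above they would be retrieved from AWS Secrets Manager.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fetcher for commission rows from an MSSQL source. */
public class CommissionFetcher {

    /** One fetched row; field names mirror the hypothetical columns below. */
    public static final class CommissionRow {
        public final String agentName;
        public final BigDecimal amount;

        public CommissionRow(String agentName, BigDecimal amount) {
            this.agentName = agentName;
            this.amount = amount;
        }
    }

    // Hypothetical table and column names, for illustration only.
    public static final String QUERY =
            "SELECT agent_name, amount FROM commissions WHERE period_start >= ?";

    /** Builds a Microsoft JDBC driver connection URL. */
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database + ";encrypt=true";
    }

    /** Fetches all rows whose period starts on or after {@code since}. */
    public static List<CommissionRow> fetchSince(String url, String user, String password, LocalDate since)
            throws SQLException {
        List<CommissionRow> rows = new ArrayList<>();
        // try-with-resources closes the connection and statement even if the query fails
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setDate(1, Date.valueOf(since)); // bound parameter, never string concatenation
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    rows.add(new CommissionRow(rs.getString("agent_name"), rs.getBigDecimal("amount")));
                }
            }
        }
        return rows;
    }
}
```

Reading amounts as `BigDecimal` avoids floating-point rounding on monetary values, and the returned rows would then be handed to the mapping step.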