Published: 2026-04-10

ETL Pipeline with DTO Normalization for IPOS Data Integration in Spring Boot

DOI: 10.35870/ijsecs.v6i1.6850

Abstract

IPOS point-of-sale software, widely used by Indonesian small and medium retail enterprises (UMKM), exports transaction data as Excel files with no enforced schema. These exports contain format-variable, multi-row receipt blocks with heterogeneous date representations, locale-dependent numeric formats, and embedded unit strings that resist conventional relational import. Transforming these unstructured exports into a relational database requires a structured architectural approach capable of handling format variability, type inconsistency, and record duplication. This study designs and implements a Spring Boot-based ETL (Extract, Transform, Load) service that applies the Data Transfer Object (DTO) pattern through ten purpose-specific DTO classes covering each pipeline phase, structured within a four-layer Model-View-Controller (MVC) architecture (Controller-Service-Repository-Entity). The Extractor employs a streaming Excel reader with dynamic column-layout detection based on header keywords, producing raw String-typed ExtractedReceipt and ExtractedItem DTOs. The Transformer applies six normalization steps via four utility classes: StringNormalizer, DateParser (seven date-format patterns), NumberParser (Indonesian and Western currency formats), and a HashSet-based duplicate detector. Together these convert raw strings into typed ValidatedReceipt and ValidatedItem DTOs with explicit error logging. The Loader inserts records in batches of 1,000, using pre-loaded duplicate sets for O(1) lookup. The pipeline operates asynchronously, returning a jobId immediately while processing continues on a background thread. Functional evaluation across ten scenarios yielded a 100% pass rate, covering valid files, invalid file types, date-format heterogeneity, embedded-unit quantity strings, Indonesian numeric formats, cross-file and intra-file duplicate detection, grand-total reconciliation tolerance, and product-variation tracking. In the observed runs, files of 200–500 receipts completed processing within 5–15 seconds. These results indicate that a DTO-centric, explicitly mapped ETL pipeline over Spring Boot MVC provides a maintainable, auditable, and production-ready solution for UMKM retail data integration.
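
To make the normalization steps described above more concrete, the following is a minimal Java sketch of how multi-pattern date parsing, Indonesian/Western number parsing, and HashSet-based duplicate detection might look. It is not the implementation from the study. The class and method names (NormalizationSketch, parseDate, parseNumber, DuplicateDetector) are hypothetical, the three date patterns are an illustrative subset because the abstract does not list the seven actually used, and the rule for a lone separator (treated as a thousands separator when it repeats or is followed by exactly three digits) is an assumption.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical names throughout; these are not the classes from the study.
final class NormalizationSketch {

    // Illustrative subset only: the abstract does not list the seven patterns.
    private static final List<DateTimeFormatter> DATE_FORMATS = List.of(
            DateTimeFormatter.ofPattern("dd/MM/uuuu"),
            DateTimeFormatter.ofPattern("dd-MM-uuuu"),
            DateTimeFormatter.ISO_LOCAL_DATE);

    /** Tries each pattern in order; an empty result would be logged as an error. */
    static Optional<LocalDate> parseDate(String raw) {
        if (raw == null) return Optional.empty();
        for (DateTimeFormatter format : DATE_FORMATS) {
            try {
                return Optional.of(LocalDate.parse(raw.trim(), format));
            } catch (DateTimeParseException ignored) {
                // fall through to the next pattern
            }
        }
        return Optional.empty();
    }

    /** Accepts "1.250.000,50", "1,250,000.50", "Rp 75.000", "1,5 KG", and similar. */
    static Optional<BigDecimal> parseNumber(String raw) {
        if (raw == null) return Optional.empty();
        // Drop currency prefixes, unit suffixes, and whitespace.
        String s = raw.replaceAll("[^0-9.,-]", "");
        int lastDot = s.lastIndexOf('.');
        int lastComma = s.lastIndexOf(',');
        String normalized = s;
        if (lastDot >= 0 && lastComma >= 0) {
            // Both separators present: the one appearing last is the decimal mark.
            char decimal = lastDot > lastComma ? '.' : ',';
            char grouping = decimal == '.' ? ',' : '.';
            normalized = s.replace(String.valueOf(grouping), "").replace(decimal, '.');
        } else if (lastDot >= 0 || lastComma >= 0) {
            // Assumed heuristic for a single kind of separator.
            char sep = lastDot >= 0 ? '.' : ',';
            int last = Math.max(lastDot, lastComma);
            boolean repeated = s.indexOf(sep) != last;
            boolean threeDigitsAfter = s.length() - last - 1 == 3;
            normalized = (repeated || threeDigitsAfter)
                    ? s.replace(String.valueOf(sep), "")
                    : s.replace(sep, '.');
        }
        try {
            return Optional.of(new BigDecimal(normalized));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Pre-loaded keys catch cross-file repeats; keys added during a run catch intra-file ones. */
    static final class DuplicateDetector {
        private final Set<String> seen = new HashSet<>();

        DuplicateDetector(Collection<String> preloadedKeys) {
            seen.addAll(preloadedKeys);
        }

        /** Set.add returns false when the key is already present. */
        boolean isDuplicate(String receiptKey) {
            return !seen.add(receiptKey);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDate("15/03/2024"));
        System.out.println(parseNumber("Rp 1.250.000,50"));
        System.out.println(new DuplicateDetector(List.of("INV-001")).isDuplicate("INV-001"));
    }
}
```

Returning Optional rather than throwing keeps each failed field visible to the caller, which fits the explicit error logging the abstract mentions. The single-separator heuristic is inherently ambiguous (a value such as "1.250" could mean 1250 or 1.25), so a real pipeline would likely need a per-file or per-column locale hint.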

Keywords

Data Transfer Object; ETL Pipeline; Spring Boot; MVC Architecture; Data Normalization; IPOS; Batch Processing; Async Processing

Peer Review Process

This article has undergone a double-blind peer review process to ensure quality and impartiality.
