Rethinking Financial Data Engineering: From Batch Systems to AI-Native Architectures

Abstract

This paper examines the transformation of financial data engineering from traditional batch-oriented architectures to real-time, AI-native data ecosystems. It identifies key limitations of legacy systems, including high latency, fragility, and the lack of scalable data quality frameworks, and presents architectural principles derived from large-scale financial platforms.

The study introduces a governance-centric framework that integrates real-time data streaming, automated data lineage, and Data Quality as a Service (DQaaS). The framework is supported by production-scale implementations that process over 500,000 GB (roughly 500 TB) daily and enable over one million analytical queries.

Additionally, the paper explores the convergence of data engineering and artificial intelligence, demonstrating how AI-ready pipelines support fraud detection, regulatory compliance, and predictive analytics. The findings provide both theoretical insights and practical frameworks for designing next-generation financial data systems.
