ETL Testing for Modern Data Engineering: A Shift-Left SDET Approach

Authors

  • Chiranjeevulu Reddy Kasaram, Independent Researcher, USA

DOI:

https://doi.org/10.15662/IJRAI.2024.0706028

Keywords:

ETL testing, Python, data validation, automation, data quality

Abstract

The transformation of the quality assurance role into that of the Software Development Engineer in Test (SDET) has redefined expectations for modern data engineering workflows. In data-driven systems, user interfaces represent only the surface layer of complex data pipelines, making Extract-Transform-Load (ETL) testing a critical component of quality assurance. This paper explores Python-powered ETL testing pipelines as a foundation for automating data validation within the SDET workflow. Emphasizing a shift-left testing philosophy, we propose early and continuous validation of transformation logic, data models, and full ETL processes to ensure the completeness, consistency, and timeliness of data. By leveraging Python, advanced SQL, and CI/CD integration, SDETs can design reusable, scalable, and maintainable validation systems that uphold data governance principles and maintain data integrity across analytical and operational contexts.
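
The abstract names completeness, consistency, and timeliness as the properties to validate. The sketch below shows one way such checks might be expressed in Python with SQL. It is illustrative only and not the author's implementation. It assumes a hypothetical `src_orders`/`tgt_orders` schema and a hypothetical cents-to-dollars transformation, and it uses Python's standard-library `sqlite3` in-memory database as a stand-in for a real staging area and warehouse. Completeness is checked with an anti-join plus a duplicate-key query, consistency with a source-to-target total reconciliation, and timeliness by comparing the newest `loaded_at` value against an assumed 15-minute freshness threshold.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema and data; a real pipeline would query staging and
# warehouse tables instead of an in-memory SQLite database.
def build_fixture(conn):
    conn.executescript("""
        CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                                 amount_cents INTEGER, loaded_at TEXT);
        CREATE TABLE tgt_orders (order_id INTEGER, customer_id INTEGER,
                                 amount_usd REAL, loaded_at TEXT);
    """)
    now = datetime(2024, 12, 1, 12, 0, tzinfo=timezone.utc)
    rows = [
        (1, 10, 1999, (now - timedelta(minutes=5)).isoformat()),
        (2, None, 500, (now - timedelta(minutes=5)).isoformat()),  # orphan order, expected to be dropped
        (3, 11, 250, (now - timedelta(minutes=2)).isoformat()),
    ]
    conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?, ?)", rows)
    return now

# Hypothetical transformation under test: drop orders without a customer
# and convert cents to dollars.
def run_transform(conn):
    conn.execute("""
        INSERT INTO tgt_orders
        SELECT order_id, customer_id, amount_cents / 100.0, loaded_at
        FROM src_orders
        WHERE customer_id IS NOT NULL
    """)

def validate(conn, now, max_lag=timedelta(minutes=15), tolerance=0.005):
    """Return a dict of failed checks; an empty dict means every check passed."""
    failures = {}

    # Completeness: every eligible source key must reach the target (anti-join)...
    missing = [r[0] for r in conn.execute("""
        SELECT s.order_id FROM src_orders s
        LEFT JOIN tgt_orders t ON t.order_id = s.order_id
        WHERE s.customer_id IS NOT NULL AND t.order_id IS NULL
        ORDER BY s.order_id
    """)]
    if missing:
        failures["completeness_missing"] = missing

    # ...and must arrive exactly once (no duplicate keys introduced by the load).
    dupes = [r[0] for r in conn.execute(
        "SELECT order_id FROM tgt_orders GROUP BY order_id HAVING COUNT(*) > 1")]
    if dupes:
        failures["completeness_duplicates"] = dupes

    # Consistency: reconcile the transformed total against the eligible source total.
    src_total, tgt_total = conn.execute("""
        SELECT (SELECT COALESCE(SUM(amount_cents), 0) / 100.0 FROM src_orders
                WHERE customer_id IS NOT NULL),
               (SELECT COALESCE(SUM(amount_usd), 0) FROM tgt_orders)
    """).fetchone()
    if abs(src_total - tgt_total) > tolerance:
        failures["consistency_total"] = (src_total, tgt_total)

    # Timeliness: the newest loaded row must fall within the freshness threshold.
    latest = conn.execute("SELECT MAX(loaded_at) FROM tgt_orders").fetchone()[0]
    if latest is None or now - datetime.fromisoformat(latest) > max_lag:
        failures["timeliness"] = latest

    return failures

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    now = build_fixture(conn)
    run_transform(conn)
    print(validate(conn, now))  # → {}
```

Because `validate` returns an empty dict when all checks pass, it can be asserted on directly in a unit test (for example under pytest) and run on every commit in a CI pipeline, which is where the shift-left benefit comes from. The table names, tolerance, and freshness threshold are placeholders to be replaced with project-specific values.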

Published

2024-12-11

How to Cite

ETL Testing for Modern Data Engineering: A Shift-Left SDET Approach. (2024). International Journal of Research and Applied Innovations, 7(6), 11829-11834. https://doi.org/10.15662/IJRAI.2024.0706028