The importance of data validation and parsing when working with external data sources
Gallen, Alexander (2024)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe2024041116665
Abstract
Working with data from external sources often revolves around combining data from multiple sources and analysing or processing it in new ways to generate value. The developer is frequently faced with uncertainties in the retrieved data due to lacking or non-existent documentation. This thesis explores the practice of parsing, validating and transforming such data in depth, showing how these challenges can be tackled to build a robust data-fetching pipeline and to keep the backend free of unknowns and of extra validation that is not part of the business logic being developed.
The focus is on TypeScript and, more specifically, the Zod library. The language and technology were chosen for their popularity, their relevance in the modern programming field, and the author's previous experience with them in production environments. The data examined to showcase the benefits, and the minimal setup needed to achieve a robust parsing and validation flow, are ledger receipts in Procountor, a widely used Finnish bookkeeping system. By implementing these best practices and data-fetching techniques it is possible to eliminate unknowns from the backend of the application under development. All parsing, validation and transformation of the external data are handled in a single place in the data processing pipeline and are entirely extracted from the business logic of the application. These functions can also be reused to establish uniform practices throughout the application, making it easy to scale the system to accommodate data from further integrations. Handling common problems, such as mismatched data types and differing representations of certain information, for example "nullish" values or dates, becomes trivial and uniform, and the type of data that enters the backend business logic of the application is fully known and robust.
Taking advantage of these learnings allows developers to increase the productivity, robustness and maintainability of their applications, particularly when dealing with large or complex data from third-party applications. Typical issues, such as varying string representations of types like dates or currencies, become a thing of the past, and the types of all fields can be managed uniformly across the application. This allows the developers responsible for business logic, data analysis, or data processing to focus on the problem at hand instead of wrangling unknown external data and its edge cases.