top of page
  • Writer's pictureHuxley Kendell

Navigating Reference Data Management in Databases: Streamlining Workflow and Ensuring Consistency

As anyone responsible for managing a backend database, you're likely no stranger to the intricacies of handling reference data within your database environment. Reference data forms the bedrock of your data landscape, dictating the behaviour and logic of your applications. However, managing evolving reference data efficiently can often feel like taming a constantly shifting tide.

Traditionally, incorporating reference data changes into migration scripts has been a common practice. These scripts, seamlessly integrated into deployment pipelines, help maintain consistent data modifications across different development stages. Yet, when confronted with substantial datasets, the process can become a bottleneck, elongating development cycles and hampering productivity.

To circumvent these challenges, the concept of repeatable scripts comes to the forefront. These scripts offer an incremental approach, bypassing the need for exhaustive scans of the entire database. While repeatable insert statements provide a foundational solution, they can fall short when it comes to scenarios demanding data edits.

The limitations of repeatable insert statements become evident when the need for data updates arises. These statements often require conditional checks like "if not exists," which are suitable for adding new rows but less effective for handling changes within existing data. In scenarios where existing reference data needs modification or update, these statements present hurdles that can slow down the development process.

This is where the merge statement steps in – a dynamic SQL command capable of harmonizing insert, update, and delete operations into a singular force. Its adaptability makes it an indispensable tool in the realm of reference data management. The merge statement's unique ability to introduce new data while simultaneously updating existing records presents a comprehensive strategy to effectively navigate the ever-changing nature of reference data.

However, as you delve into this approach, the thought of seamlessly integrating it into your existing workflow might arise. This is where your trusted companion, Flyway, enters the scene. Flyway doesn't disrupt; it enables. Its integration ensures that your migration scripts, including repeatable merge statements, smoothly traverse the development pipeline. The tool's proficiency in tracking script changes, coupled with its flexibility to identify modifications, optimizes your database management workflow.

In your quest for streamlined reference data management, remember that you hold the reins. The merge statement, in collaboration with tools like Flyway, empowers you to orchestrate changes effortlessly, maintain data consistency, and drive your database management endeavors towards unprecedented efficiency. Your journey to simplified reference data management is paved with innovative strategies and reliable deployment tools and solutions. So let's explore how I personally tackled the challenge of managing Reference Data alongside Flyway, finding a balance between efficient management and reaping the benefits of Flyway's seamless deployment.

The pain point I encountered is all too familiar – numerous tables with extensive Reference Data. While I acknowledge the immense advantages of migration scripts, the practicality of manually crafting these scripts for each data set change, or adopting a State-Based approach with its exhaustive scans, posed limitations. So, how could I effectively handle and evolve this data while harnessing Flyway's power?

My initial step involved transforming a table, complete with its existing data, into a merge statement. Gratefully, someone had already simplified this process with a convenient Stored Procedure! Executing this procedure against one of my tables gave birth to my foundational merge statement.

With the merge statement in hand, Flyway's custom script integration came to the rescue. Placing the merge statement as a repeatable script laid the groundwork for managing my Reference Data, this is where any editing of Ref Data will be done.

Flyway, being the Database Deployment Powerhouse it is, maintains a checksum for each script. When the script undergoes editing with fresh or revised data, Flyway marks it for deployment in the subsequent migration. If the script remains unchanged, signifying no data edits, Flyway conserves resources by refraining from redeployment. So, no new data, and we can see below Flyway is happy to not redeploy the script.

When alterations are made to the script, as observed in the image below, Flyway immediately detects the changes and schedules the deployment of the updated script. This functionality is extremely advantageous as it relieves the burden of tracking deployment requirements from the database administrators. In scenarios where frequent edits to reference data are necessary alongside application code changes, Flyway's automated responsiveness ensures that these changes are reliably deployed. This significantly reduces the risk of crucial reference data not being deployed in sync with the application code changes, providing a seamless and efficient solution.

Flyway responding to an edited reference data script – marked as pending for deployment.

This approach can certainly be manually implemented, even without Flyway's deployment functionality. Irrespective of your position on the DevOps Maturity scale, the management of Reference Data can be significantly smoother. However, for those aiming to alleviate not only Reference Data challenges but also the intricacies of Script Generation and deployment, I wholeheartedly recommend exploring the capabilities of Flyway.

For more details, you can visit the official Flyway website.

69 views0 comments


bottom of page