Platform Migration is hard – Data Migration is even harder

P.S. I originally wrote this post here; I am cross-posting it to my main blog for visibility.

In my last post, I talked about application rewrites. In this post, I would like to focus on the platform rewrite, which is a flavor of application rewrite. There can be many motivations for a 'Platform rewrite'. These might include:

  • A decision to switch from a monolithic architecture to micro-services. For this post, I consider this a Platform Migration.
  • A decision to change the storage architecture, such as migrating from an RDBMS store to a NoSQL store. That, in my opinion, is an example of a Platform rewrite that includes Data Migration.

Let's look at what is involved in both these scenarios.

Platform Migration

If we are able to reproduce the current snapshot of the API surface and its behavior in the target system, then we can say that the bulk of platform migration is done. We can rerun integration tests after wiring in the new system to ensure correctness.
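One way to picture "reproducing the API surface and its behavior" is a small parity check that replays the same requests against both systems and reports any disagreement. This is only a minimal sketch: `legacy_order_total`, `new_order_total`, and the sample requests are hypothetical stand-ins, not part of any real system discussed in this post.

```python
from typing import Any, Callable, Iterable, List, Tuple

Handler = Callable[[Any], Any]


def _observe(handler: Handler, request: Any) -> Tuple[str, Any]:
    """Record the return value, or the exception type if the call fails."""
    try:
        return ("ok", handler(request))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_mismatches(legacy: Handler, candidate: Handler,
                    requests: Iterable[Any]) -> List[Tuple[Any, Any, Any]]:
    """Return (request, legacy_result, candidate_result) for each disagreement."""
    mismatches = []
    for request in requests:
        expected = _observe(legacy, request)
        actual = _observe(candidate, request)
        if expected != actual:
            mismatches.append((request, expected, actual))
    return mismatches


# Hypothetical endpoints standing in for the classic and the new system.
def legacy_order_total(order):
    return order["qty"] * order["unit_price_cents"]


def new_order_total(order):
    return order["unit_price_cents"] * order["qty"]


sample_requests = [
    {"qty": 2, "unit_price_cents": 150},
    {"qty": 0, "unit_price_cents": 999},
    {"qty": 1},  # malformed request: both systems are expected to fail the same way
]

mismatches = find_mismatches(legacy_order_total, new_order_total, sample_requests)
print(mismatches)
```

Comparing error behavior as well as return values matters here, because callers of the classic system may depend on how it fails, not only on how it succeeds.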

Data Migration

As a system matures, its API and business logic evolve. For business logic migration to be considered complete, we have to faithfully capture the latest snapshot of this system. The storage system, however, is where all these historical changes still exist. Think of this like layers of sediment at the bottom of a lake. The data migration task has to deal with all of these layers.

This means that data migration is where the team will spend the most time. Let me explain this in detail.

When we attempt a data migration, a significant portion of it will succeed, but a significant subset will fail. The team will have to analyze the failures, fix the migration code to accommodate them, and run the migration again. This is repeated many times until all the data is migrated.
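The run, analyze, fix, rerun cycle described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the record shapes, the `transform_v1` and `transform_v2` functions, and the in-memory `target` dictionary standing in for the new store.

```python
def run_migration_pass(records, transform, write):
    """Try to migrate every record; collect failures instead of stopping."""
    migrated, failed = [], []
    for record in records:
        try:
            write(transform(record))
            migrated.append(record)
        except Exception as exc:
            failed.append((record, exc))
    return migrated, failed


# Hypothetical legacy rows: row 2 comes from an older "layer" of the schema.
legacy_rows = [
    {"id": 1, "first": "Ada", "last": "Lovelace"},
    {"id": 2, "full_name": "Alan Turing"},
]


def transform_v1(row):
    """First attempt: only knows the current schema."""
    return {"id": row["id"], "name": f"{row['first']} {row['last']}"}


def transform_v2(row):
    """Fixed after analyzing failures: also handles the older schema."""
    if "full_name" in row:
        return {"id": row["id"], "name": row["full_name"]}
    return transform_v1(row)


target = {}  # stand-in for the new data store


def write(doc):
    target[doc["id"]] = doc


# Pass 1: the old-format row fails with a KeyError.
migrated, failed = run_migration_pass(legacy_rows, transform_v1, write)

# Pass 2: rerun only the failures with the corrected transform.
remaining = [row for row, _ in failed]
migrated_again, failed_again = run_migration_pass(remaining, transform_v2, write)
```

In a real migration there would typically be many more passes than two, each one uncovering another forgotten layer of historical data.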

The best way to think of it is to imagine dropping a bouncing ball. The total distance it travels is not just the height from which it was dropped. It also includes the sum of all the bounces that happen before the ball comes to a complete rest.
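To put rough numbers on the analogy, assume (purely for illustration) that every bounce reaches a fixed fraction of the previous height. The ball falls the drop height once, then travels each rebound height twice, up and down, which sums to a geometric series. The `rebound` ratio below is an assumed parameter, not a measured property of any migration.

```python
def total_distance(drop_height: float, rebound: float) -> float:
    """Total distance travelled by a ball whose bounces shrink by a fixed ratio.

    distance = h + 2*h*r + 2*h*r**2 + ... = h * (1 + r) / (1 - r)
    """
    if not 0 <= rebound < 1:
        raise ValueError("rebound must be in the range [0, 1)")
    return drop_height * (1 + rebound) / (1 - rebound)


# A ball dropped from 1 unit that rebounds to half its previous height
# travels three times the drop height in total.
print(total_distance(1.0, 0.5))  # 3.0
```

Read as a migration estimate, the drop height is the first pass and the bounces are the reruns: even a modest rework ratio multiplies the total effort well beyond the initial estimate.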

Why is this significant?

The hidden cost of data migration can cause us to significantly underestimate the time required to switch to the new system. During the transition period, new features need to be released to both the classic system and the new system simultaneously, which duplicates work for as long as the migration runs.

Biggest Architectural concern

To me, architecture is about modeling a system so that it remains malleable to change. In this view, an 'architectural concern' is whatever is most difficult to change. It is common to see architectural concerns shift through the lifetime of an application. 'Tight coupling' and hard-to-change 'external dependencies' are early-stage architectural concerns.

Most successful enterprise applications collect a lot of data over time. The data can grow to the point where it becomes the most difficult thing to change. So, over time, the size of the data becomes the biggest architectural concern. This is reflected in the fact that any significant change to it (a migration, for example) is really hard to do.

Hierarchy of concerns

In our team, we keep the above point in mind when making 'architectural' decisions. Most applications can easily go through a UI refresh, and it should take a relatively small amount of time. Look at the pace of change in front-end development: frameworks and libraries change very often. The business logic layer is tougher to change. That is not to trivialize it, but it is doable in a reasonable amount of time.

For a system that has grown relatively big, data migration can take the longest. Keep this in mind when making decisions, and think about your data very carefully. When we start a new project, changing the data structure is relatively easy. It is not so easy once the data grows. This realization should influence our hierarchy of concerns.