5 Requirements of a Data Migration Repository

A central data migration repository delivers enormous benefits when built correctly. Explore five critical features your repository must have.
Steve Novak, Vice President

The beneficial impact of a central repository and process is enormous. However, to realize those benefits, the repository must have certain important features. Over the years, we have honed, and continue to optimize, the data repository that our data migration software, Applaud®, uses. If Definian is not part of your implementation, the repository you use to facilitate the data migration should still include the following features. If Definian is performing the migration, Applaud makes use of all of them.

1. Quickly replicate the legacy and target table structures

The data repository should be able to create the legacy and target data structures automatically, without a DBA or SQL scripts. Throughout a project, new data sources are discovered and require immediate access. If it takes more than a minute to set up the metadata and bring in multiple data sources, project timelines will be impacted.
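As a rough sketch of what this looks like (not Applaud's actual implementation; all table and helper names here are hypothetical), a repository can read a source table's column metadata straight from a database cursor and generate a staging structure on the fly, with SQLite standing in for both systems:

import sqlite3

def replicate_structure(src_conn, repo_conn, table_name):
    # Read column names from the DB-API cursor metadata -- no DBA,
    # no hand-written DDL.
    cur = src_conn.execute(f'SELECT * FROM "{table_name}" LIMIT 0')
    columns = [desc[0] for desc in cur.description]
    # Stage every column as TEXT so nothing is rejected on load;
    # types are tightened later, inside the repository.
    cols_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    repo_conn.execute(f'CREATE TABLE "stg_{table_name}" ({cols_sql})')
    repo_conn.commit()

# SQLite stands in for the legacy system and the repository.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE customers (id INTEGER, name TEXT, credit_limit REAL)")
repo = sqlite3.connect(":memory:")
replicate_structure(legacy, repo, "customers")
print(repo.execute("PRAGMA table_info(stg_customers)").fetchall())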

2. Easily create and alter table structures that reside only in the repository

For the repository to serve as an easy-to-use sandbox, creating additional tables and columns that exist only in the repository needs to be simple. Specifications, data enhancement/cleansing spreadsheets, and cross-reference information will be constantly changing throughout the course of the project. To react to these constant requests, data structures need to be created, dropped, and altered on the fly.
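A minimal illustration, again with hypothetical names: repository-only structures such as a cross-reference table should be creatable, alterable, and droppable in seconds, not through a change-request queue:

import sqlite3

repo = sqlite3.connect(":memory:")

# A cross-reference table that exists only in the repository,
# never in the legacy or target systems.
repo.execute("CREATE TABLE xref_material (legacy_item TEXT, target_item TEXT)")

# Mid-project, the team asks for a cleansing status flag: alter on the fly.
repo.execute("ALTER TABLE xref_material ADD COLUMN cleanse_status TEXT")

# When the spreadsheet that drove this structure is obsolete, drop it just as fast.
repo.execute("DROP TABLE xref_material")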

3. Gracefully handle bad data

Data migration projects are all about handling both bad and good data... but mostly bad. If the migration processes can't easily handle bad or invalid data, the project is going to struggle to succeed. The repository should gracefully handle character data in numeric fields, invalid dates, and the like without losing rows. If rows are lost upon insert into the repository, the integrity of the data is lost, muddying further data analysis.
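One common pattern for this (an assumption on our part, not necessarily how any particular tool does it) is to land every field as text so no row can be rejected on insert, then validate into typed columns afterward and flag failures rather than dropping them:

import sqlite3
from datetime import datetime

repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE stg_orders (order_id TEXT, order_date TEXT, amount TEXT)")

# Every row loads, including the bad ones -- nothing is silently dropped.
rows = [
    ("1001", "2023-02-15", "250.00"),
    ("1002", "2023-02-30", "N/A"),   # invalid date, character data in a numeric field
    ("1003", "", "80.5"),
]
repo.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)

def problems(order_date, amount):
    # Collect issues instead of raising, so every row survives for analysis.
    found = []
    try:
        datetime.strptime(order_date, "%Y-%m-%d")
    except ValueError:
        found.append("invalid date")
    try:
        float(amount)
    except ValueError:
        found.append("non-numeric amount")
    return found

for order_id, order_date, amount in repo.execute("SELECT * FROM stg_orders"):
    issues = problems(order_date, amount)
    print(order_id, "; ".join(issues) if issues else "ok")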

4. Beyond simple to get data in and out

Moving data from environment to environment is the core procedure of a data migration. To maximize effectiveness, facilitating the movement of data should be accomplished with minimal effort. When possible, a direct database connection should be used. In some legacy environments, mainframe especially, that’s not always possible.

However, importing and exporting flat files, EBCDIC files, or any other file format should be a simple process. In the case of mainframes, being able to natively handle EBCDIC, packed numerics, repeating segments, etc. is an immeasurable risk reducer; several of our engagements could not have been successful without those capabilities. Whatever the format, there could be thousands of tables and files, and if it takes more than a small amount of time to get at that data, the process can quickly become unmanageable. (Read how we successfully handled over 3000 data sources.)
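For a sense of what native mainframe handling involves, here is a hedged sketch: Python's standard codecs cover EBCDIC code pages such as cp037, and a COMP-3 packed-decimal field can be unpacked in a few lines. The record layout below is invented purely for illustration:

def decode_comp3(raw, scale=0):
    # IBM COMP-3 packed decimal: two digits per byte, sign in the
    # low nibble of the last byte (0xD or 0xB means negative).
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in raw[:-1])
    digits += str(raw[-1] >> 4)
    sign = -1 if (raw[-1] & 0x0F) in (0x0B, 0x0D) else 1
    return sign * int(digits) / (10 ** scale)

# A made-up 7-byte record: a 4-character EBCDIC name followed by a
# packed amount with two implied decimal places.
record = b"\xC1\xC3\xD4\xC5" + b"\x01\x23\x4D"
print(record[:4].decode("cp037"))         # -> ACME
print(decode_comp3(record[4:], scale=2))  # -> -12.34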

5. Fast to build components that analyze the data once it’s in the repository

It is incredibly important that the repository be a place where it's easy to query, combine, harmonize, and separate data, both within a single system and across the data landscape. If the repository doesn't easily facilitate this cross-system analysis, its effectiveness is diminished.
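For example, once legacy extracts and cross-reference tables live side by side in one repository, reconciling two systems' views of the same entity is a single query. The tables and IDs below are hypothetical:

import sqlite3

repo = sqlite3.connect(":memory:")
repo.executescript("""
    CREATE TABLE stg_erp_customers (cust_id TEXT, name TEXT);
    CREATE TABLE stg_crm_accounts  (acct_id TEXT, name TEXT);
    CREATE TABLE xref_customer     (cust_id TEXT, acct_id TEXT);
    INSERT INTO stg_erp_customers VALUES ('C1', 'Acme Corp');
    INSERT INTO stg_crm_accounts  VALUES ('A9', 'ACME Corporation');
    INSERT INTO xref_customer     VALUES ('C1', 'A9');
""")

# One query harmonizes two systems' records for the same customer.
query = """
    SELECT e.cust_id, e.name AS erp_name, c.name AS crm_name
    FROM stg_erp_customers e
    JOIN xref_customer x    ON x.cust_id = e.cust_id
    JOIN stg_crm_accounts c ON c.acct_id = x.acct_id
"""
for row in repo.execute(query):
    print(row)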

In addition to a centralized data repository, many other techniques and processes involved in a data migration further reduce risk on ERP, PLM, and other complex implementations. If you have any questions regarding data migration processes, data issues, or ways to reduce data migration risk, email me at steve.novak@definian.com or call me at 773.549.6945.
