This is the third post in a five-part series based on our ebook, From Chaos to Clarity: The Strategic Guide to Healthcare Data Catalogs. Each post addresses one of the root causes organizations encounter when data becomes a blocker instead of an asset. Download the full ebook here.
Hospital readmissions are expensive, clinically damaging, and increasingly penalized by CMS. When a health system decides to address this, it builds a model to flag high-risk patients before discharge, giving care teams time to intervene.
Data science teams approach this work carefully, using historical EHR data, feature engineering, and algorithm tuning. By the time the model is ready, it performs well in validation, and leadership gives approval.
However, challenges emerge when teams attempt deployment.
The social determinants data that improved the model’s predictions are not accessible in real time. Lab results refresh on a schedule that the model cannot use. Three of the 40 input features behave differently in live systems than in historical data. The project stalls, and teams must explain why a model that worked in testing fails in production.
The model is usually the first thing questioned, and it should not be. The model is not the problem.
The assumption that breaks deployment
AI teams build on retrospective data. Most healthcare organizations discover, usually too late, that the data used for training and the data available in production are not the same.
In development, data comes from a warehouse or curated dataset, already processed and reasonably clean. In production, data must arrive from live systems on schedules the model depends on, at the quality levels it requires. When these conditions do not hold, the model fails.
Not because it was built incorrectly, but because it was built on assumptions about data that were never verified.
What you need to know before building
Production AI has requirements beyond the algorithm. Live data must match the training data. Each prediction must be traceable to its inputs. Drift must be detectable before it affects results.
Before development begins, map every input feature against what production systems can deliver: whether it exists in real time, how complete it is, how often it refreshes, and whether it can be queried fast enough for model use. A 40-feature model requires 40 clear answers. Teams that skip this step discover these issues during deployment, after months of work.
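That mapping can be as simple as a checklist run in code. Here is a minimal sketch in Python; the feature names, thresholds, and the `audit` helper are all illustrative assumptions, not part of any particular catalog product:

```python
from dataclasses import dataclass

@dataclass
class FeatureCheck:
    name: str
    realtime: bool        # available from live systems at prediction time?
    completeness: float   # fraction of records with a usable value
    refresh_hours: float  # how often the source system updates

def audit(features, min_completeness=0.9, max_refresh_hours=24):
    """Split features into ready / at-risk / blocked before any model code is written."""
    ready, at_risk, blocked = [], [], []
    for f in features:
        if not f.realtime:
            blocked.append(f.name)    # cannot be served in production at all
        elif f.completeness < min_completeness or f.refresh_hours > max_refresh_hours:
            at_risk.append(f.name)    # usable, but with documented limitations
        else:
            ready.append(f.name)
    return ready, at_risk, blocked

checks = [
    FeatureCheck("lab_creatinine", True, 0.97, 6),
    FeatureCheck("sdoh_housing", True, 0.65, 24),    # social determinants: incomplete
    FeatureCheck("note_sentiment", False, 0.80, 1),  # needs NLP, not real time
]
ready, at_risk, blocked = audit(checks)
print(ready, at_risk, blocked)
```

The output is the decision point described below: ship with the ready features, document the at-risk ones, and defer the blocked ones to a later phase.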
What running the check looks like
Before writing any code, we evaluated every data element required by the model against production capabilities.
Thirty-five features were available in structured data with over 90% completeness, which is strong. Three features were available, but only 60–70% complete, primarily social determinants data. Two required unstructured clinical note parsing that was not available in real time.
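Completeness figures like these come from a straightforward count over production records. A minimal sketch, assuming records arrive as dictionaries with None marking missing values (the field names are illustrative):

```python
def completeness(records, field):
    """Fraction of records where the field has a usable (non-null) value."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

records = [
    {"hemoglobin": 13.2, "housing_status": "stable"},
    {"hemoglobin": 11.8, "housing_status": None},
    {"hemoglobin": 12.5, "housing_status": None},
    {"hemoglobin": None, "housing_status": "unstable"},
]
print(completeness(records, "hemoglobin"))      # 0.75
print(completeness(records, "housing_status"))  # 0.5
```

The point is not the arithmetic; it is that the count runs against production systems, not the training warehouse.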
This is not a failed project. It is a decision point.
We deployed the 35-feature model, and it performed well. We documented the limitations and planned NLP integration for a later phase. The model reached production because we understood the data it relied on.
Without this assessment, gaps in social determinants data would have quietly degraded accuracy after months of work. The unstructured features would have blocked the launch at the last minute. The model would have been blamed for a data issue.
Before you build the next model
Most organizations lack a clear, documented view of their production data, including quality metrics, refresh frequency, completeness, and lineage, until problems arise. A data catalog provides this visibility before development begins.
Do your data scientists understand the quality of the production data they use, not just the training data? When a model underperforms, can your team determine whether the issue lies in the algorithm or the data? Can you detect drift before it becomes a performance issue?
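Detecting drift can start with something as simple as comparing live feature distributions against the training baseline. One common heuristic is the population stability index (PSI), sketched here in plain Python; the bin edges, sample values, and the conventional 0.2 alert threshold are illustrative assumptions, not fixed rules:

```python
import math

def psi(expected, actual, edges):
    """Population stability index between a baseline and a live sample over fixed bins."""
    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
        total = max(len(values), 1)
        # floor at a tiny proportion so the log term stays defined for empty bins
        return [max(c / total, 1e-6) for c in counts]
    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

edges = [0, 25, 50, 75, 100]
baseline = [10, 20, 30, 40, 60, 70, 80, 90]  # training-time distribution
stable   = [12, 22, 33, 41, 62, 68, 81, 88]  # same shape: PSI stays low
shifted  = [5, 8, 12, 15, 18, 22, 24, 10]    # mass piled into low bins: PSI spikes

print(psi(baseline, stable, edges))   # well under a 0.2 alert threshold
print(psi(baseline, shifted, edges))  # far over it
```

Run on a schedule against live inputs, a check like this surfaces drift as a monitoring alert instead of a quiet decline in model performance.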
If these questions do not have clear answers, the issue is not the model. A data catalog does not build the model, but it makes the model deployable.
The full ebook includes a 15-minute Data Catalog Readiness Assessment and a Data Chaos Cost Calculator for leadership discussions. Download From Chaos to Clarity here, or reach out for a diagnostic conversation.
Next in the Clarity Series: What it looks like when the data foundation is right from the start, and what it enables.