By Anupriyo Chakravarti, CTO & CPO at Verisma
March 25, 2026

Healthcare AI has a trust problem.
Health system leaders know AI can reduce costs, improve compliance, and streamline operations. What stops most organizations is the cost of a misstep: training on sensitive patient data, governance gaps creating regulatory exposure, and vendors who can’t explain what’s inside the black box.
At Verisma, we decided early to build AI the hard way. The right way.
Why synthetic data changes everything
We made a non-negotiable commitment: we never use client data, including PHI, to train our AI models.
That commitment raised an obvious question: how do you train accurate models without touching real patient records? We found the answer in synthetic data.
Ranjit Kohli put it well in his article “16 Billion — Data Everywhere: Synthetic, Good or Bad?”: synthetic data is like synthetic oil, purpose-built. It mirrors real-world patterns while protecting sensitive information. He also made a point that stuck: real-world data isn’t always available. Synthetic data fills that gap. It got me thinking: why not apply the same approach in healthcare?
We started using Gretel Synthetics to generate medically realistic records – diagnosis codes, drug references, sensitive condition flags, and anomalies. And we never touch real patient data to do it.
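To make that concrete, here is a minimal, hypothetical sketch of how a synthetic record could be assembled from public code lists. The field names, the code lists, and the make_synthetic_record helper are illustrative assumptions for this post, not Verisma’s production pipeline or the Gretel Synthetics API.

```python
import random

# Illustrative code lists drawn from public terminologies (ICD-10-CM, RxNorm);
# nothing here comes from client records.
DIAGNOSIS_CODES = ["E11.9", "I10", "F32.9", "Z21"]   # Z21 is a sensitive-condition code (HIV status)
DRUG_REFERENCES = ["metformin 500 mg", "lisinopril 10 mg", "sertraline 50 mg"]
SENSITIVE_CODES = {"Z21", "F32.9"}                   # conditions a QA model should learn to flag

def make_synthetic_record(rng: random.Random) -> dict:
    """Assemble one privacy-safe synthetic record from public code lists."""
    dx = rng.choice(DIAGNOSIS_CODES)
    return {
        "diagnosis_code": dx,
        "drug_reference": rng.choice(DRUG_REFERENCES),
        "sensitive_flag": dx in SENSITIVE_CODES,
        # Occasionally inject a messy, abbreviation-heavy note so models see
        # the kinds of anomalies that show up in real documents.
        "note": ("pt dx " + dx) if rng.random() < 0.2 else f"Patient diagnosed with {dx}.",
    }

if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed keeps the dataset reproducible and auditable
    for record in (make_synthetic_record(rng) for _ in range(5)):
        print(record)
```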
How it works in practice
Our engineering and data science teams developed the QA Intelligence model training and testing methodology around three principles:
- Start with context, not records. We trained the model on sentence-level patterns from medical language – teaching it what sensitive information looks like in context, without using full records.
- Generate at scale. Gretel Synthetics produces privacy-safe synthetic documents matching real clinical formats, including the edge cases our models need to learn from.
- Test the edges. Positive and negative test cases – scenarios where the model should and shouldn’t flag something – are all synthetic, reproducible and auditable (a simplified sketch follows this list).
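As a simplified illustration of that third principle, a synthetic test suite might look something like the sketch below. The TEST_CASES fixtures, the flags_sensitive_info callable, and the toy stand-in classifier are assumptions made for this example; the actual QA Intelligence model and its cases are far more extensive.

```python
# Hypothetical test fixtures: every case is fully synthetic, versioned alongside
# the test suite, and records what the model is expected to do with it.
TEST_CASES = [
    # Positive cases: the model should flag these.
    {"text": "Patient reports a history of HIV and remains on antiretroviral therapy.", "expect_flag": True},
    {"text": "Dx F32.9; started sertraline 50 mg daily.", "expect_flag": True},
    # Negative cases: the model should not flag these.
    {"text": "Routine follow-up for seasonal allergies; no new complaints.", "expect_flag": False},
    {"text": "Blood pressure well controlled on current regimen.", "expect_flag": False},
]

def run_suite(flags_sensitive_info) -> bool:
    """Run every synthetic case through a model callable and report mismatches."""
    all_passed = True
    for case in TEST_CASES:
        got = flags_sensitive_info(case["text"])
        if got != case["expect_flag"]:
            all_passed = False
            print(f"FAIL: expected {case['expect_flag']}, got {got}: {case['text']}")
    return all_passed

if __name__ == "__main__":
    # Toy stand-in classifier for demonstration only; the real model is far richer.
    toy_model = lambda text: ("HIV" in text) or ("F32.9" in text)
    print("suite passed:", run_suite(toy_model))
```

Because every case is generated rather than sampled from real charts, the suite can be rerun, reviewed, and handed to an auditor without any privacy review.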
Here’s what makes synthetic data particularly powerful: it fills gaps real data can’t. Need examples of rare events that may never show up in a real dataset? Build them. In healthcare, that means edge cases our models must recognize: sensitive conditions, unusual document structures, ambiguous clinical language – all generated on demand.
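One way to picture “building” rare events on demand, purely as a hypothetical sketch: cross a small set of unusual document structures with sensitive findings to mass-produce edge cases. The STRUCTURES and FINDINGS templates and the build_edge_cases helper are invented for illustration, not taken from our generation pipeline.

```python
from itertools import product

# Hypothetical templates for scenarios that rarely surface in real data:
# sensitive language buried in unusual document structures or phrased ambiguously.
STRUCTURES = [
    "PROBLEM LIST (imported, unformatted): {finding}; see attached fax.",
    "Addendum 3 of 3 -- transcription fragment: ...{finding}...",
]
FINDINGS = [
    "hx of substance use disorder, in remission",
    "positive screen, confirmatory test pending",
]

def build_edge_cases() -> list[str]:
    """Cross every structure with every finding to mass-produce rare-but-plausible documents."""
    return [structure.format(finding=finding)
            for structure, finding in product(STRUCTURES, FINDINGS)]

if __name__ == "__main__":
    for doc in build_edge_cases():
        print(doc)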
The result: a training process we can defend to any client IT or security team.
4 pillars of Verisma’s synthetic data approach
- Privacy-first. Models trained exclusively on synthetic and public data. No PHI. No client data. Ever.
- Clinical realism. Synthetic records modeled on real clinical formats, with diagnosis codes, drug references, real-world anomalies, and sensitive condition patterns.
- Rigorous validation. Edge cases generated on demand, including scenarios that don’t exist in the real world, for thorough model testing.
- Auditable by design. Every training and testing artifact can be traced, documented and reviewed. That’s a standard real-world data cannot meet.
This is what responsible AI looks like
At CHIME25, I led a focus group on how digital leaders are approaching AI governance. The pattern was clear: most organizations see AI’s potential, but few have built the structure to capture it safely.
Synthetic data is a direct answer to that gap. It lets you move fast without cutting corners on privacy, test thoroughly without regulatory exposure, and give clients something most AI vendors can’t: a clear, auditable record of how the model learned.
The broader industry is heading in this direction. NVIDIA’s synthetic data generation framework for agentic AI tackles the same challenges we faced: scarce data, sensitivity constraints, and the high cost of manual labeling. Synthetic data solves all three by generating diverse, domain-specific datasets at scale. In healthcare, where real data is valuable and tightly regulated, that’s not just a technical advantage; it’s a compliance requirement.
Most AI projects in healthcare stall on data access – waiting for approvals, de-identification work, and legal agreements. Synthetic data removes that constraint. Our teams can generate thousands of realistic test scenarios, including edge cases that may never appear in the real world. That speeds up development, improves model quality, and keeps compliance built in from day one.
Our QA Intelligence models are trained, tested and validated entirely on synthetic data – and they perform with the reliability healthcare demands. You don’t have to choose between moving fast and staying compliant.
Let’s move the industry forward together
Accelerating AI in healthcare without sacrificing reliability, compliance, or patient trust is an industry-wide challenge and requires industry-wide collaboration.
We’re happy to share what we’ve learned: the methodology, the tools, the lessons from testing at scale, and the governance framework that makes it all defensible. Whether through conference sessions, peer roundtables, or direct conversations with technology leaders, Verisma is committed to helping the industry move forward.
If you’re working through these challenges at your organization, let’s talk.
Anupriyo Chakravarti is CTO & CPO at Verisma, where he leads technology strategy and product development for healthcare’s leading health information lifecycle platform. He speaks regularly on AI governance, healthcare data transformation, and technology leadership at major healthcare technology conferences and industry associations.