DeltaFlow just changed the CDC conversation for Fabric Spark teams

A row changes in your operational database. That change should be usable by analytics within seconds. Too often, getting it there turns into a side quest.

Raw CDC feeds are ugly. Debezium envelopes. Nested payloads. Schema drift. Then Spark teams spend their time turning change events back into tables. It is expensive work, and most of it is drudgery.
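To see the drudgery concretely, here is a minimal sketch of the envelope-flattening code DeltaFlow aims to make unnecessary. The envelope shape follows the standard Debezium format (`op`, `ts_ms`, `before`/`after` under `payload`); the sample row and the `_change_type`/`_change_ts` output column names are hypothetical.

```python
import json

# A typical Debezium change event: the actual row data is buried
# inside payload.after, alongside operation metadata.
raw_event = json.dumps({
    "payload": {
        "op": "u",                      # c=create, u=update, d=delete
        "ts_ms": 1760000000000,
        "before": {"id": 42, "status": "pending"},
        "after":  {"id": 42, "status": "shipped"},
    }
})

def flatten(event_json: str) -> dict:
    """Turn a Debezium envelope back into a table-shaped row."""
    payload = json.loads(event_json)["payload"]
    # Deletes carry the row image in "before"; creates and updates in "after".
    row = payload["before"] if payload["op"] == "d" else payload["after"]
    return {**row, "_change_type": payload["op"], "_change_ts": payload["ts_ms"]}

flattened = flatten(raw_event)
```

Multiply this by schema drift, nested payloads, and per-table variations, and the maintenance cost becomes obvious. DeltaFlow's pitch is that Eventstreams produces the table-shaped output before Spark ever sees the data.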

DeltaFlow is Fabric’s shot at removing that drudgery.

Microsoft’s docs and March 2026 blog posts describe DeltaFlow as a preview capability in Fabric Eventstreams. It takes raw Debezium CDC events and reshapes them into analytics-ready streams that mirror the source table structure. The stream keeps the source columns and adds metadata like change type and timestamps. Eventstreams handles schema registration, destination table management, and schema evolution. You turn it on by choosing “Analytics-ready events & auto-updated schema” during connector setup.

That is the part Spark teams should care about. Less time parsing CDC envelopes. More time writing logic that matters.

What is actually supported

Do not assume every CDC connector gets this.

The Eventstreams overview and connector docs tie DeltaFlow preview to four sources:
– Azure SQL Database CDC
– Azure SQL Managed Instance CDC
– SQL Server on VM DB CDC
– PostgreSQL Database CDC

The same overview lists MySQL Database CDC, MongoDB CDC, and Azure Cosmos DB CDC as Eventstreams connectors too. They are connectors. They are not called out with DeltaFlow preview support. If your estate runs on those systems, the old cleanup work does not disappear.

Why this changes the Spark path

Eventstreams now also has a Spark Notebook destination in preview. The destination can route Eventstream data directly into a Spark notebook and start a Spark Structured Streaming job.

That shortens the path.

Instead of dragging raw CDC into Spark and cleaning it up there, you can test a pipeline where Eventstreams does the CDC shaping first and Spark starts with data that already looks like tables. The payback is simple. Spark can spend its budget on joins, enrichment, aggregations, and writes instead of JSON surgery.

There is a second benefit. Microsoft says the DeltaFlow output is meant for straightforward analytics queries, including KQL. That matters because the same stream can feed a Spark notebook and other real-time consumers without forcing every downstream system to learn Debezium semantics.

The catch

This is preview. Act like it.

Preview is where features meet unpleasant reality: weird schemas, bad timing, broken assumptions, and the database that nobody documented properly. DeltaFlow may still be the right direction. It is not a blind cutover candidate.

Run it beside your current CDC path. Compare outputs. Change a source table. Watch what happens. Kill and restart the notebook path. See where the edges are before you let production depend on it.
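The side-by-side comparison can be as simple as diffing keyed snapshots from both paths. A minimal sketch, assuming both pipelines land rows you can key by primary key (the column names here are hypothetical):

```python
def diff_by_key(old_path_rows, new_path_rows, key="id"):
    """Compare a snapshot from the existing CDC path against the DeltaFlow path."""
    old = {r[key]: r for r in old_path_rows}
    new = {r[key]: r for r in new_path_rows}
    return {
        "missing_in_new": sorted(old.keys() - new.keys()),
        "extra_in_new":   sorted(new.keys() - old.keys()),
        # Keys present in both paths but with differing row contents.
        "mismatched":     sorted(k for k in old.keys() & new.keys()
                                 if old[k] != new[k]),
    }

report = diff_by_key(
    [{"id": 1, "status": "shipped"}, {"id": 2, "status": "pending"}],
    [{"id": 1, "status": "shipped"}, {"id": 3, "status": "new"}],
)
```

Run a check like this after every experiment: schema change, notebook restart, burst of updates. Non-empty buckets are the edges you want to find before production does.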

Also, source coverage is still narrow. Mixed database estates are going to run split architectures for a while. DeltaFlow on the supported sources. Existing CDC plumbing everywhere else.

PostgreSQL teams have homework

The PostgreSQL connector doc is specific.

To enable CDC for PostgreSQL in this flow, Microsoft says you need:
– wal_level set to logical
– max_worker_processes set to at least 16
– a server restart after those changes
– replication permissions for the connecting admin user or table owner user
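Translated into commands, the checklist looks roughly like this. This is a sketch against a self-managed PostgreSQL server; on Azure Database for PostgreSQL these are server parameters set through the portal or CLI instead, and the role name is hypothetical.

```sql
-- Enable logical decoding; both settings require a server restart.
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_worker_processes = 16;

-- After restarting, grant replication to the connecting user.
ALTER ROLE cdc_user WITH REPLICATION;

-- Verify the settings took effect.
SHOW wal_level;             -- expect: logical
SHOW max_worker_processes;  -- expect: 16 or higher
```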

There is also a networking constraint. The database must be publicly accessible unless you use Eventstream connector virtual network injection. Miss that detail and your migration plan turns into a late-night fight with networking.

What to do now

Keep the rollout small and brutal.

Start with one supported source. Enable DeltaFlow. Pick “Analytics-ready events & auto-updated schema.” Route it to a Spark notebook destination. Then measure three things:
– How much parsing code vanished
– How much schema handling vanished
– How stable the preview behavior is under source changes

One more signal is worth noticing. In the same March 2026 feature summary, Microsoft listed the Eventstream SQL Operator as generally available. DeltaFlow itself is still preview, but the Eventstreams surface around it is getting more serious.

That is the moment to test. Not later, when everyone suddenly wants it in production at once.

Bottom line

DeltaFlow matters because it attacks the worst part of CDC work. Not the business logic. The plumbing.

For supported sources, that is real leverage for Fabric Spark teams. For unsupported sources, nothing changes yet.

So do the sensible thing. Test it early. Keep your current pipeline until the preview earns trust. Then decide whether DeltaFlow gets promoted from experiment to foundation.

This post was written with help from anthropic/claude-opus-4-6
