Microsoft Fabric – Christopher Finlan

The Security Feature That’s Actually About Speed

Here’s the thing nobody’s saying about Fabric Eventstream’s new Custom CA and mTLS support: it isn’t really a security feature. Or rather, it is, but the teams who’ll benefit most aren’t security teams. They’re Spark engineers who’ve been running shadow pipelines for months because the “secure” path was also the “impossible” path.

Let me explain.

The Workaround Tax

If you’ve been running Spark workloads against Kafka clusters in any regulated environment (banking, healthcare, telecom), you already know the drill. Your Kafka brokers sit behind certificates signed by an internal Certificate Authority. Your infosec team mandates mutual TLS. And until recently, Fabric Eventstream’s Kafka connectors only trusted the system-predefined CA list. Full stop.

So what did teams actually do? They built workarounds. They stood up intermediate proxy layers that terminated mTLS and re-encrypted with publicly trusted certs. They ran sidecar containers that handled the certificate dance outside of Eventstream. Some teams gave up on Eventstream entirely and wrote custom Spark Structured Streaming jobs that managed their own TrustStores and KeyStores. Jobs that worked, but that nobody wanted to maintain.

Every one of those workarounds carried a cost. Not just in engineering hours, but in latency. Every extra hop between your Kafka broker and your Spark processing layer adds milliseconds. In a world where Spark Structured Streaming microbatches are measured in seconds, those milliseconds compound. A proxy layer that adds 15ms per message at 50,000 messages per second accumulates 12.5 minutes of added delay for every second of traffic, or roughly 250 minutes per million messages, if each message pays the handshake relay in turn. Pipelining and parallelism hide much of that in practice, but it’s not a rounding error. That’s the difference between a batch window that closes on time and one that doesn’t.
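Figures like these are easy to sanity-check. Here is a minimal sketch in Python, using the 15ms and 50,000 messages per second from the example and assuming, as a simplification, that each message pays the added delay once with no overlap between messages. The function name is illustrative only.

```python
# Cumulative proxy delay, counted once per message (no pipelining assumed).
ADDED_LATENCY_S = 0.015   # 15 ms added by the proxy hop (figure from the text)
RATE_PER_S = 50_000       # messages per second (figure from the text)


def cumulative_delay_minutes(messages: int, added_latency_s: float = ADDED_LATENCY_S) -> float:
    """Total added delay in minutes if every message pays the hop once."""
    return messages * added_latency_s / 60


per_second_of_traffic = cumulative_delay_minutes(RATE_PER_S)
per_million = cumulative_delay_minutes(1_000_000)

print(f"{per_second_of_traffic:.1f} min of added delay per {RATE_PER_S:,} messages")
print(f"{per_million:.1f} min of added delay per 1,000,000 messages")
```

Real pipelines overlap requests, so treat the result as an upper bound on the tax, not a measurement.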

What Actually Changed

The preview announcement covers three Kafka-based Eventstream sources: Apache Kafka, Amazon Managed Streaming for Apache Kafka (MSK), and Confluent Cloud for Apache Kafka, plus Confluent Schema Registry. All of them now support two capabilities that were previously missing:

Custom CA certificates. You can import your internal CA certificate into Azure Key Vault in PEM format and reference it when configuring a Kafka source in Eventstream. The connector runtime fetches the certificate and trusts it for the TLS handshake. No more proxy layers to bridge the trust gap.

Mutual TLS (mTLS). Beyond custom CAs, you can import a client certificate and private key into the same Key Vault. Eventstream presents this client certificate during the TLS handshake, and the Kafka broker validates it. Two-way authentication without a single line of custom code.

The decision to anchor everything on Azure Key Vault solves a problem that’s plagued Spark teams for years: certificate distribution. In a traditional Spark cluster, you’d bake certificates into Docker images, mount them as secrets in Kubernetes, or distribute them through DBFS. Every rotation cycle meant redeploying or restarting jobs. With the Key Vault approach, you update the certificate once. Every Eventstream connector that references it picks up the new version automatically. No redeployment. No restart. No 3 AM pages because someone forgot to rotate the cert on the staging cluster.

The Spark Engineer’s Migration Checklist

If you’re currently running workaround pipelines, here’s the concrete path to cutting them over. Not theoretical. The sequence that will save you the most time with the fewest surprises.

Step 1: Audit your certificate chain. Before you touch Eventstream, document what you have. What CA signed your Kafka broker’s certificate? Is it a single root CA, or is there an intermediate chain? For mTLS, where does your client certificate live today, and who manages the private key? You need this inventory before anything else, because the Key Vault import requires PEM format, and many internal PKI systems export in PKCS#12 or DER.
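For the audit itself, a small script can tell you whether an exported file is already PEM and how many certificates a bundle holds. This is a sketch using only Python’s standard library. The helper names are made up for illustration, it covers PEM and DER only (PKCS#12 needs a tool such as OpenSSL), and it rewraps bytes without validating that they are a real certificate.

```python
import re
import ssl

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def to_pem(data: bytes) -> str:
    """Return PEM text for a certificate supplied as PEM or DER bytes."""
    if data.lstrip().startswith(PEM_HEADER):
        return data.decode("ascii")
    # Anything else is treated as DER; ssl only adds the base64 PEM armor.
    return ssl.DER_cert_to_PEM_cert(data)


def split_chain(pem_text: str) -> list[str]:
    """Split a PEM bundle into its individual certificate blocks."""
    return PEM_BLOCK.findall(pem_text)
```

If `split_chain` returns more than one block, you have an intermediate chain to account for, not a single root.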

Step 2: Set up Azure Key Vault with proper RBAC. Create or identify a Key Vault. Import your CA certificate as a certificate object, not a secret. This distinction matters because Eventstream’s certificate fetching logic expects it. If you’re using mTLS, import the client certificate and private key together as a single PEM bundle. Assign the “Key Vault Certificate User” role to the identity that Eventstream uses. For the initial import, use “Key Vault Administrator” and rotate down to least privilege afterward.

Step 3: Handle the private network case. If your Kafka brokers sit inside a VNet or on-premises network, you need Eventstream’s VNet injection. Create an Azure virtual network that can reach your Kafka brokers and also has a private endpoint to your Key Vault. The ordering matters: configure the VNet and private endpoints first, then configure the Eventstream source. If you reverse the order, the connector will fail silently during certificate fetch because it can’t reach the Key Vault through the public endpoint.

Step 4: Configure and test one source. Pick your lowest-risk Kafka topic, something with a steady, predictable message rate, and configure it as an Eventstream source with the Custom CA settings. For mTLS, enable both the trusted CA certificate and the client certificate references. Run it for 24 hours. Watch for two things: authentication errors in the Eventstream monitoring (which mean your certificate chain is incomplete or your Key Vault permissions are wrong) and message latency compared to your workaround pipeline (which should be lower, since you’ve eliminated the proxy hop).

Step 5: Migrate incrementally. Once your test source proves stable, migrate topics one at a time. Keep your workaround pipelines running in parallel until each Eventstream source has been stable for at least 48 hours under production load. When you decommission a workaround, don’t just turn it off. Remove the infrastructure. Proxy layers and sidecar containers have a way of becoming permanent fixtures if you leave them around.
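The decommission decision in Steps 4 and 5 can be written down as an explicit gate so nobody argues about it at cutover time. A minimal sketch, assuming you collect the numbers yourself: the `SourceObservation` record and its fields are hypothetical, not something Eventstream emits, and the 48-hour threshold comes from the step above.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class SourceObservation:
    """Hypothetical summary of one Eventstream source during parallel running."""
    stable_hours: float                  # hours under production load without incident
    auth_errors: int                     # authentication errors seen in monitoring
    latencies_ms: list[float]            # latency samples from the Eventstream path
    baseline_latencies_ms: list[float]   # same metric from the workaround pipeline


def p95(samples: list[float]) -> float:
    """95th percentile (needs at least two samples)."""
    return quantiles(samples, n=100)[94]


def ready_to_decommission(obs: SourceObservation, min_stable_hours: float = 48.0) -> bool:
    """True only if the source is stable, clean, and no slower than the workaround."""
    return (
        obs.stable_hours >= min_stable_hours
        and obs.auth_errors == 0
        and p95(obs.latencies_ms) <= p95(obs.baseline_latencies_ms)
    )
```

The point of the gate is the order of the checks: stability and authentication come first, and latency only matters once those pass.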

What This Means for Your Spark Processing Layer

Here’s where it gets interesting for Spark engineers specifically. When Eventstream handles the mTLS connection directly, your Spark Structured Streaming jobs downstream no longer need to manage TLS configuration. The data arrives in Eventstream already authenticated and decrypted. Your Spark jobs read from Eventstream’s output—a KQL Database, a Lakehouse, or a derived event stream—without caring about the certificate logistics upstream.

This changes your Spark job’s failure domain. Before, a certificate expiration on your Kafka broker could cascade into a Spark Structured Streaming job failure that looked like a network timeout. Your on-call engineer would spend 45 minutes digging through logs before realizing it was a cert issue, not a cluster issue. With Eventstream handling the connection, certificate-related failures surface in Eventstream’s monitoring, not in your Spark job logs. The blast radius shrinks. Mean time to diagnosis drops.

There’s also a capacity planning angle. If you were running proxy layers or custom Spark Structured Streaming ingestion jobs to handle mTLS, you were burning Spark capacity on what’s essentially an I/O concern. Those compute units get freed up. On a Fabric F64 capacity, redirecting even a small percentage of compute from certificate-wrangling proxy jobs to actual analytics can measurably shorten your batch completion times.

The Risks You Should Actually Worry About

This is a preview feature, and previews carry specific risks that experienced Spark engineers should plan around.

Certificate fetch latency at connector startup. The Eventstream connector fetches certificates from Key Vault at runtime. If your Key Vault has high latency (common with geo-replicated vaults under load), connector startup will be slower. This probably won’t affect steady-state streaming, but it could affect recovery time after a connector restart. Test your connector’s cold-start time under realistic Key Vault conditions.

Key Vault throttling under rotation. If you rotate certificates frequently (some compliance regimes require 90-day rotation), the moment you update a certificate in Key Vault, every connector referencing it will re-fetch. If you have dozens of Eventstream sources pointing at the same Key Vault, you could hit throttling limits during rotation. Stagger your connector restarts or use separate Key Vault instances for high-fanout scenarios.
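Staggering is simple to plan ahead of a rotation. The sketch below spreads restarts evenly across a window and adds jitter inside each slot. It only computes offsets. How you actually restart a connector is not shown, and the function and connector names are hypothetical.

```python
import random
from typing import Optional


def staggered_offsets(
    connectors: list[str], window_s: float, seed: Optional[int] = None
) -> dict[str, float]:
    """Give each connector a restart offset (seconds) in its own slot of the window."""
    if not connectors:
        return {}
    rng = random.Random(seed)
    slot = window_s / len(connectors)
    # Connector i restarts somewhere inside slot i, so fetches never bunch up.
    return {
        name: i * slot + rng.uniform(0, slot)
        for i, name in enumerate(connectors)
    }


# Example: 12 hypothetical sources restarted across a 10-minute window.
plan = staggered_offsets([f"source-{i}" for i in range(12)], window_s=600, seed=7)
```

One slot per connector keeps the worst case at a couple of Key Vault fetches close together, rather than dozens at once.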

Preview-to-GA contract changes. Microsoft may change the configuration interface, the certificate format requirements, or the Key Vault integration pattern between preview and GA. Don’t build automation that scrapes the Eventstream UI. If you need to automate connector provisioning, use the Fabric REST APIs and wrap them in a layer you can update when the API contract changes.

The Bigger Picture

What makes this feature worth paying attention to isn’t the TLS handshake mechanics. It’s the architectural shift it represents. Fabric is steadily pulling infrastructure concerns out of the Spark processing layer and into the platform layer. First it was storage with OneLake. Then compute scheduling with Fabric capacities. Now it’s connection security. Each time, the pattern repeats: something that Spark engineers used to manage through custom code and tribal knowledge becomes a platform capability with a configuration interface.

The teams that will get the most value from this aren’t the ones with the most sophisticated workarounds. They’re the ones who recognize that the workaround was always the wrong layer of abstraction and move quickly to eliminate it. The proxy layer was never the product. The data pipeline was.

Start with one topic. Prove it works. Then move fast.

This post was written with help from anthropic/claude-opus-4-6

Your Fabric Spark Jobs Have Been Failing in Silence. That Just Changed.

You schedule a notebook. You schedule a pipeline. You walk away.

That’s the deal. Set it and forget it. Except “forget it” has a dark side nobody warns you about.

When a scheduled Spark job dies at 2 AM, it dies quiet. No call. No text. No alarm. The data just stops moving. Downstream reports go stale. Dashboards freeze mid-number. And you find out Monday morning when your VP asks why the revenue figure hasn’t budged since Friday.

That silence just got a fix. Scheduled job failure notifications hit General Availability in Microsoft Fabric. Here’s what that means for Spark teams running production workloads, and what you need to do about it before the weekend.

What Actually Shipped

Fabric’s Job Scheduler now sends email notifications when a scheduled run fails. Every item type that supports scheduling is covered: Notebooks, Pipelines, Dataflows Gen2, Spark Job Definitions, and more.

Setup takes about thirty seconds. Open any schedulable item, hit Schedule, and add users or Microsoft Entra groups under Failure Notifications. One configuration covers all schedules attached to that item. No per-schedule fiddling.

Both portal-created and REST API-created schedules support notifications. Your CI/CD-deployed schedules get coverage too, as long as your deployment templates include the notification recipients.

One detail worth burning into memory: notifications fire only for scheduled runs. Manual triggers don’t generate emails. The logic is simple. Manual runs have a human watching. Scheduled runs don’t.

Why Spark Teams Should Care More Than Most

Spark workloads are uniquely punishing when they fail silently.

A failed notebook refresh doesn’t just mean one stale table. In a typical Fabric lakehouse, that notebook sits in a chain. Bronze ingestion feeds silver transformation feeds gold aggregation feeds a semantic model feeds a Power BI dashboard your CFO checks before their 9 AM meeting. One broken link at 3 AM and the entire chain is dead by sunrise.

Pipeline orchestration makes it worse. A single pipeline might call four Spark notebooks in sequence. If the second one blows up because an upstream schema changed, the whole pipeline fails. Without notifications, your only option is checking the Monitoring Hub by hand. Nobody does that proactively at scale. Nobody.

And Spark jobs fail for reasons that hide. Cluster timeouts. Memory pressure on large shuffles. Transient storage hiccups in OneLake. These don’t throw loud errors in the UI. They add quiet rows to run history. Failure notifications turn those quiet rows into inbox items you can’t ignore.

The Migration Risk Nobody’s Talking About

If you moved scheduled jobs from Azure Data Factory to Fabric, stop and read this section twice.

ADF had built-in alerting through Azure Monitor. Many teams leaned on it without ever thinking about it. It was just there. Fabric’s scheduler had no equivalent until this GA release.

That means some teams have been running production Spark workloads in Fabric for months with zero automated failure alerting. If that’s you, this announcement finally closes a gap that’s been open since the day you migrated.

Go check. Every workspace that hosts scheduled Spark notebooks and pipelines. If they came from ADF and nobody reconfigured alerting in Fabric, you’ve been flying blind. Possibly for months.

Your Rollout Checklist

Here’s what to do this week. Not next sprint. Not next quarter. This week.

1. Audit your scheduled items. Open the Monitoring Hub. Find the Schedule Failures page (still in Preview). It gives you one view of failure notifications across every scheduled item. If the list is empty, that’s bad news. It means nothing is configured yet.

2. Prioritize by blast radius. Start with the items that feed the most downstream dependencies. Gold-layer notebooks. Semantic model refreshes. Pipeline orchestration jobs. These get notifications first. A bronze ingestion notebook that nothing reads from yet can wait.

3. Use groups, not individuals. Add a Microsoft Entra security group or mail-enabled group to the notifications field. People change roles. On-call rotations shift. Group membership stays current without anyone touching every schedule by hand.

4. Cover your API-deployed schedules. If your CI/CD pipeline creates or modifies schedules through the Job Scheduler REST API, update your deployment templates. The API supports notification configuration. But templates created before this GA release almost certainly don’t include it.

5. Check permissions first. Configuring failure notifications requires at least the Contributor role in the workspace, or Write permission on the item. Viewers can see existing schedules but can’t touch notification settings. If your data engineers lack Contributor access, they can’t set this up themselves. Someone with the right role needs to do it or fix the permissions.

6. Plan for what this doesn’t cover. Failure notifications work for scheduled runs only. Event-driven Spark jobs, REST API triggers, and manual runs still need separate alerting. For pipelines, add Outlook or Teams activities on failure paths. For broader event-driven coverage, Data Activator can react to pipeline job events and trigger notifications for creation, deletion, updates, success, and failure statuses.
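For item 2 on the checklist, "blast radius" can be made concrete by counting everything downstream of each scheduled item. A minimal sketch, assuming you maintain the dependency map yourself: the item names and the `DOWNSTREAM` structure are invented for illustration and are not read from Fabric.

```python
from collections import deque

# Hypothetical dependency map: item -> items that read from it directly.
DOWNSTREAM = {
    "bronze_ingest": ["silver_transform"],
    "silver_transform": ["gold_aggregate"],
    "gold_aggregate": ["semantic_model"],
    "semantic_model": ["cfo_dashboard"],
    "cfo_dashboard": [],
    "scratch_bronze": [],
}


def blast_radius(item: str, graph: dict[str, list[str]]) -> int:
    """Count every item reachable downstream of `item` (breadth-first)."""
    seen: set[str] = set()
    queue = deque(graph.get(item, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return len(seen)


# Items with the most downstream dependents get notifications first.
ranked = sorted(DOWNSTREAM, key=lambda name: blast_radius(name, DOWNSTREAM), reverse=True)
```

Anything with a radius of zero, like the scratch notebook here, is the "can wait" pile.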

What This Doesn’t Do

Let’s draw the lines clearly.

This feature sends email. That’s it. No Teams messages. No webhooks. No PagerDuty. No Slack. If your incident response lives outside email, you need a bridge: a Power Automate flow triggered by the notification email, or a Data Activator rule. Either works. Both mean another piece to build and maintain.

There’s no suppression or deduplication either. A scheduled job that fails every 15 minutes generates an email every 15 minutes: four an hour, 96 a day, per recipient. For higher-frequency Spark jobs, the inbox fills faster still. Fix the root cause fast or disable the schedule while you investigate.
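If you do build a bridge to another channel, the bridge is the natural place to add the suppression the feature lacks. Here is a minimal sketch of a per-item quiet window. The class is hypothetical and not part of Fabric, Power Automate, or Data Activator.

```python
from datetime import datetime, timedelta


class AlertSuppressor:
    """Forward the first failure for an item, then stay quiet for `window`."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self._last_sent: dict[str, datetime] = {}

    def should_forward(self, item_name: str, failed_at: datetime) -> bool:
        last = self._last_sent.get(item_name)
        if last is not None and failed_at - last < self.window:
            return False  # still inside the quiet window for this item
        self._last_sent[item_name] = failed_at
        return True


# A job failing every 15 minutes for two hours, with a one-hour quiet window.
suppressor = AlertSuppressor(window=timedelta(hours=1))
start = datetime(2026, 1, 1, 2, 0)
forwarded = [
    suppressor.should_forward("gold_notebook", start + timedelta(minutes=15 * i))
    for i in range(9)
]
```

Nine failures turn into three forwarded alerts, one at the start of each hour.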

The notification emails include the item name, error details, run time in UTC, and a direct link to the Monitoring Hub. Useful for triage. But there’s no programmatic API to query notification history or build dashboards over failure data. For that level of observability, query run history through the REST API or use the Monitoring Hub directly.

The Bigger Picture

This GA release closes a real operational gap. For Spark teams especially, with their complex job chains, hidden failure modes, and the lakehouse architecture’s dependency graphs, silent failures aren’t just annoying. They’re dangerous.

But let’s be honest: notifications are table stakes. The minimum. If you’re running Spark workloads in Fabric at any real scale, you should also be thinking about Data Activator for event-driven alerting, the Monitoring Hub APIs for custom observability dashboards, and retry policies baked into your pipeline designs.

Failure notifications tell you something broke. Everything else in your operational stack tells you why, how often, and what to fix.

Start with the checklist. Get the emails flowing. Then build from there.

This post was written with help from anthropic/claude-opus-4-6

What “Upgrade your Synapse pipelines to Microsoft Fabric with confidence (Preview)” actually means for Fabric Spark teams in production

Preview posts are written to soothe. Production teams read them like incident reviewers. They want to know what moves, what stays off, and what still needs proof before anyone re-enables a trigger.

This new migration experience is useful because it has brakes.

It lets you assess Synapse pipelines, see compatibility gaps, migrate supported pipelines into a Fabric workspace, map Synapse linked services to Fabric connections, and keep execution under control while you validate the result. That is not a one-click estate conversion. Good. One-click migration promises are how people end up explaining themselves on a call at 6 a.m.

This is triage before it is migration

The flow is split into three stages: assessment, review, and migration.

Assessment classifies each pipeline as Ready, Needs review, Coming soon, or Unsupported / Not compatible. You can export the assessment to CSV, which is more useful than it sounds. Most Synapse estates are not clean enough to reason about from memory. The CSV gives you a working list you can sort, assign, and use in a real plan.

The categories also give you an obvious first pass:

Ready: pilot batch.
Needs review: engineering work.
Coming soon: stop thrashing and wait for support to land.
Unsupported / Not compatible: redesign it.

The docs also recommend a phased approach. Start with Ready. Fix Needs review. Rerun the assessment. Sensible advice, which means some teams will try very hard to ignore it.
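The exported CSV turns into a work plan with a few lines of Python. This sketch groups pipelines by the next action for each category. The column names `PipelineName` and `Status` are assumptions, so check the header row of your own export. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Next action per assessment category, following the first-pass list above.
NEXT_ACTION = {
    "Ready": "pilot batch",
    "Needs review": "engineering work",
    "Coming soon": "wait for support",
    "Unsupported / Not compatible": "redesign",
}


def triage(
    csv_text: str, name_col: str = "PipelineName", status_col: str = "Status"
) -> dict[str, list[str]]:
    """Group pipeline names by next action. Column names are assumptions."""
    plan: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = NEXT_ACTION.get(row[status_col].strip(), "unknown category")
        plan[action].append(row[name_col].strip())
    return dict(plan)


# Invented sample rows standing in for a real export.
sample = """PipelineName,Status
load_sales,Ready
load_hr,Needs review
legacy_ssis,Unsupported / Not compatible
"""
plan = triage(sample)
```

Rerun it after each assessment pass and the "engineering work" pile should shrink.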

The Spark-specific catch is the part people will miss

If a Synapse pipeline calls Notebook activities or Spark job definition activities, Microsoft says to migrate those Spark artifacts to Fabric first.

That is the whole game for Spark teams.

If the matching Fabric notebooks or Spark job definitions already exist, the migration flow can map those activities to the Fabric items. If they do not exist yet, those activities may stay unmapped or deactivated until you create the Fabric items and update the references.

So a migrated pipeline is not automatically a runnable Spark workload. It may be a correctly copied orchestration layer that still points to nowhere useful. If your team blurs that line, you are not “almost done.” You are halfway to a very dumb cutover.

Connection mapping is where “migrated” stops meaning “ready”

The migration flow then asks you to pick a Fabric workspace and map Synapse linked services to Fabric connections.

Here the product does something smart. It does not force fake completeness. Pipelines can migrate even if every connection is not mapped. The catch is explicit: activities that use unmapped connections remain deactivated.

That is the right tradeoff. A deactivated activity is annoying. A silently broken run is worse.

This is where the human work starts:

make sure the right Fabric connections exist
validate credentials and access
check which activities are still deactivated
confirm notebook and Spark job references point to the intended Fabric items

The tool can move metadata. It cannot tell you whether your team has actually finished the migration.
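One way to keep that checklist honest is to list deactivated activities from the pipeline definition rather than clicking through each one. The sketch below assumes the definition is JSON and marks a deactivated activity with "state": "Inactive", which is how Azure Data Factory represents it. Verify that against your own Fabric export before relying on it. The sample pipeline is invented.

```python
from typing import Any


def inactive_activities(node: Any) -> list[str]:
    """Recursively collect names of activities whose state is 'Inactive'."""
    found: list[str] = []
    if isinstance(node, dict):
        if node.get("state") == "Inactive" and "name" in node:
            found.append(node["name"])
        for value in node.values():
            found.extend(inactive_activities(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(inactive_activities(value))
    return found


# Invented example: one inactive notebook call and one inactive nested copy.
pipeline = {
    "properties": {
        "activities": [
            {"name": "CopyOrders", "type": "Copy"},
            {"name": "RunNotebook", "type": "Notebook", "state": "Inactive"},
            {
                "name": "Loop",
                "type": "ForEach",
                "typeProperties": {
                    "activities": [{"name": "Inner", "type": "Copy", "state": "Inactive"}]
                },
            },
        ]
    }
}
unfinished = inactive_activities(pipeline)
```

An empty list is the signal that connection and reference mapping is actually done for that pipeline.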

“Triggers disabled by default” is the best sentence in the whole thing

After migration, triggers are disabled by default.

Perfect.

That removes one of the most common migration failure modes: an artifact gets copied, a dependency gets missed, the schedule fires anyway, and now production is teaching everyone a lesson. Keeping triggers off buys you a clean validation window.

The post-migration guidance is refreshingly sane:

Validate connections and credentials.
Re-enable and configure triggers as needed.
Run end-to-end tests.
Validate in a nonproduction environment before switching production workloads.

That is the order. Not the other way around.

There is one smaller operational detail worth noting. Migrated pipelines appear in the Fabric workspace with the source factory name prefixed. That helps when you are reviewing a mixed estate and trying to keep lineage straight.

What this preview changes

It does not finish the migration for you. It does make the early part less chaotic.

You get a readiness assessment instead of guesswork. You get a phased path instead of a big-bang leap. You get visible connection mapping. You get deactivated activities when dependencies are missing. You get triggers held back until you choose to turn them on.

That is real value. It turns migration from “hope plus calendar pressure” into something you can audit.

A rollout pattern worth trusting

If I were running this for a production Fabric Spark estate, I would keep it brutally simple.

1. Migrate notebooks and Spark job definitions to Fabric first.
2. Run the pipeline assessment and export the CSV.
3. Start with Ready pipelines that already have their Fabric Spark counterparts in place.
4. Map linked services to Fabric connections and treat every deactivated activity as unfinished work.
5. Run end-to-end tests in nonproduction. Compare outputs, parameters, logging, and failure handling.
6. Re-enable triggers only after the pipeline and its Spark dependencies survive contact with reality.
7. Then work through the Needs review backlog and rerun assessment as you clear items.

It is not glamorous. It is how you keep a migration from turning into a weekly apology.

The practical takeaway

This preview matters because it is honest about the order of operations.

For Spark-heavy Synapse estates, the job is not “move everything to Fabric.” The job is “move Spark artifacts first, move orchestration second, validate connections and behavior, then turn execution back on.” The new experience supports that sequence instead of pretending the sequence does not matter.

So no, this is not a teleportation device for legacy pipelines. It is a staging area with guardrails. For teams running Spark in production, that is much more useful.

This post was written with help from anthropic/claude-opus-4-6

Fabric notebook resources in Git give Spark teams a real release boundary

Most Spark notebook trouble is not caused by the notebook. It is caused by the little files loitering around it.

A notebook runs fine in dev. In test, it quietly depends on a config file somebody copied by hand. In prod, a helper script lives in a workspace folder nobody remembers creating. Then the team spends an afternoon acting baffled. This happens so often it barely even feels embarrassing anymore.

That is why Resources folder support in Git for Fabric notebooks matters. The official notebook source control and deployment docs now describe notebook resources living with the notebook in source control. Pair that with Fabric Git integration, which syncs workspace items with a Git repository for version control and collaboration, and the point lands pretty hard: Fabric is getting better at treating notebooks like real engineering artifacts instead of magical documents that somehow run businesses by optimism alone.

What changed

In plain English, supporting notebook files can now sit with the notebook in Git-backed workflows instead of floating around just outside the process.

That sounds small. It is not.

Once those files live in the repo, they show up in normal history and pull-request diffs. That is one of the verified, useful bits here. A reviewer can see the notebook change and the supporting-file change in the same place, at the same time, during the same discussion. For teams that have ever been burned by a side file nobody reviewed, that is not cosmetic. It is the whole game.

Why Spark teams should care

Spark notebook work grows side files the way garages grow mystery boxes. At first it is one harmless config. Then a helper module. Then a mapping file. Then a test fixture that was supposed to be temporary around the same time fax machines were supposed to be temporary.

The notebook stays visible. The dependencies do not.

That gap is where trouble starts. If the notebook looks versioned but the files it depends on are living somewhere else, you have a comforting illusion, not a reliable workflow.

Resources in Git makes the workflow more honest. The notebook is still the star of the show, but the supporting cast is finally on stage where people can see it.

What gets better in practice

I would not pretend this feature solves deployment. It does not. Git has never once saved a team from bad habits.

What it does do is remove one very stupid source of drift.

A few practical wins follow from that:

Review gets better because the supporting files are visible in the same repo flow.
Collaboration gets better because Fabric Git integration already uses the repository as the shared system of record.
Change history gets better because notebook-supporting files now live in normal repository history instead of in folklore, Slack messages, or somebody’s Downloads folder.

None of that is glamorous. Good. Glamour is overrated. Inspectable systems are better.

What to do with the feature this week

Use it to tighten the contract around a production notebook. Do not use it as an excuse to create a beautifully organized junk drawer.

Be picky about what belongs in Resources

If a file materially affects how the notebook runs, how the team validates it, or how someone maintains it at 2 a.m., it is probably a candidate.

That usually includes things like config templates, helper code tied to the notebook, schema or mapping files, small test fixtures, and operating notes.

It does not include secrets, giant raw datasets, or random leftovers that nobody can defend with a straight face.
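A pre-commit style check can enforce the "does not include" half of that rule. This is a sketch only: the size limit and the list of suspicious extensions are arbitrary illustrative choices, and the function name is made up.

```python
from pathlib import Path

MAX_BYTES = 1_000_000                              # illustrative size limit
SECRET_SUFFIXES = (".pem", ".pfx", ".key", ".env")  # illustrative, not exhaustive


def flag_resources(folder: Path, max_bytes: int = MAX_BYTES) -> list[tuple[str, str]]:
    """Return (relative path, reason) for files that should not be in Resources."""
    problems: list[tuple[str, str]] = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(folder).as_posix()
        # A bare ".env" file has no suffix, so check the name as well.
        if path.suffix.lower() in SECRET_SUFFIXES or path.name.lower() == ".env":
            problems.append((rel, "looks like a secret"))
        elif path.stat().st_size > max_bytes:
            problems.append((rel, "too large"))
    return problems
```

It will not catch a secret pasted into a config file, so it complements review and does not replace it.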

Review the file changes, not just the notebook

This is the part teams will skip if nobody insists on it.

If a notebook change also touches supporting files, review those files with the same seriousness. The feature is only useful if the newly visible files actually get looked at.

Standardize the layout before entropy arrives in a nice shirt

Pick a predictable folder pattern and keep it boring. Boring wins. Boring lets an on-call engineer guess where the relevant files live without a spiritual journey.

Test one real notebook, not a toy example

Take a notebook that already depends on side files. Move the right files into the repo-backed structure. Review the diff. Run it through the Git workflow your team already uses. Then turn that into the standard.

Anything else is just a product demo performed for yourselves.

The traps

This feature will make some teams better. It will also reveal which teams have been winging it.

The first trap is duplicate truth. If the same file or logic exists in three places, Git is not going to sort out the argument for you. Pick the real source of truth and cut over cleanly.

The second trap is false confidence. A file being in Git does not magically make your release process good. It just means the file is now versioned, which is a solid start and not a miracle.

The third trap is resource sprawl. Give engineers a new folder and, sooner or later, they will try to store civilization in it. Set limits early.

The real value

This is not a flashy announcement. It is a practical one. Those are usually the announcements worth paying attention to.

Fabric notebooks now have Resources folder support in Git. Fabric Git integration already syncs workspace items with repositories for version control and collaboration. And once notebook-supporting files live there too, their changes show up in repository history and pull-request diffs.

That is enough to matter.

For production Spark teams, the payoff is simple: fewer hidden dependencies, less hand-carried nonsense, and a notebook workflow that looks a little more like software engineering and a little less like amateur archaeology.

This post was written with help from anthropic/claude-opus-4-6

What the February 2026 gateway release really means for Fabric Spark teams

Monthly gateway release posts are usually the corporate equivalent of dry toast. A version number appears. Power BI Desktop compatibility gets a polite bow. Then everyone goes back to moving data and arguing with refresh logs.

The February 2026 on-premises data gateway release is mostly that kind of update. Microsoft says the build is 3000.306, and the point is simple: keep the gateway aligned with the February 2026 Power BI Desktop release so reports refreshed through the gateway use the same query execution logic and runtime as Desktop.

Useful? Yes. Dramatic? Not even a little.

What makes this release worth a Spark team’s time is everything happening around it. In the last few months, Microsoft added manual gateway updates, shipped pipeline performance work in January, and expanded managed private endpoint guidance for Fabric Data Engineering workloads. Put together, those changes tell a clearer story than the February post does on its own: the gateway still matters, but it is no longer background plumbing you patch whenever someone remembers.

The February release itself is small

The official February announcement is short and very Power BI flavored. Version 3000.306 brings the gateway up to date with the February 2026 Power BI Desktop release. That matters if your Spark world touches gateway-mediated refresh or movement of data through Fabric services that depend on the gateway.

If your team uses Spark notebooks or Spark job definitions alongside pipelines, semantic models, or refresh paths that still run through the on-premises data gateway, version alignment is not glamorous, but it is part of keeping production boring. And boring is what you want from production. “Interesting” is how incident reviews begin.

There is also an awkward timing detail here. The Microsoft Learn page for supported gateway versions already lists March 2026, build 3000.310, as the latest supported update. So if you are making an upgrade decision today, the practical move is not to cling to 3000.306 out of loyalty to February. The real lesson from February is that the monthly update train keeps moving, and Spark teams need an operating habit for that cadence.
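If you want that habit automated, the comparison is trivial as long as you compare build numbers as numbers, not strings. A minimal sketch using the build numbers quoted in this post. The function names are illustrative, and where the "latest supported" value comes from is up to you.

```python
def parse_build(build: str) -> tuple[int, ...]:
    """Turn '3000.306' into (3000, 306) so comparison is numeric."""
    return tuple(int(part) for part in build.split("."))


def is_behind(installed: str, latest_supported: str) -> bool:
    """True if the installed gateway build is older than the latest supported one."""
    return parse_build(installed) < parse_build(latest_supported)


# Builds named in this post: February 2026 vs. March 2026.
february_is_behind = is_behind("3000.306", "3000.310")
```

String comparison would rank "3000.98" above "3000.306", which is why the parsing step matters.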

December changed the maintenance story

The bigger operational shift arrived in the December 2025 release, build 3000.298. That release introduced Manual Update for On-premises Data Gateway in preview. Microsoft says admins can trigger updates from the gateway UI or programmatically through API or script, and the related documentation shows the PowerShell path with Update-DataGatewayClusterMember.

That may sound like a small administrative nicety. It is not. It is the difference between “we update the gateway when someone notices” and “we update the gateway during a planned window, on purpose, with a record of what happened.”

Microsoft’s update documentation is blunt about why this matters in clusters. When gateway members run different versions, you can get sporadic failures because one member can handle a query that another cannot. The guidance is to disable one member, let the work drain, update it, re-enable it, and repeat for the rest of the cluster. That is not fancy advice. It is good advice. Production systems usually break in ordinary, irritating ways.
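
The member-by-member sequence is easy to describe and easy to get wrong under pressure, so here is a minimal sketch of it in Python. The four callables are hypothetical hooks, not Microsoft APIs; in practice they would wrap whatever you use to disable, drain, update, and re-enable a member (for example a script around Update-DataGatewayClusterMember).

```python
from typing import Callable, Iterable, List


def rolling_update(
    members: Iterable[str],
    disable: Callable[[str], None],
    wait_until_drained: Callable[[str], None],
    update: Callable[[str], None],
    enable: Callable[[str], None],
) -> List[str]:
    """Update gateway members one at a time and return the actions taken.

    All four callables are hypothetical hooks supplied by the caller.
    An exception from any hook stops the rollout, so the remaining
    members are left untouched and still serving traffic.
    """
    steps = (
        ("disable", disable),
        ("drain", wait_until_drained),
        ("update", update),
        ("enable", enable),
    )
    log: List[str] = []
    for member in members:
        for name, step in steps:
            step(member)
            log.append(f"{name}:{member}")
    return log
```

The returned log is the "record of what happened" part: write it somewhere durable and the change ticket mostly fills itself in.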

Two details matter:

– The November 2025 release is the baseline for the manual update feature.
– Microsoft says the updater service activates only when an update is triggered from the UI or via PowerShell.

In other words, December did not add one more button. It added a more controlled update path for teams that have to care about maintenance windows, change management, and not getting yelled at on a Friday night.

January made the gateway more relevant to pipeline-heavy Spark teams

The January 2026 release, build 3000.302, was modest on paper but more interesting in practice. Microsoft called out two improvements:

– Performance optimization for reading CSV format in Copy job and Pipeline activities
– Performance optimization for read and write through adaptive performance tuning capability in Pipeline

That is not a fireworks show, but it is more concrete than the average release note. If your Fabric Spark workflow begins with Copy jobs or Pipeline activities that pull CSV-shaped data before Spark takes over, January was the sort of release you should benchmark instead of shrugging at.

Notice what Microsoft did not say: there is no grand promise that everything is suddenly twice as fast and angels now sing over your lakehouse. Fine. Release notes rarely sing. Still, when a gateway sits in front of repetitive ingestion work, even a dull-sounding optimization can shave time off every run. Boring improvements are often the ones that pay rent.

Spark teams now have a second route for on-premises access

The most interesting shift is not in the gateway release notes at all. It is in Fabric’s managed private endpoint work for Data Engineering workloads.

Microsoft’s October 2025 Fabric blog post says Managed Private Endpoints support for connecting to Private Link Services became available through the Fabric Public REST APIs, specifically to help Fabric Spark compute reach on-premises and network-isolated data sources. The newer Learn guidance goes further: Fabric workloads such as Spark or Data Pipelines can connect to on-premises or custom-hosted sources through an approved Private Link setup, with traffic flowing through the Microsoft backbone network rather than the public internet.

That is a real architectural fork in the road.

If your team has treated the on-premises data gateway as the default answer to any sentence containing the words “on-premises” and “Fabric,” that default deserves another look. The managed private endpoint docs say that, once approved, Fabric Data Engineering workloads such as notebooks, Spark job definitions, materialized lakeviews, and Livy endpoints can securely connect to the approved resource.

That does not kill the gateway. It does mean the gateway is no longer the only respectable adult in the room.

There is also one gotcha that will ambush people who like clicking around until things work. Microsoft says creating a managed private endpoint with a fully qualified domain name through Private Link Service is supported only through the REST API, not the UX. So if your plan is “we’ll set it up later in the portal,” later may arrive carrying disappointment.

What a Fabric Spark team should do next

If I were cleaning this up for a real production team, the to-do list would look like this:

– Check the supported monthly updates page before touching anything. As of late March 2026, it already lists March 2026, build 3000.310, as the newest supported gateway release.
– If you run a gateway cluster, stop tolerating version drift. Follow Microsoft’s member-by-member update guidance so one node does not become the office goblin that fails queries the others can run.
– If you want controlled upgrades, confirm your gateways are on the November 2025 baseline or later, then script manual updates with Update-DataGatewayClusterMember.
– Inventory which Spark-adjacent workloads really need the gateway and which ones are gateway-shaped only because nobody revisited the design.
– For Spark or Data Pipeline scenarios that need private access to on-premises or custom-hosted sources, evaluate managed private endpoints and Private Link Service instead of assuming the gateway must stay in the middle.
– If your ingestion path leans on CSV through Copy jobs or Pipeline activities, test the January build improvements against your actual workloads rather than trusting vague optimism.
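
Version drift in particular is cheap to check for. The sketch below assumes you already have a mapping of member name to build string from your own inventory; the member names are made up, and the build strings are simply the ones mentioned in this post.

```python
from typing import Dict, List, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    """Turn '3000.310' into (3000, 310) so builds compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_version_drift(member_versions: Dict[str, str]) -> List[str]:
    """Return the members that are not on the newest build in the cluster."""
    if not member_versions:
        return []
    newest = max(member_versions.values(), key=_version_key)
    return sorted(
        member for member, version in member_versions.items()
        if version != newest
    )


# Hypothetical cluster inventory; replace with your own data.
cluster = {"gw-a": "3000.310", "gw-b": "3000.306", "gw-c": "3000.310"}
lagging = find_version_drift(cluster)
```

An empty list means the cluster is aligned. Anything else is your goblin shortlist.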

One more limitation matters here. The managed private endpoint overview says the feature depends on Fabric Data Engineering workload support in both the tenant home region and the capacity region. So before anyone gives a triumphant architecture presentation, check whether your region setup actually supports what you plan to do.

The short version

The February 2026 gateway release is a small compatibility release. On its own, it would barely justify a coffee break. For Fabric Spark teams, though, it lands in the middle of a more meaningful change.

Gateway maintenance is becoming easier to control. Pipeline-oriented gateway work picked up performance tuning in January. And Spark workloads now have a documented private-connectivity path that can bypass the old habit of stuffing every on-premises access pattern through the gateway.

So no, February 2026 was not a blockbuster. It was a signpost. The smart move is to stop treating the gateway as an untouchable default, update it like you mean it, and decide workload by workload whether Spark still needs that middleman.

If you want the raw source material rather than anyone’s interpretation, start here:

This post was written with help from anthropic/claude-opus-4-6

Bulk Import and Export Item Definitions Are the Fabric APIs Ops Teams Needed

Most Fabric deployment pain is not dramatic. It is slow, dumb, and expensive in the worst way. Somebody asks you to move a workspace full of notebooks, pipelines, reports, and models. Then the afternoon disappears into portal clicking, second-guessing, and the private terror that you forgot one dependency that will blow up later.

That is why the new bulk item-definition APIs matter.

Not because they are flashy. They are not. Not because they are finished. The official docs call both APIs beta and say they are for evaluation and development purposes, not recommended for production use. Good. Honesty is refreshing.

They matter because Fabric finally has official APIs for moving multiple item definitions in and out of a workspace in one operation. And the broader item definition overview says the quiet part out loud: definition-based APIs matter for fully automated deployment and bulk migrations. That is the operational opening teams have been waiting for.

First, what an “item definition” actually is

Fabric’s docs define an item definition as the structured set of files and metadata that describe how a Fabric item is built. Different item types have different formats and required parts.

That sounds abstract until you look at the wire format. In the Get Item Definition docs, a definition comes back as parts. Each part has a path, a payload, and a payload type. The sample uses InlineBase64. The platform file lives in that world too.

So no, this is not one magic blob. It is closer to a folder tree poured into JSON. Files, paths, and encoded content. The kind of thing automation can actually move without a human babysitter.
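
To make the "folder tree poured into JSON" idea concrete, here is a small sketch that turns a list of parts back into files in memory. The key names path, payload, and payloadType are my assumption about how the three fields are spelled on the wire, and the sample part is invented for illustration.

```python
import base64
from typing import Dict, List


def decode_parts(parts: List[dict]) -> Dict[str, bytes]:
    """Map each part's path to its decoded bytes.

    Assumes each part is a dict with 'path', 'payload', and 'payloadType'
    keys, and only handles the InlineBase64 payload type.
    """
    files: Dict[str, bytes] = {}
    for part in parts:
        if part["payloadType"] != "InlineBase64":
            raise ValueError(f"unsupported payload type: {part['payloadType']}")
        files[part["path"]] = base64.b64decode(part["payload"])
    return files


# Invented sample, not real API output.
sample_parts = [
    {
        "path": "notebook-content.py",
        "payload": base64.b64encode(b"print('hello')").decode("ascii"),
        "payloadType": "InlineBase64",
    }
]
decoded = decode_parts(sample_parts)
```

Once the parts are plain bytes keyed by path, writing them to disk or committing them to a repository is ordinary file handling.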

The supported-definition list is not trivial either. The overview includes notebooks, lakehouses, reports, semantic models, data pipelines, KQL dashboards, eventstreams, environments, and Spark job definitions, among others. If you live in Spark-heavy workspaces, that last piece matters.

What the bulk APIs actually do

Bulk Export Item Definitions (beta) lets you export item definitions from a workspace in a single operation. You can export all supported items or pass a specific list.

Bulk Import Item Definitions (beta) does the inverse. It imports multiple item definitions into a workspace, and the docs say the system will create new items or update existing ones based on whether the item already exists.

That is the boring sentence with teeth.

The export shape is practical. The sample response includes an itemDefinitionsIndex with item IDs and root paths, plus a definitionParts collection with file paths and InlineBase64 payloads. In other words, this is not portal smoke. It is structured material you can inspect, store, and move.

Why this changes the day job

Microsoft’s own overview says definition-based APIs matter for automated deployment and bulk migrations. That is not fluff. That is the foundation.

A sane workflow now looks like this:

– Export item definitions from a dev workspace.
– Store those definitions somewhere you can inspect and review.
– Validate what changed.
– Import the definitions into the next workspace.
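
The "validate what changed" step can start very small. This is a sketch, assuming you have already decoded two exports into dictionaries of path to file bytes; the function and type names are mine, not anything from the Fabric docs.

```python
from typing import Dict, List, NamedTuple


class DefinitionDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    changed: List[str]


def diff_definitions(old: Dict[str, bytes], new: Dict[str, bytes]) -> DefinitionDiff:
    """Compare two exports keyed by file path and report what differs."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        path for path in old.keys() & new.keys() if old[path] != new[path]
    )
    return DefinitionDiff(added, removed, changed)
```

A byte-level comparison is deliberately blunt. It will not tell you why a file changed, only that a reviewer should look at it before the import runs.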

Notice what I did not say. I did not say these APIs solve release management for you. They do not. They give you raw material. You still need naming discipline, environment strategy, and release gates. The API gives you lumber. It does not build the house.

But before this, a lot of Fabric promotion work still felt like moving furniture through a keyhole. Now it looks a lot more like files and operations, which is exactly what mature platform teams want.

The caveats are not optional

This is where a lot of blog posts start lying. Let’s not do that.

It is beta

The docs are explicit. Both APIs are beta. Both require beta=true in the query string. Both are described as evaluation and development features and not recommended for production use.

So if your first move is wiring this straight into your most fragile production deployment, that is not bold. That is sloppy.

Pilot it first. Use a low-risk workspace. Learn the payloads. Prove your rollback story. Then decide how far you want to push it.

Permissions will make or break your export

For both APIs, the caller needs a Contributor or higher role on the workspace. For delegated auth, the required scope is Item.ReadWrite.All.

The subtle trap is export completeness. When you export all items in a workspace, the docs say only items the caller has both read and write permissions for are exported. If you export a hand-picked list, the caller needs read and write permissions for every item on that list.

That means you can get a successful response and still end up with an incomplete export.

That is the kind of bug that ruins an evening.

If your item count looks light, do not start with conspiracy theories. Start with permissions.
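
A completeness check is worth automating so nobody has to eyeball counts. The sketch below assumes you keep your own list of item IDs you expected to export, and that each entry in the export index carries the item ID under an "id" key. That key name is an assumption on my part; adjust it to what the response actually contains.

```python
from typing import Iterable, List


def missing_from_export(expected_ids: Iterable[str], export_index: List[dict]) -> List[str]:
    """Return expected item IDs that did not appear in the export index.

    Assumes each index entry is a dict with the item ID under 'id'.
    """
    exported = {entry["id"] for entry in export_index}
    return sorted(set(expected_ids) - exported)
```

If this returns anything, treat the export as incomplete even though the call succeeded, and go look at the caller's permissions on those items first.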

App-only automation has a catch

Yes, the bulk APIs support user identities. They also support service principals and managed identities, but only when all items involved support service principals.

That caveat matters. It means the dream of fully headless CI/CD is real, but it is not universal. One unsupported item type in the batch can turn a clean automation story into a mess.

Check item support early, not the night before a demo.

These are long-running operations

Both bulk APIs use Fabric’s long-running operation pattern. Sometimes you get 200 OK. Sometimes you get 202 Accepted, plus a Location header, an x-ms-operation-id, and a Retry-After header.

That tells you exactly how to build the client:

– submit the request
– poll the operation status using the provided location or operation ID
– wait the number of seconds in Retry-After
– fetch the result when the operation succeeds

This is not the place for impatient, hard-coded polling loops. The service already told you how to behave. Listen to it.
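
Here is a minimal sketch of a polling loop that respects Retry-After. It takes the status call as a function so it stays independent of any HTTP library; get_status is a hypothetical hook that returns a status string plus response headers, and the terminal status names "Succeeded" and "Failed" are assumptions to check against the operation docs.

```python
import time
from typing import Callable, Dict, Tuple

TERMINAL_STATES = ("Succeeded", "Failed")  # assumed status names


def poll_operation(
    get_status: Callable[[], Tuple[str, Dict[str, str]]],
    sleep: Callable[[float], None] = time.sleep,
    default_wait: int = 5,
    max_polls: int = 60,
) -> str:
    """Poll until the operation reaches a terminal state.

    Waits for the number of seconds in the Retry-After header when it is
    present, and falls back to default_wait when it is not.
    """
    for _ in range(max_polls):
        status, headers = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(int(headers.get("Retry-After", default_wait)))
    raise TimeoutError(f"operation still running after {max_polls} polls")
```

Passing sleep in as a parameter is only there to make the loop testable without real waiting. The max_polls cap is a design choice of this sketch, so a stuck operation fails loudly instead of hanging a pipeline forever.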

Imports can fail in very ordinary ways

The bulk import docs list a few common errors worth taping to the wall:

– DuplicateDisplayNameAndType
– DependenciesCouldNotBeResolved
– InvalidFilesPath

The bulk export docs call out failures like ItemsHaveProtectedLabels too.

None of these are exotic. They are exactly the problems teams create when naming gets loose, paths drift, or governance details get ignored.

Why Spark teams should care

If your Fabric world revolves around Spark, this is the part worth circling in red.

The item definition overview includes notebooks and Spark job definitions in the supported definition-based universe. That means core Spark artifacts are moving closer to a model that automation can export, inspect, and promote in bulk.

That does not replace Git. It does not replace testing. It does not replace competent release discipline.

What it does replace is some of the dumb manual glue work. And frankly, that glue work has been stealing time from real engineering for too long.

When a platform cannot expose important artifacts as structured, movable definitions, every promotion feels a little haunted. You can do it. You just never fully trust it. Bulk import and export do not make that anxiety disappear, but they finally give Fabric teams firmer ground.

A better first move than “let’s automate everything”

If I were rolling this out today, I would keep it simple:

– Export all supported item definitions from a test workspace.
– Inspect the returned root paths and definition parts so you understand the structure.
– Re-import them into another test workspace.
– Validate what was created and what was updated.
– Only then test service-principal execution and larger promotion flows.

Small, boring rehearsals beat heroic rollout plans every time.

The bottom line

These APIs are not glamorous. They are not finished. Microsoft is being quite clear about that.

But they are operationally important.

Fabric now has official bulk APIs for item definitions. The docs explicitly tie definition-based APIs to automated deployment and bulk migrations. For teams managing notebooks, pipelines, reports, semantic models, and Spark assets across workspaces, that is a real shift.

Not a promise. A shift.

It means Fabric is getting better at the thing serious teams need most: turning workspace assets into something you can move, review, and automate without a human performing portal surgery at midnight.

Official docs worth keeping open

This post was written with help from anthropic/claude-opus-4-6

The API layer that wasn’t supposed to matter

The strangest platform announcements are usually the boring ones.

Nobody throws a party for source control. Nobody leans back and says, “Hell yes, deployment pipelines,” with a straight face. The applause goes to the flashy stuff: faster engines, new runtimes, clever demos. Then a quiet release slips past and changes the quality of production systems more than all the fireworks did.

That is what just happened with the general availability of source control and CI/CD support for the API for GraphQL in Microsoft Fabric.

On the surface, this looks minor. GraphQL artifacts can now live in Git. Teams can review changes through pull requests. APIs can move through Fabric deployment pipelines. It reads like housekeeping.

It is also the line between an API you demo and an API you trust.

The boring part is the point

GraphQL is easy to misread here. The real story is not query syntax. It is operational discipline.

Before this release, you could build an API for GraphQL on top of Fabric data sources. What you could not do cleanly was treat that API like the rest of your engineering system. It lived in an awkward middle state: important enough to matter, but not governed with the same rigor as the notebooks, jobs, and other artifacts around it.

Now Fabric supports Git integration for GraphQL items and supports GraphQL items in deployment pipelines. That means teams can version API changes, review them, and promote them across environments using the same lifecycle machinery they already use elsewhere in Fabric.

If you have ever cleaned up a production issue, you know why this matters. Production problems do not always come from spectacular failures. Quite often they come from a configuration that drifted, a schema that changed without review, or an environment that no longer matches the one everybody tested. The system is not obviously broken. It is slightly different in exactly the wrong place.

This release goes straight at that kind of failure.

What became generally available

The official blog post is short, which is fine because the details are the useful part.

Fabric now supports three things for API for GraphQL that matter in real engineering work.

First, you can version GraphQL artifacts in Git. Microsoft says GraphQL items can be synchronized with a repository so teams can track changes, collaborate, and roll back when needed. The docs also describe these items as Infrastructure as Code stored in the connected repository.

Second, you can put those GraphQL items through deployment pipelines. Fabric stages such as Development, Test, and Production can now carry GraphQL APIs forward just like other supported items.

Third, the workflow is reviewable. Microsoft explicitly calls out pull requests, branching, and governance around API changes. That sounds procedural until you remember what an API actually is: a contract. If the contract changes, review is not bureaucracy. It is the work.

One line in the docs deserves more attention than it will probably get: during deployment, only metadata is copied. The API metadata moves. The actual data does not. That sentence tells you how to think about promotion. You are not moving datasets through environments. You are moving the API definition that points at them.

The choice that changes deployment behavior

Here is the part most teams will miss the first time through.

The deployment story changes sharply depending on which authentication method you chose when you created the API.

Fabric supports two connectivity options for API for GraphQL: Single sign-on (SSO) and Saved credentials. They are not interchangeable, and the difference is not cosmetic.

If you use SSO, the docs say API clients use their own credentials to access the data source. Microsoft positions this option for Fabric data sources such as lakehouses, warehouses, and SQL analytics endpoints. More important for CI/CD, the docs say that when you deploy an SSO-based API from one workspace to another, the API in the target workspace automatically binds to the local copy of the data source in that target workspace, assuming both the API and the data source were deployed from the same source workspace.

That is a big deal. Dev can point to dev. Test can point to test. Production can point to production. The platform handles the rebinding.

If you use Saved credentials, the story changes. Microsoft says this mode is for cases where a shared credential sits between the API and the data source, including Azure data sources such as Azure SQL Database. In deployment pipelines, the docs say autobinding does not happen. The deployed API in the target workspace stays connected to the data source in the source workspace. Microsoft is blunt about the consequence: you must manually reconfigure connections or create new saved credentials in each target environment.

Same deployment pipeline. Opposite behavior. That is not a side note. That is the fact that will decide whether your rollout feels clean or haunted.

The docs add one more constraint that is easy to miss: once you choose an authentication method for an API, that choice applies to all data sources in that API. You cannot mix SSO and Saved credentials inside the same API.

The trap is not GraphQL. It is drift.

This is why Spark teams should care, even if they do not think of themselves as GraphQL teams.

A Spark team can do everything right in the data layer and still ship a messy consumer experience if the API layer is managed by hand. The notebook change gets reviewed. The lakehouse change gets tested. Then the API definition sits off to the side, touched manually, promoted inconsistently, and remembered by one person who is suddenly unavailable when something breaks.

Git integration and deployment pipelines do not make that risk vanish, but they drag it into the light. The API becomes reviewable. The history becomes visible. Rollback becomes possible.

And Fabric’s docs are refreshingly plain about where the remaining traps still are.

If your source API connects to a data source in a different workspace, the deployed API stays connected to that external source regardless of authentication method. Autobinding only works when the API and the data source start in the same source workspace.

There is also a schema caveat with real operational bite: GraphQL APIs in Fabric do not automatically detect schema changes in their underlying data sources. If a table or view changes, the API keeps using the schema it captured earlier until you refresh the API metadata yourself. Microsoft says that may mean updating the schema inside the API item, removing and re-adding columns, or in some cases removing and reattaching the whole data source.

That is not pretty. It is, however, the kind of detail serious teams need before they learn it the hard way.

What smart teams will do next

The practical response to this release is not excitement. It is inventory.

Start with a simple question: which of our GraphQL APIs use SSO, and which use Saved credentials?

That question now tells you something important about deployment behavior. If the API uses SSO and the data source lives in the same source workspace, pipeline promotion can autobind to the local target copy. If the API uses Saved credentials, you need an explicit post-deployment step to reconfigure the connection in each environment. If the API points across workspaces, do not expect autobinding to rescue you.
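
Those three rules fit in a few lines of Python, which makes them easy to drop into an inventory script. This is only a restatement of the documented behavior as described above; the function name, the auth labels, and the return strings are all mine.

```python
def expected_binding(auth: str, source_in_same_workspace: bool) -> str:
    """Predict where a deployed GraphQL API will point after promotion.

    auth is 'sso' or 'saved-credentials' (labels invented for this sketch).
    source_in_same_workspace says whether the data source lived in the
    same source workspace as the API before deployment.
    """
    if auth not in ("sso", "saved-credentials"):
        raise ValueError(f"unknown auth method: {auth}")
    if not source_in_same_workspace:
        # Cross-workspace sources stay connected regardless of auth method.
        return "stays on external source"
    if auth == "sso":
        return "autobinds to target workspace copy"
    # Saved credentials: no autobinding, manual reconfiguration needed.
    return "stays on source workspace data source"
```

Run it over your API inventory and the rows that do not say "autobinds" are the ones that need a runbook step.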

Then do the obvious thing teams postpone: connect the workspace to Git, commit the GraphQL artifacts, and review the resulting definitions like they matter. They do matter. An API is not decoration around the data platform. It is the part other systems actually touch.

After that, run a deployment pipeline on purpose, not during an outage. Promote an API from Development to Test. Confirm what bound where. Check whether the target API is using the data source you think it is using. If you depend on Saved credentials, write the reconfiguration step into the runbook now, while everyone is still calm.

Finally, treat schema refresh as a real operational task. If upstream tables or views change, do not assume the GraphQL layer will quietly keep up. The docs say it will not.
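
One way to make that task routine is to compare the columns the API exposes with the columns the table has today. How you collect the two sets is up to you and is not shown here; this sketch only covers the comparison, and the names are hypothetical.

```python
from typing import Dict, List, Set


def schema_drift(api_columns: Set[str], source_columns: Set[str]) -> Dict[str, List[str]]:
    """Report columns the API has not picked up and columns it still
    exposes that no longer exist in the source."""
    return {
        "missing_in_api": sorted(source_columns - api_columns),
        "stale_in_api": sorted(api_columns - source_columns),
    }
```

Two empty lists mean the API metadata still matches the source. Anything else is a prompt to refresh the API metadata before a consumer finds the gap for you.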

Why this matters more than it looks

People love dramatic turning points. Most production reliability does not arrive that way.

It arrives through small controls that remove whole categories of avoidable mistakes. Source control does that. Pull request review does that. Deployment pipelines do that. Clear rules about autobinding do that too, especially when the rules are strict enough to kill wishful thinking.

That is why this release matters.

Not because GraphQL suddenly became more fashionable. Not because CI/CD sounds good in a slide deck. It matters because Fabric just closed one of the classic weak spots in a data platform: the gap between building an access layer and governing it like production software.

For Spark teams, that is the real headline. The data job is not finished when the table is correct. It is finished when the contract that exposes that table can move through environments without guesswork.

That is what became generally available here. Not a shiny new abstraction. Something rarer.

A way to be less surprised later.

Read the docs, not just the headline

If you want the primary sources, start here:

This post was written with help from anthropic/claude-opus-4-6

DeltaFlow just changed the CDC conversation for Fabric Spark teams

A row changes in your operational database. That should be useful in seconds. Too often it turns into a side quest.

Raw CDC feeds are ugly. Debezium envelopes. Nested payloads. Schema drift. Then Spark teams spend their time turning change events back into tables. It is expensive work, and most of it is drudgery.

DeltaFlow is Fabric’s shot at removing that drudgery.

Microsoft’s docs and March 2026 blog posts describe DeltaFlow as a preview capability in Fabric Eventstreams. It takes raw Debezium CDC events and reshapes them into analytics-ready streams that mirror the source table structure. The stream keeps the source columns and adds metadata like change type and timestamps. Eventstreams handles schema registration, destination table management, and schema evolution. You turn it on by choosing “Analytics-ready events & auto-updated schema” during connector setup.

That is the part Spark teams should care about. Less time parsing CDC envelopes. More time writing logic that matters.
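
For a sense of the work being removed, here is a minimal sketch of flattening a Debezium change event by hand. It assumes the standard envelope fields before, after, op, and ts_ms without a schema wrapper. The output column names __change_type and __change_ts_ms are invented for this example and are not DeltaFlow's actual column names.

```python
from typing import Any, Dict

# Debezium operation codes mapped to readable labels.
OP_LABELS = {"c": "insert", "r": "snapshot", "u": "update", "d": "delete"}


def flatten_change_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one Debezium envelope into a flat, table-shaped row.

    Deletes carry their data in 'before'; every other operation in 'after'.
    The two metadata column names are hypothetical.
    """
    op = event["op"]
    row = event["before"] if op == "d" else event["after"]
    flat = dict(row or {})
    flat["__change_type"] = OP_LABELS[op]
    flat["__change_ts_ms"] = event["ts_ms"]
    return flat
```

This handles the happy path only. Schema changes, nested types, and tombstones are where the hand-rolled version gets expensive, which is the argument for letting the platform do it.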

What is actually supported

Do not assume every CDC connector gets this.

The Eventstreams overview and connector docs tie DeltaFlow preview to four sources:
– Azure SQL Database CDC
– Azure SQL Managed Instance CDC
– SQL Server on VM DB CDC
– PostgreSQL Database CDC

The same overview lists MySQL Database CDC, MongoDB CDC, and Azure Cosmos DB CDC as Eventstreams connectors too. They are connectors. They are not called out with DeltaFlow preview support. If your estate runs on those systems, the old cleanup work does not disappear.

Why this changes the Spark path

Eventstreams now also has a Spark Notebook destination in preview. The destination can route Eventstream data directly into a Spark notebook and start a Spark Structured Streaming job.

That shortens the path.

Instead of dragging raw CDC into Spark and cleaning it up there, you can test a pipeline where Eventstreams does the CDC shaping first and Spark starts with data that already looks like tables. The payback is simple. Spark can spend its budget on joins, enrichment, aggregations, and writes instead of JSON surgery.

There is a second benefit. Microsoft says the DeltaFlow output is meant for straightforward analytics queries, including KQL. That matters because the same stream can feed a Spark notebook and other real-time consumers without forcing every downstream system to learn Debezium semantics.

The catch

This is preview. Act like it.

Preview is where features meet unpleasant reality: weird schemas, bad timing, broken assumptions, and the database that nobody documented properly. DeltaFlow may still be the right direction. It is not a blind cutover candidate.

Run it beside your current CDC path. Compare outputs. Change a source table. Watch what happens. Kill and restart the notebook path. See where the edges are before you let production depend on it.

Also, source coverage is still narrow. Mixed database estates are going to run split architectures for a while. DeltaFlow on the supported sources. Existing CDC plumbing everywhere else.

PostgreSQL teams have homework

The PostgreSQL connector doc is specific.

To enable CDC for PostgreSQL in this flow, Microsoft says you need:
– wal_level set to logical
– max_worker_processes set to at least 16
– a server restart after those changes
– replication permissions for the connecting admin user or table owner user
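
The two settings in that list are easy to verify before anyone opens the connector wizard. This sketch assumes you have already read the values into a dictionary, for example from SHOW wal_level and SHOW max_worker_processes; the function name is mine, and it does not check the restart or the replication permissions.

```python
from typing import Dict, List


def check_postgres_cdc_settings(settings: Dict[str, str]) -> List[str]:
    """Return a list of problems with the CDC-related server settings."""
    problems: List[str] = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be set to logical")
    try:
        workers = int(settings.get("max_worker_processes", "0"))
    except ValueError:
        workers = 0
    if workers < 16:
        problems.append("max_worker_processes must be at least 16")
    return problems
```

An empty list only means the settings look right. Both settings take effect after a restart, so check the running values, not the config file.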

There is also a networking constraint. The database must be publicly accessible unless you use Eventstream connector virtual network injection. Miss that detail and your migration plan turns into a late-night fight with networking.

What to do now

Keep the rollout small and brutal.

Start with one supported source. Enable DeltaFlow. Pick “Analytics-ready events & auto-updated schema.” Route it to a Spark notebook destination. Then measure three things:
– How much parsing code vanished
– How much schema handling vanished
– How stable the preview behavior is under source changes

One more signal is worth noticing. In the same March 2026 feature summary, Microsoft listed the Eventstream SQL Operator as generally available. DeltaFlow itself is still preview, but the Eventstreams surface around it is getting more serious.

That is the moment to test. Not later, when everyone suddenly wants it in production at once.

Bottom line

DeltaFlow matters because it attacks the worst part of CDC work. Not the business logic. The plumbing.

For supported sources, that is real leverage for Fabric Spark teams. For unsupported sources, nothing changes yet.

So do the sensible thing. Test it early. Keep your current pipeline until the preview earns trust. Then decide whether DeltaFlow gets promoted from experiment to foundation.

This post was written with help from anthropic/claude-opus-4-6

Operationalizing Fabric’s February 2026 feature drop: what actually matters for Spark teams

Microsoft’s monthly feature summaries have a familiar problem. They flatten every change into the same cheerful pitch. A new cell editor mode gets about the same oxygen as a moving security boundary. If you run Spark seriously on Fabric, that is useless. You need to know which items change architecture, which clean up the daily notebook grind, and which quietly add a new failure mode.

February’s release has all three. The headline is not “more features.” The headline is that Fabric keeps removing excuses for portal-driven, manually operated Spark environments. More of the platform can now be secured, composed, and managed through code. That is good news. It also means the easier Microsoft makes this, the more discipline you need on your side.

The change that actually alters architecture

CMK support for notebook code

This is the big one.

Fabric notebooks can now run inside CMK-enabled workspaces, with notebook content and associated notebook metadata encrypted at rest using customer-owned keys in Azure Key Vault. Microsoft is not vague about the coverage. The post calls out cell source, cell output, and cell attachments.

If your team has been splitting its development pattern because notebooks were the odd object out in a tighter security model, that split is no longer structurally required. Plenty of enterprises ended up with an awkward arrangement: secure workspaces for governed assets, then a side channel for notebook authoring and iteration. February closes that gap.

The payoff is boring in the best way. Fewer workarounds. Fewer places where permissions drift. Fewer security reviews where someone has to explain why the code path lives outside the workspace standard applied to everything else.

It also changes the migration conversation. Teams that avoided notebooks in regulated environments can revisit that decision. Teams already on notebooks can ask whether a separate architecture still buys them anything except paperwork.

The catch is operational, not conceptual. Keys rotate. Policies get tightened. When notebook content and metadata sit under the same CMK envelope, key management stops being an abstract security exercise and starts touching the authoring surface your engineers use every day. If you do not test rotation and recovery in a non-production workspace first, you are volunteering to learn in public.

The workflow fix Spark teams needed months ago

Python notebooks finally get %run

This was overdue.

PySpark notebooks had a workable modularity story. Python notebooks did not. If you wanted shared setup logic, common helper functions, or a standardized preamble, you either copied code between notebooks or invented a packaging scheme to compensate for a missing primitive.

Now Python notebooks support %run. You can reference and execute other notebooks in the same execution context, then directly use the functions and variables defined there. That is the difference between notebook code as a pile of local accidents and notebook code as something you can organize on purpose.

There is one limitation, and it matters: today %run in Python notebooks supports notebook items only. It does not yet run .py modules from the notebook resources folder. Microsoft says that support is coming soon. Fine. “Coming soon” is not an architecture. Build around notebook references now, and treat resource-folder module execution as a future upgrade if it arrives on time.

The immediate move for most teams is simple. Pull duplicated utility code into shared notebooks. Keep them small. Keep ownership clear. Do not turn %run into a dependency swamp where every notebook imports half the workspace and nobody can explain execution order without drawing a crime-scene diagram.
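One way to keep that dependency graph explainable is to check it mechanically. What follows is a minimal sketch, not a Fabric API. It assumes you have already exported each notebook's source as plain text into a dict, and it treats any line beginning with %run as a reference to the notebook named by the next token. The names `notebooks`, `run_targets`, and `find_cycle` are hypothetical, and the %run syntax handled here is deliberately simplified.

```python
import re
from typing import Optional

# Matches a line such as "%run shared_helpers". The exact %run syntax
# handled here is a simplifying assumption for illustration.
RUN_LINE = re.compile(r"^\s*%run\s+(\S+)", re.MULTILINE)


def run_targets(source: str) -> list[str]:
    """Return the notebook names referenced by %run lines, in order."""
    return RUN_LINE.findall(source)


def find_cycle(notebooks: dict[str, str]) -> Optional[list[str]]:
    """Return one %run cycle as a list of notebook names, or None.

    `notebooks` maps a notebook name to its exported source text
    (a hypothetical structure, not something Fabric hands you directly).
    """
    graph = {name: run_targets(src) for name, src in notebooks.items()}
    visiting: list[str] = []  # current depth-first path
    done: set[str] = set()    # notebooks already fully explored

    def visit(name: str) -> Optional[list[str]]:
        if name in done:
            return None
        if name in visiting:
            # Found a back edge: slice the path from the first occurrence.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for target in graph.get(name, []):
            cycle = visit(target)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in graph:
        cycle = visit(name)
        if cycle:
            return cycle
    return None
```

Run something like this in a pre-merge check and fail the build when `find_cycle` returns a list. It will not tell you whether the graph is tasteful, only whether it is circular, which is the cheaper half of the problem.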

Version history now tells you where a change came from

This sounds like a minor quality-of-life improvement until you have to debug a bad deployment before the second cup of coffee.

Fabric notebook version history now labels the source of each saved version. Direct edits in the notebook, Git synchronizations, deployment pipeline updates, and publishing via VS Code all show up as distinct origins. That one label removes a stupid amount of ambiguity.

Before this, the question “what changed?” was followed by the more annoying question “through which path?” In a serious CI/CD setup, that distinction is the whole investigation. A manual portal edit points you to one human. A Git sync points you to a repo change. A deployment pipeline update points you to release plumbing. VS Code publishing points you somewhere else again. Same broken notebook, different root cause.

If your team uses more than one of these paths, update the runbook. The first step in notebook incident triage should now be checking the version source before anyone starts diffing content like a raccoon digging through a dumpster.
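A runbook entry like that can be as small as a lookup table. The sketch below is illustrative only: the label strings and the `first_step` helper are hypothetical stand-ins, so substitute whatever text the version history panel actually shows in your tenant.

```python
# Hypothetical source labels mapped to a first triage step. The label
# strings are assumptions; copy the real ones from the version history UI.
TRIAGE_BY_SOURCE = {
    "notebook edit": "Ask the person who saved the version what they changed.",
    "git sync": "Find the commit that was synced and review its diff.",
    "deployment pipeline": "Check the pipeline run that promoted the notebook.",
    "vs code": "Check who published from VS Code and from which local copy.",
}

FALLBACK = "Unknown source: record the label, then diff against the last good version."


def first_step(source_label: str) -> str:
    """Return the first triage step for a version-source label."""
    key = source_label.strip().lower()
    return TRIAGE_BY_SOURCE.get(key, FALLBACK)
```

The point of writing it down is not the code. It is that the on-call engineer stops guessing which path to investigate first.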

Full-size mode is small, but not trivial

Full-size mode lets a single notebook cell fill the workspace for editing. That is not glamorous. It is just useful.

Large SQL blocks, ugly transformation cells, and screenshared code reviews all get easier when the interface stops fighting you. Features like this do not make press-release people happy, but they do shave friction off work that happens every day. I would not redesign an architecture around it. I would absolutely use it.

The broader pattern hiding inside the release

Fabric is making Spark more reachable from both directions

Two February items matter together.

The new Microsoft ODBC Driver for Fabric Data Engineering gives external applications and ODBC-compatible tools a supported path into Spark SQL on Fabric. Microsoft describes it as ODBC 3.x compliant, backed by Livy APIs, and built for OneLake and Lakehouse data with Entra ID authentication, proxy support, session reuse, and Spark SQL coverage that looks designed for real workloads instead of demos.

Then there is Semantic Link 0.13.0. That release expands management coverage across lakehouses, reports, semantic models, SQL endpoints, and Spark. Microsoft is explicit about the direction: creating and managing lakehouses and tables, cloning and rebinding reports, refreshing and monitoring semantic models, and administering SQL and Spark settings from code.

Put those together and the platform’s direction is obvious. Fabric wants Spark environments that can be queried from outside and administered from inside code, without the portal as the center of the universe. That is the right direction. The portal is useful. The portal is not a control plane.

This is also where teams get themselves into trouble. The moment workspace operations become scriptable, governance stops being a policy deck and becomes a permissions design problem. If every engineer can programmatically create lakehouses, modify Spark settings, and rebind reports, then congratulations: you have built an accidental infrastructure platform. Maybe that is fine. Maybe it is a terrible idea. Decide before the scripts proliferate.

My bias is blunt. Treat Semantic Link as production infrastructure tooling, not as a convenience library. Set conventions early. Define who can do what. Log changes. Review the scripts that touch shared assets. Otherwise you will end up with beautiful automation and feral workspaces.

The quiet footgun in the admin section

Fabric identity limits now scale higher, but Fabric will not save you from bad math

Fabric now raises the default tenant limit for Fabric identities from 1,000 to 10,000. That is a real scale change, and for some organizations it removes an artificial ceiling that was starting to pinch.

It also lets admins set custom limits and manage them through the Update Tenant Setting REST API. Good. That is how this should work.

The problem is the warning Microsoft slips into the text: Fabric does not validate whether your custom limit fits within your Entra ID resource quota.

That means the setting feels authoritative while depending on an external quota boundary it does not enforce. In other words, the UI and API will happily let you declare ambition. Entra ID is the system that decides whether ambition has a permit.

So before anyone bumps the limit because “10,000 sounds better,” check the Entra side first. If you automate the setting, add that quota check to the automation. This is not exotic engineering. It is basic adult supervision.
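Here is roughly what that supervision could look like as a guard in front of the script that changes the setting. This is a sketch under stated assumptions, not Microsoft's validation logic: it assumes each Fabric identity consumes one unit of your Entra ID resource quota, and that you fetch the quota and usage numbers yourself. The `check_identity_limit` function, the `QuotaCheck` type, and every parameter name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaCheck:
    allowed: bool
    reason: str


def check_identity_limit(requested_limit: int,
                         current_fabric_identities: int,
                         entra_quota_total: int,
                         entra_quota_used: int,
                         reserve: int = 0) -> QuotaCheck:
    """Decide whether a custom Fabric identity limit fits the Entra quota.

    Assumption: one Fabric identity uses one unit of Entra quota.
    `reserve` is headroom kept free for everything else in the tenant.
    """
    if requested_limit < current_fabric_identities:
        return QuotaCheck(False, "requested limit is below the identities already in use")
    # How many additional identities the new limit would permit.
    growth = requested_limit - current_fabric_identities
    # How many additional objects Entra can still absorb.
    available = entra_quota_total - entra_quota_used - reserve
    if growth > available:
        return QuotaCheck(
            False,
            f"limit allows {growth} more identities but Entra has room for {available}",
        )
    return QuotaCheck(True, "requested limit fits within the Entra quota")
```

Call it before the REST request and refuse to proceed when `allowed` is false. The numbers going in are your responsibility; the function only does the subtraction that Fabric declines to do for you.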

What I would do this week

If you own Spark on Fabric, February’s release suggests a short, unromantic punch list.

Review whether CMK support lets you collapse any split workspace pattern built around notebook restrictions.
Start using %run in Python notebooks for shared helpers, but keep the dependency graph understandable.
Update notebook incident runbooks so version-source labels are part of first response.
Decide whether the ODBC driver and Semantic Link belong in your standard platform toolkit, then put guardrails around both before usage spreads.
Check Entra ID quotas before changing Fabric identity limits, especially if a script is going to do it for you.

That is the real shape of the month. A nicer notebook editor is fine. A new driver is nice. The deeper story is that Fabric keeps shifting Spark toward a model where security, reuse, and administration happen in code instead of in tribal knowledge and portal muscle memory. That is progress. It also means the teams that win will be the ones that pair new capability with restraint, because the platform is getting powerful enough to automate your mistakes at scale.

This post was written with help from anthropic/claude-opus-4-6

Operationalizing the semantic model permissions update for Fabric data agents

Permissions in data platforms have a remarkable talent for turning a two-minute job into a small municipal drama. You want one ordinary thing. The system hands you a form, a role, a workspace, another role, and, sooner or later, a person named Steve who is out until Thursday.

Starting April 6, 2026, Microsoft Fabric removes one of those little absurdities. Creators and consumers of Fabric data agents need only Read access on the semantic model to use it through a data agent. Workspace access is no longer required.

Small sentence. Large relief.

Why this matters

Fabric data agents use Azure OpenAI to interpret a user’s question, choose the most relevant source, and generate, validate, and execute the query needed to answer it. That source might be a lakehouse, warehouse, Power BI semantic model, KQL database, or ontology.

So the agent is already doing the interesting work. It is translating a human question into something a data system can answer. Requiring extra workspace access just to reach a semantic model added bureaucracy to the wrong layer.

The change, plainly

The official change is simple: beginning April 6, creators and consumers only need Read access on the semantic model to interact with it through a Fabric data agent. The older workspace access and Build permission hurdle disappears for this path.

If you have ever untangled access requests, you can probably hear the sigh from here.

What to do with that information

The first operational question is not “What new permission do I need?” It is “Which workspace grants exist only because the old rule forced them?”

Start there.

List the semantic models your data agents use.
Identify users or groups with workspace access granted only for those agent scenarios.
Test the new flow with a read-only user as April 6 approaches.
After the change lands, remove workspace access that no longer serves a separate purpose.
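The second step in that list is the one that tends to stall, so here is a minimal sketch of it. It assumes you have already recorded why each workspace grant exists, for example from access request tickets. The `WorkspaceGrant` record, the purpose tag, and the `removal_candidates` helper are all hypothetical and do not correspond to any Fabric API.

```python
from dataclasses import dataclass

# Hypothetical tag for grants made only so a user could reach a semantic
# model through a data agent under the old rule.
AGENT_PURPOSE = "data-agent-semantic-model"


@dataclass(frozen=True)
class WorkspaceGrant:
    principal: str              # user or group name
    workspace: str
    purposes: frozenset[str]    # every recorded reason the grant exists


def removal_candidates(grants: list[WorkspaceGrant]) -> list[WorkspaceGrant]:
    """Return grants whose only recorded purpose is the data agent scenario."""
    only_agent = frozenset({AGENT_PURPOSE})
    return [g for g in grants if g.purposes == only_agent]


grants = [
    WorkspaceGrant("finance-analysts", "Sales", frozenset({AGENT_PURPOSE})),
    WorkspaceGrant("bi-developers", "Sales", frozenset({AGENT_PURPOSE, "report-authoring"})),
    WorkspaceGrant("ops-team", "Ops", frozenset({"pipeline-monitoring"})),
]
candidates = removal_candidates(grants)
```

In this made-up data only the finance-analysts grant comes back, because the other two serve a purpose the permission change does not touch. Treat the output as a review list, not a delete list, until the read-only test has passed.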

This is not glamorous work. Neither is plumbing, and everyone suddenly develops strong feelings about plumbing when it breaks.

The part people will miss

One detail matters more than the permission change itself. When a Fabric data agent generates DAX for a semantic model, it relies only on the model’s metadata and Prep for AI configuration. It ignores instructions added at the data agent level for DAX query generation.

That puts responsibility where it belongs: on the model.

If a business user asks a sensible question and gets a crooked answer, the fix is usually not a cleverer agent prompt. The fix is to improve what the model gives the agent to work with: the metadata and the Prep for AI setup.

That is the real operational shift. Access gets easier. Model preparation matters more.

A sensible rollout

If you own Fabric governance, keep the rollout dull and methodical.

Review which data agents rely on semantic models.
Retest those scenarios with users who have Read access on the model and no workspace access.
Inspect the models that produce weak DAX and improve the metadata and Prep for AI configuration they expose.
Clean up workspace permissions that were granted only to satisfy the old requirement.

Nobody frames that checklist and hangs it in the lobby. It still gets the job done.

The useful conclusion

The best part of this update is that it removes a fake dependency. A data agent that can answer questions from a semantic model should not require a side trip through workspace permissions.

The catch is that the agent still cannot invent a well-prepared model out of thin air. Fabric has made access lighter. It has also made the remaining truth easier to see: if you want better answers, the semantic model has to be ready for the job.

Which is, frankly, how this should have worked all along.

This post was written with help from anthropic/claude-opus-4-6
