
What the February 2026 Fabric Influencers Spotlight means for your Spark team
Microsoft published its February 2026 Fabric Influencers Spotlight last week. Twelve community posts. MVPs and Super Users. Most people skim the list. Maybe bookmark a link. Move on.
Don’t.
Three of those posts carry signals that should change how your Spark data-engineering team operates in production. Not next quarter. Now.
Signal 1: Get your production code out of notebooks
Matthias Falland’s Fabric Friday episode makes the case plainly: notebooks are great for development but risky in production. That framing resonates with a lot of production teams—and for good reason.
Here’s the nuance. Microsoft has said there’s no inherent difference in performance or monitoring capabilities between Spark Job Definitions and notebooks. Both produce Spark logs. Both run on the same compute. The gap isn’t in what the platform offers. It’s in what each artifact encourages.
Notebooks encourage improvisation. Someone edits a cell at 2 AM. Cell state carries between runs. An error gets swallowed inside an output cell and nobody notices until downstream tables go stale. That’s not a platform limitation. That’s a human-factors problem. And production environments are where human-factors problems become outages.
Spark Job Definitions push you toward cleaner habits. One file per job. No cell state. Explicit parameters. Better modularity. The execution boundary is sharper, and sharper boundaries make failures easier to diagnose.
If your team runs notebooks on a schedule through pipelines, here’s the migration:
- Audit every notebook that runs on a schedule or gets triggered by a pipeline. Count them. You’ll be surprised.
- Extract the transformation logic into standalone Python or Scala files. One file per job. No magic. No “run all cells.”
- Create Spark Job Definitions for each. Map your existing notebook parameters to SJD parameters. They work the same way—just without the cell baggage.
- Wire them into your pipeline activities. Replace the notebook activity with an SJD activity. The orchestration stays identical.
- Keep the notebooks for development and ad-hoc exploration. That’s where they shine.
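The extracted job file from step two can look roughly like this. A minimal sketch, assuming a PySpark job; the parameter names and the `transform` step are illustrative stand-ins for your own logic, not Fabric requirements:

```python
# One file per job: explicit parameters, no cell state, no "run all cells".
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Explicit, named parameters replace the notebook's parameter cell."""
    parser = argparse.ArgumentParser(description="Daily load job")
    parser.add_argument("--source-table", required=True)
    parser.add_argument("--target-table", required=True)
    parser.add_argument("--load-date", required=True)
    return parser


def transform(df, load_date):
    """Pure transformation logic, unit-testable outside Fabric."""
    from pyspark.sql import functions as F  # deferred so tests don't need Spark
    return df.withColumn("load_date", F.lit(load_date))


def main():
    from pyspark.sql import SparkSession

    args = build_parser().parse_args()
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.table(args.source_table)
    transform(df, args.load_date).write.mode("overwrite").saveAsTable(args.target_table)


# When deploying as a Spark Job Definition, add the standard entry point:
#     if __name__ == "__main__":
#         main()
```

Because the transform is a plain function, it can be tested without a Fabric workspace at all, which is most of the point.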
A team of three can typically convert a dozen notebooks in a week. The hard part isn’t the migration. It’s the decision to start.
Signal 2: Direct Lake changes how you write to your lakehouse
Pallavi Routaray’s post on Direct Lake architecture is the most consequential piece in the whole spotlight. Easy to miss because the title sounds like a Power BI topic.
It’s not. It’s a Spark topic.
Direct Lake mode reads Parquet files directly from OneLake. No import copy. No DirectQuery overhead. But it only works well if your Spark jobs write data in a way that Direct Lake can consume efficiently. Get the file layout wrong and your semantic model falls back to DirectQuery silently. Performance craters. Your BI team blames you. Nobody knows why.
Here’s the production checklist:
- Enable V-Order optimization on your Delta tables. V-Order sorts and compresses Parquet files for Direct Lake's columnar read path. Here's the catch: V-Order is disabled by default in new Fabric workspaces, a default chosen to favor write-heavy data-engineering workloads. If your workspace was created recently, you need to enable it explicitly. Check your workspace settings, or set it at the table-property level. Don't assume it's on.
- Control your file sizes. Microsoft's guidance is clear: keep the number of Parquet files small and use large row groups. If your Spark jobs produce thousands of tiny files, Direct Lake will hit its file-count limits and fall back. Run OPTIMIZE on your Delta tables after write operations. Compact aggressively.
- Partition deliberately. Over-partitioning creates too many small files. Under-partitioning creates files that are too large for efficient column pruning. Partition by the grain your BI team actually filters on. Ask them. Don't guess.
- Watch for schema drift. Direct Lake models bind to specific columns at creation time. If your Spark job adds or renames a column, the semantic model breaks. Coordinate schema changes explicitly. No silent ALTER TABLE commands on Friday afternoons.
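The first two checklist items can be sketched as a pair of Spark SQL statements. The `delta.parquet.vorder.enabled` property and the `OPTIMIZE ... VORDER` syntax follow Fabric's documented forms at the time of writing, but verify them against current docs before relying on them; `sales` is a placeholder table name:

```python
# Build the Spark SQL that enables V-Order on a Delta table and compacts it.
# Property and command names should be checked against current Fabric docs.

def vorder_statements(table: str) -> list:
    return [
        # Persist V-Order at the table level so future writes honor it.
        f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true')",
        # Rewrite existing files: compact small files and apply V-Order sorting.
        f"OPTIMIZE {table} VORDER",
    ]

# In a Fabric notebook or Spark Job Definition you would run:
#     for stmt in vorder_statements("sales"):
#         spark.sql(stmt)
```

Running the OPTIMIZE as a scheduled step after your write jobs, rather than ad hoc, is what keeps the layout stable enough for Direct Lake.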
The big risk here: most Spark teams don’t know their output feeds a Direct Lake model. The BI team built it after the fact. Start by mapping which of your Delta tables have Direct Lake semantic models sitting on top. If you don’t know, find out today.
Signal 3: CI/CD for Fabric just got real
Kevin Chant’s post covers the fabric-cicd tool reaching general availability for configuration-based deployments with Azure DevOps. The GA is confirmed, and it matters more than it sounds.
Until now, deploying Fabric artifacts across environments—dev, test, prod—was either manual or held together with custom scripts that broke every time the API changed. The fabric-cicd tool gives you a supported, versioned path.
For Spark teams:
- Your Spark Job Definitions, lakehouse configurations, and pipeline definitions can live in source control and deploy through a proper pipeline. No more “I’ll just update it in the portal.”
- Configuration differences between environments—connection strings, capacity settings, lakehouse names—get handled through configuration files. Not by editing items in the portal after deployment.
- You can roll back. You can diff. You can review before promoting to production. The basic hygiene that every other engineering discipline has had for decades.
Here’s the migration path:
- Install fabric-cicd from the latest release. Follow Chant’s posts for the Azure DevOps YAML pipeline specifics.
- Export your existing workspace items to a Git repository. Fabric’s Git integration handles this natively.
- Build your environment-specific configuration files. One per environment. Map the items that differ: capacity, lakehouse, connections.
- Set up your Azure DevOps pipeline to run fabric-cicd on merge to main. Start with dry-run mode until you trust it.
- Remove portal-level edit access for production workspaces. This is the hard step. It’s also the one that prevents the next outage.
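Step four's deployment call can be sketched as the Python script your pipeline invokes on merge. The library does expose `FabricWorkspace` and `publish_all_items`, but the workspace IDs, repository path, item-type list, and exact argument names here are assumptions to check against fabric-cicd's current documentation:

```python
# Deployment script the Azure DevOps pipeline runs on merge to main.
# Workspace IDs, repo path, and the item-type list are placeholders.

# Which Fabric item types this deployment manages; trim to what you use.
ITEM_TYPES = ["Notebook", "SparkJobDefinition", "DataPipeline", "Lakehouse"]

# Hypothetical environment -> target workspace mapping. Environment-specific
# values (connections, lakehouse names) live in the repo's parameter file.
WORKSPACE_IDS = {
    "dev": "00000000-0000-0000-0000-000000000000",
    "test": "11111111-1111-1111-1111-111111111111",
    "prod": "22222222-2222-2222-2222-222222222222",
}


def resolve_workspace(environment: str) -> str:
    """Fail loudly on an unknown environment instead of deploying blind."""
    try:
        return WORKSPACE_IDS[environment]
    except KeyError:
        raise SystemExit(f"Unknown environment: {environment!r}")


def deploy(environment: str) -> None:
    # Imported here so the mapping above stays testable without the library.
    from fabric_cicd import FabricWorkspace, publish_all_items

    workspace = FabricWorkspace(
        workspace_id=resolve_workspace(environment),
        environment=environment,
        repository_directory=".",  # repo root exported via Git integration
        item_type_in_scope=ITEM_TYPES,
    )
    publish_all_items(workspace)
```

Keeping the environment mapping in code (or a checked-in config file) rather than in someone's head is what makes the "remove portal edit access" step survivable.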
The deeper pattern
These three signals connect. Falland tells you to move your Spark code into artifacts built for production discipline. Routaray tells you how to write your output so downstream models don’t silently degrade. Chant tells you how to deploy the whole thing reliably across environments.
That’s a production pipeline. End to end. Code that runs cleanly, writes data correctly, and deploys safely.
The February spotlight also includes Open Mirroring hands-on guidance from Inturi Suparna Babu and a Fabric Data Agent walkthrough from Shubham Rai. Both are worth a read if you’re evaluating data replication strategies or AI-assisted query patterns over your lakehouse. But for Spark teams running production workloads, the three signals above are where the action is.
Your rollout checklist for March
- Inventory all scheduled notebooks. Tag them by risk: frequency, data volume, downstream dependencies.
- Convert the highest-risk notebook to a Spark Job Definition this week. Validate it runs identically.
- Audit Delta table write patterns for any table that feeds a Direct Lake model. Check that V-Order is enabled. Run OPTIMIZE to compact files.
- Install fabric-cicd. Connect your workspace to Git. Build your first environment config.
- Pick one pipeline to deploy through CI/CD end-to-end. Prove it works before scaling.
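For the audit in step three, a quick offline scan of a table's folder (via a mounted OneLake path or a local sync) can flag the small-file pattern that trips Direct Lake's fallback. The 128 MB threshold and file-count cutoff below are illustrative heuristics, not official Direct Lake limits:

```python
# Flag Delta table folders whose Parquet layout suggests a compaction problem.
# Thresholds are illustrative; check Fabric docs for the real Direct Lake limits.
from pathlib import Path

SMALL_FILE_BYTES = 128 * 1024 * 1024  # files under ~128 MB count as "small"
MAX_FILES = 1000                      # illustrative file-count cutoff


def audit_table(table_dir: str) -> dict:
    files = list(Path(table_dir).rglob("*.parquet"))
    small = [p for p in files if p.stat().st_size < SMALL_FILE_BYTES]
    return {
        "file_count": len(files),
        "small_file_count": len(small),
        # Worth an OPTIMIZE pass if there are too many files overall,
        # or if more than half of them are small.
        "needs_optimize": len(files) > MAX_FILES or len(small) > len(files) // 2,
    }
```

Run it over every table you mapped to a Direct Lake model and prioritize the ones it flags.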
Five items. All concrete. All doable in March.
The community did the research. Your job is to act on it.
This post was written with help from anthropic/claude-opus-4-6
