
You have been running Spark on the JVM for years. It works. Your pipelines finish before the SLA alarm fires, your data scientists get their DataFrames, and you have learned to live with the garbage collector the way one learns to coexist with a roommate who occasionally rearranges all the furniture at 3 AM.
Then Microsoft shipped the Native Execution Engine for Fabric Spark, and the pitch is seductive: swap the JVM’s row-at-a-time processing for a vectorized C++ execution layer built on Meta’s Velox and Apache Gluten, get up to 6x faster query performance on compute-heavy workloads, change zero lines of code, pay nothing extra. Microsoft’s TPC-DS benchmarks at 1 TB scale show roughly 4x improvement over vanilla open-source Spark. Internal Fabric workloads have hit 6x.
Those are real numbers. But “flip the switch and go faster” is a marketing sentence, not an engineering plan. What follows is the checklist your team needs to move production Spark workloads onto the Native Execution Engine without discovering exciting new failure modes at 2 AM on a Tuesday.
Prerequisite Zero: Understand What You Are Opting Into
The Native Execution Engine does not replace Spark. It replaces Spark’s JVM-based physical execution operators — the actual computation — with native C++ equivalents for supported operations. Everything above the physical plan remains untouched: SQL parsing, logical optimization, cost-based rewrites, adaptive query execution, predicate pushdown, column pruning. None of that moves.
Here is the handoff in concrete terms. Spark produces its optimized physical plan as it always has. Apache Gluten intercepts that plan, identifies which operators have native C++ implementations in Velox, and swaps those nodes out. Velox executes them using columnar batches and SIMD instructions, processing 8, 16, or 32 values per CPU instruction instead of iterating row by row through JVM objects.
For operators Velox does not yet support, the engine falls back to standard Spark execution. The transition at the native/JVM boundary requires columnar-to-row and row-to-columnar conversions. These conversions cost real time. A workload that triggers frequent fallbacks can run slower with the engine enabled than without it.
That last sentence matters more than the benchmark numbers. The Native Execution Engine is a selective replacement of physical operators, not a uniform accelerator. Your performance outcome depends on how much of your workload stays in native territory.
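The boundary cost is easiest to see in miniature. This is a conceptual Python sketch, not how Velox or Gluten are implemented — it only models the shape of the data conversions that every native/JVM fallback pays in both directions:

```python
# Conceptual sketch only: real engines run SIMD over contiguous
# column buffers; this just models the data shapes at the boundary.

def columnar_to_rows(cols):
    # The conversion paid when an unsupported operator forces
    # execution back to the JVM row-at-a-time path.
    names = list(cols)
    return [dict(zip(names, values)) for values in zip(*cols.values())]

def rows_to_columnar(rows):
    # ...and the conversion back when the plan re-enters native code.
    return {name: [row[name] for row in rows] for name in rows[0]}

batch = {"amount": [10.0, 20.0, 5.0], "qty": [1, 2, 3]}  # one columnar batch
rows = columnar_to_rows(batch)

assert rows[0] == {"amount": 10.0, "qty": 1}
assert rows_to_columnar(rows) == batch  # round-trip works, but is not free
```

Each round trip is linear in the batch size; a plan that crosses the boundary at every stage multiplies that cost, which is why frequent fallbacks can erase the vectorization win entirely.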
Step 1: Confirm You Are on Runtime 1.3
The engine requires Fabric Runtime 1.3 (Apache Spark 3.5, Delta Lake 3.2). Runtime 1.2 support has been discontinued — and here is the dangerous part — silently. If you are still on 1.2, native acceleration is disabled without warning. You will not get an error. You will get no speedup. You will blame the engine rather than your runtime version. Check this first.
Action items:
- Open each Fabric workspace running production Spark workloads
- Navigate to Workspace Settings → Data Engineering/Science → Spark Settings
- Confirm Runtime 1.3 is selected
- If you are on Runtime 1.2, plan the runtime upgrade as a separate migration with its own validation cycle. Spark 3.4 to 3.5 brings behavioral changes unrelated to the native engine, and you do not want to debug two migrations at once
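If you want this check to fail loudly instead of silently, a tiny guard helps. The helper below is hypothetical — Fabric exposes the runtime in the workspace UI, and the version-string format here is an assumption:

```python
def native_engine_supported(fabric_runtime: str) -> bool:
    # Native acceleration requires Fabric Runtime 1.3 (Spark 3.5).
    # On 1.2 the engine is silently disabled, so gate your pipelines
    # explicitly rather than discovering the no-op in a benchmark.
    major, minor = (int(part) for part in fabric_runtime.split(".")[:2])
    return (major, minor) >= (1, 3)

assert native_engine_supported("1.3")
assert not native_engine_supported("1.2")
```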
Step 2: Audit Your Workloads
Not every job benefits equally. The engine does its best work on compute-intensive analytical queries — aggregations, joins, filters, projections, complex expressions — over Parquet and Delta data. It adds less to I/O-bound workloads or jobs dominated by Python UDFs that run outside the Spark execution engine entirely.
Build a four-tier inventory:
- Tier 1 — High-value candidates: Long-running batch ETL with heavy aggregations and joins over Delta tables. These are your biggest CU consumers and your biggest potential beneficiaries. Think: the nightly pipeline that computes vendor aggregates across three years of transaction data, currently consuming 45 minutes of a large cluster.
- Tier 2 — Likely beneficiaries: Interactive notebooks running analytical queries. Data science feature engineering pipelines that stack transformations before model training.
- Tier 3 — Uncertain: Workloads using exotic operators, deeply nested struct types, or heavy UDF logic. These need individual testing because you cannot predict fallback behavior from the code alone.
- Tier 4 — Skip for now: Streaming workloads, jobs dominated by external API calls, or workloads where Python UDF processing accounts for most of the wall-clock time.
Migrate Tier 1 first. You need evidence that the engine delivers measurable wins on your actual workloads before you spend political capital rolling it out everywhere.
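A first-pass triage can be mechanical. The metric names below (`cu_hours`, `udf_time_frac`, and so on) are illustrative assumptions, not a Fabric API — feed the function whatever inventory data you actually collect:

```python
def tier(w: dict) -> int:
    # Tier 4: streaming, API-bound, or UDF-dominated — skip for now.
    if w.get("streaming") or w.get("udf_time_frac", 0.0) > 0.5:
        return 4
    # Tier 3: fallback behavior is unpredictable from code alone.
    if w.get("exotic_operators") or w.get("nested_structs"):
        return 3
    # Tier 1: heavy batch ETL over Delta — biggest CU consumers first.
    if w.get("batch") and w.get("cu_hours", 0.0) >= 10:
        return 1
    # Tier 2: everything else that is still analytical.
    return 2

assert tier({"batch": True, "cu_hours": 45}) == 1
assert tier({"streaming": True}) == 4
assert tier({"nested_structs": True}) == 3
```

The thresholds are starting points; the value is in forcing every workload through the same questions before anyone touches a production setting.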
Step 3: Create a Non-Production Test Environment
Do not enable the engine on production and hope. Create a dedicated Fabric environment:
- In the Fabric portal, create a new Environment item
- Navigate to the Acceleration tab
- Check Enable native execution engine
- Save and Publish
Attach this environment to a non-production workspace. Run your Tier 1 workloads against it using production-scale data. This matters: performance characteristics at 10 GB do not predict behavior at 10 TB, because operator fallback patterns depend on data distributions, not just query structure.
For quick per-notebook testing without a full environment, drop this in your first cell:
```
%%configure
{
    "conf": {
        "spark.native.enabled": "true"
    }
}
```
This takes effect immediately — no session restart required — which makes A/B comparisons trivial.
Step 4: Measure Baselines
You cannot prove improvement without a baseline. For each Tier 1 workload, capture:
- Wall-clock duration from the Spark UI (total job time, not stage time — stage time ignores scheduling and shuffle overhead)
- CU consumption from Fabric monitoring (this is what your budget cares about)
- Spark Advisor warnings in the current state, so you can distinguish new warnings from pre-existing noise after enabling native execution
- Row counts and checksums on output tables — correctness verification requires a pre-migration snapshot
Store these in a Delta table. You will reference them for weeks.
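A baseline row can be as simple as a dict you append to that Delta table. The schema here is a suggestion, not a standard; the checksum serializes a sorted sample of output rows so it stays stable across runs with non-deterministic ordering:

```python
import hashlib
import json

def baseline_record(job: str, duration_s: float, cu: float,
                    rows: list[dict]) -> dict:
    # Sort rows and keys so the checksum does not depend on
    # non-deterministic row or field ordering.
    payload = json.dumps(
        sorted(rows, key=lambda r: json.dumps(r, sort_keys=True)),
        sort_keys=True,
    ).encode()
    return {
        "job": job,
        "duration_s": duration_s,
        "cu": cu,
        "row_count": len(rows),
        "checksum": hashlib.sha256(payload).hexdigest(),
    }

a = baseline_record("nightly_etl", 2700.0, 180.0, [{"id": 2}, {"id": 1}])
b = baseline_record("nightly_etl", 2700.0, 180.0, [{"id": 1}, {"id": 2}])
assert a["checksum"] == b["checksum"]  # order-insensitive by construction
```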
Step 5: Run Native and Watch for Fallbacks
Enable the engine on your test environment and run each Tier 1 workload. Then check two things.
Performance delta: Compare wall-clock time and CU consumption against your baselines. On a genuinely compute-heavy workload, you should see at least 1.5x improvement. If you do not, something is triggering fallbacks and you are paying the columnar-to-row conversion tax without getting the native execution benefit.
Fallback alerts: The Spark Advisor now reports real-time warnings during notebook execution when operators fall back from native to JVM execution. Each alert names the specific operator that could not run natively.
The most common fallback trigger, and the most easily fixed: .show(). This call invokes CollectLimit and ToPrettyString, neither of which has a native implementation. Replace .show() with .collect() or .toPandas() in production code. In a notebook cell you run manually for debugging, it does not matter — but inside a scheduled pipeline, every fallback is a boundary crossing.
Other triggers to watch: unsupported expression types, complex nested struct operations, and certain window function variants. For each one, ask three questions:
- Can I rewrite the query to avoid it? Sometimes this is a one-line change. Sometimes it means restructuring a transformation.
- Is the fallback on a critical path? A fallback in a logging cell is noise. A fallback inside your core join-and-aggregate chain is a problem.
- Is the net performance still positive? If the workload runs 3x faster overall despite two fallback warnings on minor operations, accept the win and move on.
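Triaging is easier if you aggregate the warnings instead of reading them one at a time. The message format matched below is an assumption for illustration — adapt the pattern to whatever your Spark Advisor output actually looks like:

```python
import re
from collections import Counter

# Assumed message shape, e.g. "... fell back to JVM for operator: CollectLimit"
FALLBACK = re.compile(r"operator:\s*(\w+)", re.IGNORECASE)

def count_fallbacks(warnings: list[str]) -> Counter:
    ops = Counter()
    for message in warnings:
        match = FALLBACK.search(message)
        if match:
            ops[match.group(1)] += 1
    return ops

sample = [
    "Query 12 fell back to JVM for operator: CollectLimit",
    "Query 14 fell back to JVM for operator: CollectLimit",
    "Query 17 fell back to JVM for operator: Window",
]
assert count_fallbacks(sample).most_common(1) == [("CollectLimit", 2)]
```

Sort the counts, then apply the three questions above to the top offenders first.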
Step 6: Validate Data Correctness
Faster means nothing if the answers change. For each migrated workload:
- Compare output row counts between native and non-native runs on identical input data
- Run hash comparisons on key output columns
- For financial or compliance-sensitive pipelines, do a full row-level diff on a representative partition
The Native Execution Engine preserves Spark semantics, but floating-point arithmetic at boundary conditions, null handling in edge cases, and row ordering in non-deterministic operations all deserve explicit verification on your actual data. Do not skip this step because the TPC-DS numbers looked good. TPC-DS does not have your data shapes.
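For the row-count and hash checks, an order-insensitive fingerprint avoids false alarms from non-deterministic row ordering. A minimal sketch — note the caveat in the comments, which is why compliance pipelines still deserve the full row-level diff:

```python
import hashlib

def fingerprint(rows: list[dict]) -> int:
    # Hash each row independently, then combine with XOR: the result
    # is independent of row order. Caveat: a row repeated an even
    # number of extra times cancels out, so this is a smoke test,
    # not a proof of equality.
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    return acc

native = [{"id": 1, "amt": 2.5}, {"id": 2, "amt": 7.0}]
jvm = [{"id": 2, "amt": 7.0}, {"id": 1, "amt": 2.5}]  # same rows, new order
assert fingerprint(native) == fingerprint(jvm)
assert fingerprint(native) != fingerprint(native[:1])
```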
Step 7: Plan Your Rollback
The best operational property of the Native Execution Engine: it can be disabled per cell, per notebook, per environment, instantly. No restarts. No redeployments.
In PySpark:

```python
spark.conf.set("spark.native.enabled", "false")
```

In Spark SQL:

```sql
SET spark.native.enabled=FALSE;
```
Your rollback plan is one line of configuration. But that line only helps if your on-call engineers know it exists. Document it. Add it to your runbook. Add it to the incident response template. The worst production regression is one where the fix takes ten seconds but nobody knows about it for two hours.
Step 8: Roll Out Incrementally
With validation complete, enable the engine in production using one of three strategies, ordered from most cautious to broadest:
Option A — Per-job enablement: Add spark.native.enabled=true to individual Spark Job Definitions or notebook configure blocks. You control exactly which workloads get native execution.
Option B — Environment-level: Navigate to your production Environment → Acceleration tab → enable. All notebooks and Spark Job Definitions using this environment inherit the setting.
Option C — Workspace default: Set your native-enabled environment as the workspace default via Workspace Settings → Data Engineering/Science → Environment. Everything in the workspace picks it up.
Start with Option A on your validated Tier 1 workloads. After a week of stable production runs, graduate to Option B. Option C is for teams that have fully validated their workspace and want blanket coverage.
Step 9: Monitor the First Week
Post-migration monitoring matters because production data is not test data. In the first week:
- Watch CU consumption trends in Fabric monitoring. Compute-heavy workloads should show measurable drops.
- Check the Spark Advisor for fallback warnings that did not appear during testing. Different data distributions or code paths in production can trigger different operators.
- Set alerts on job duration. A sudden increase means a new fallback or regression appeared.
- Pay attention to any jobs that were borderline in testing. Production-scale data volume can push a workload from “mostly native” to “mostly fallback” if it exercises operators that were uncommon in test data.
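The duration alert can be a few lines wherever your monitoring hooks live. The 25% tolerance below is an arbitrary starting point, not a Fabric default:

```python
from statistics import mean

def duration_regressed(recent_s: list[float], baseline_mean_s: float,
                       tolerance: float = 1.25) -> bool:
    # A sustained jump in the rolling mean usually means a new operator
    # started falling back on production data that testing never saw.
    return mean(recent_s) > baseline_mean_s * tolerance

assert duration_regressed([70.0, 72.0, 71.0], baseline_mean_s=50.0)
assert not duration_regressed([51.0, 49.0], baseline_mean_s=50.0)
```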
Step 10: Optimize for Maximum Native Coverage
Once stable, push further:
- Replace all .show() calls with .collect() or .display() in scheduled notebook workflows
- Refactor deeply nested struct operations into flat columnar operations where the query logic allows it
- Consult the Apache Gluten documentation for the current supported operator list and avoid unsupported expressions in hot paths
- Keep data in Parquet or Delta format — the engine processes these natively, and other formats require conversion that erases the acceleration
- For write-heavy workloads, leverage the GA-release native Delta write acceleration, which extends native execution into the output path rather than just the read and transform stages
What Does Not Change
Several things remain identical and need no migration planning:
- Spark APIs: Your PySpark, Scala, and SQL code is unchanged. The engine operates below the API surface.
- Delta Lake semantics: ACID transactions, time travel, schema enforcement — all handled by the same Delta Lake 3.2 layer on Runtime 1.3.
- Cost model: No additional CU charges. Your jobs finish faster, so you consume fewer CUs for the same work. The pricing advantage is indirect but real.
- Fault tolerance: Spark still manages task retries, stage recovery, and speculative execution. The native engine handles computation; Spark handles resilience.
The Bottom Line
The Native Execution Engine is GA. It runs on the standard Fabric runtime. The performance gains are backed by reproducible benchmarks — roughly 4x on TPC-DS at 1 TB, with internal Fabric workloads reaching up to 6x. It costs nothing to enable and one line of configuration to revert.
But there is a gap between “we turned it on and things got faster” and “we know exactly which workloads got faster, by how much, what fell back, and what to do when something breaks.” The checklist above bridges that gap.
Runtime 1.3. Audit. Baselines. Test. Fallbacks. Correctness. Rollback. Incremental rollout. Monitor. Optimize.
Ten steps. Zero heroics. Measurably faster Spark.
This post was written with help from anthropic/claude-opus-4-6
