
Spark tuning often starts with the usual suspects (shuffle volume, skew, join strategy, caching)… but sometimes the biggest win is simply executing the same logical plan on a faster engine.
Microsoft Fabric’s Native Execution Engine (NEE) does exactly that: it keeps Spark’s APIs and control plane, but runs a large portion of Spark SQL / DataFrame execution on a vectorized C++ engine.
## What NEE is (and why it’s fast)
NEE is a vectorized native engine that integrates into Fabric Spark and can accelerate many SQL/DataFrame operators without you rewriting your code.
- You still write Spark SQL / DataFrames.
- Spark still handles distributed execution and scheduling.
- For supported operators, compute is offloaded to a native engine (reducing JVM overhead and using columnar/vectorized execution).
Fabric documentation calls out NEE as being based on Apache Gluten (the Spark-to-native glue layer) and Velox (the native execution library).
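The intuition behind the columnar/vectorized part can be sketched in plain Python. This is a toy illustration only: NEE/Velox is a C++ engine, and none of this code is from it; it just contrasts the two execution models.

```python
# Toy contrast between row-at-a-time and columnar execution.
# Purely illustrative -- this shows the execution-model difference,
# not anything about the actual NEE/Velox implementation.

rows = [{"amount": a, "region": r}
        for a, r in zip(range(10), ["east", "west"] * 5)]

# Row-at-a-time: the engine interprets every row individually.
total_rows = 0
for row in rows:
    if row["region"] == "east":
        total_rows += row["amount"]

# Columnar: operators work on whole column batches, so per-row
# dispatch overhead is paid once per batch instead of once per row.
amounts = [row["amount"] for row in rows]
regions = [row["region"] for row in rows]
keep = [r == "east" for r in regions]              # batched filter
total_cols = sum(a for a, k in zip(amounts, keep) if k)

print(total_rows, total_cols)  # same result either way
```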
## When NEE tends to help the most
NEE shines when your workload is:
- SQL-heavy (joins, aggregates, projections, filters)
- CPU-bound (compute dominates I/O)
- Primarily on Parquet / Delta
You’ll see less benefit (or outright fallback to JVM Spark) when you rely on features NEE doesn’t support yet.
## How to enable NEE (3 practical options)
### 1) Environment-level toggle (recommended for teams)
In your Fabric Environment settings, go to Acceleration and enable the native execution engine, then Save + Publish.
Benefit: notebooks and Spark Job Definitions that use that environment inherit the setting automatically.
### 2) Enable for a single notebook / job via Spark config
In a notebook cell:
```
%%configure
{
    "conf": {
        "spark.native.enabled": "true"
    }
}
```
For Spark Job Definitions, add the same Spark property.
### 3) Disable/enable per-query when you hit unsupported features
If a specific query uses an unsupported operator/expression and you want to force JVM Spark for that query:
```sql
SET spark.native.enabled=FALSE;
-- run the query
SET spark.native.enabled=TRUE;
```
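If you do this often from PySpark, the pattern can be wrapped in a small context manager. A hedged sketch: it assumes an active SparkSession whose runtime conf honors toggling `spark.native.enabled` mid-session (mirroring the SQL `SET` statements above), and the helper name `jvm_spark` is mine, not a Fabric API.

```python
from contextlib import contextmanager

@contextmanager
def jvm_spark(spark):
    """Force classic JVM execution for queries run inside the block,
    then restore the previous setting. `spark` is assumed to be an
    active SparkSession (anything exposing conf.get / conf.set works)."""
    previous = spark.conf.get("spark.native.enabled", "true")
    spark.conf.set("spark.native.enabled", "false")
    try:
        yield
    finally:
        spark.conf.set("spark.native.enabled", previous)

# Usage (inside a Fabric notebook):
# with jvm_spark(spark):
#     df = spark.sql("SELECT ...")   # runs on JVM Spark
```

The `try/finally` matters: the setting is restored even if the query inside the block raises.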
## How to confirm NEE is actually being used
Two low-friction checks:
- Spark UI / History Server: look for plan nodes ending with `Transformer`, or nodes like `NativeFileScan` / `VeloxColumnarToRowExec`.
- `df.explain()`: the same `Transformer` / `NativeFileScan` / `Velox…` hints should appear in the plan.
Fabric also exposes a dedicated view (“Gluten SQL / DataFrame”) to help spot which queries ran on the native engine vs. fell back.
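To automate that check in a notebook, you can scan the `df.explain()` text for those markers. A sketch: the marker list simply encodes the hints above, and the sample plan strings are illustrative, not captured output.

```python
# Hedged sketch: detect native-engine operators in a physical plan's
# text. The marker list is an assumption based on the operator-name
# hints above, not an official Fabric API.
NATIVE_MARKERS = ("Transformer", "NativeFileScan", "Velox")

def ran_natively(plan_text: str) -> bool:
    """Return True if the plan text contains any native-engine markers."""
    return any(marker in plan_text for marker in NATIVE_MARKERS)

# Illustrative plan fragments (not captured from a real run):
native_plan = "ProjectExecTransformer\n+- NativeFileScan parquet"
jvm_plan = "Project\n+- FileScan parquet"

print(ran_natively(native_plan))  # True
print(ran_natively(jvm_plan))     # False
```

In practice you would pass `df._sc._jvm`-free plan text, e.g. via `df._jdf.queryExecution().executedPlan().toString()` or simply eyeball `df.explain()` output; treat this helper as a convenience, not a guarantee.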
## Fallback is a feature (but you should know the common triggers)
NEE includes an automatic fallback mechanism: if the plan contains unsupported features, Spark will run that portion on the JVM engine.
A few notable limitations called out in Fabric documentation:
- UDFs aren’t supported (fallback)
- Structured streaming isn’t supported (fallback)
- File formats like CSV/JSON/XML aren’t accelerated
- ANSI mode isn’t supported
There are also some behavioral differences worth remembering (rounding/casting edge cases) if you have strict numeric expectations.
## A pragmatic “NEE-first” optimization workflow
1. Turn NEE on for the environment (or your job) and rerun the workload.
2. If it’s still slow, open the plan and answer: is the slow part running on the native engine, or did it fall back?
3. If it fell back, make the smallest possible change to keep the query on the native path (e.g., avoid UDFs; prefer built-in expressions; standardize on Parquet/Delta).
4. Once the plan stays mostly native, go back to classic Spark tuning: reduce shuffle volume, fix skew, choose sane partitioning, and confirm broadcast joins.
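The middle steps of that loop can be sketched as a tiny triage helper. The marker and trigger lists are illustrative assumptions, not a Fabric API (`BatchEvalPython` is the standard Spark plan node for Python UDFs, and `FileScan csv` is how a CSV scan appears in a plan):

```python
# Sketch of the "did it fall back, and why?" triage step as code.
# Marker and hint lists are illustrative assumptions.
NATIVE_MARKERS = ("Transformer", "NativeFileScan", "Velox")
FALLBACK_HINTS = {
    "BatchEvalPython": "avoid Python UDFs; prefer built-in expressions",
    "FileScan csv": "standardize on Parquet/Delta for accelerated scans",
}

def next_tuning_step(plan_text: str) -> str:
    """Given df.explain() text, suggest the next move in the workflow."""
    if any(m in plan_text for m in NATIVE_MARKERS):
        return "mostly native: move on to shuffle/skew/broadcast tuning"
    for hint, advice in FALLBACK_HINTS.items():
        if hint in plan_text:
            return f"fell back ({hint}): {advice}"
    return "fell back: inspect the plan for unsupported operators"
```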
## References
- Fabric blog — Native Execution Engine GA
- Microsoft Learn — Native execution engine overview
- Apache Gluten (Incubating)
- Velox
This post was written with help from ChatGPT 5.2








