Optimizing Spark Performance with the Native Execution Engine (NEE) in Microsoft Fabric

Spark tuning often starts with the usual suspects (shuffle volume, skew, join strategy, caching)… but sometimes the biggest win is simply executing the same logical plan on a faster engine.

Microsoft Fabric’s Native Execution Engine (NEE) does exactly that: it keeps Spark’s APIs and control plane, but runs a large portion of Spark SQL / DataFrame execution on a vectorized C++ engine.

What NEE is (and why it’s fast)

NEE is a vectorized native engine that integrates into Fabric Spark and can accelerate many SQL/DataFrame operators without you rewriting your code.

  • You still write Spark SQL / DataFrames.
  • Spark still handles distributed execution and scheduling.
  • For supported operators, compute is offloaded to a native engine (reducing JVM overhead and using columnar/vectorized execution).

Fabric documentation calls out NEE as being based on Apache Gluten (the Spark-to-native glue layer) and Velox (the native execution library).

When NEE tends to help the most

NEE shines when your workload is:

  • SQL-heavy (joins, aggregates, projections, filters)
  • CPU-bound (compute dominates I/O)
  • Primarily on Parquet / Delta

You’ll see less benefit (or fallback) when you rely on features NEE doesn’t support yet.

How to enable NEE (3 practical options)

1) Environment-level toggle (recommended for teams)

In your Fabric Environment settings, go to Acceleration and enable the native execution engine, then Save + Publish.

Benefit: notebooks and Spark Job Definitions that use that environment inherit the setting automatically.

2) Enable for a single notebook / job via Spark config

In a notebook cell:

%%configure
{
  "conf": {
    "spark.native.enabled": "true"
  }
}

For Spark Job Definitions, add the same Spark property.

3) Disable/enable per-query when you hit unsupported features

If a specific query uses an unsupported operator/expression and you want to force JVM Spark for that query:

SET spark.native.enabled=FALSE;
-- run the query
SET spark.native.enabled=TRUE;

How to confirm NEE is actually being used

Two low-friction checks:

  1. Spark UI / History Server: look for plan nodes ending with Transformer or nodes like *NativeFileScan / VeloxColumnarToRowExec.
  2. df.explain(): the same Transformer / NativeFileScan / Velox… hints should appear in the plan.

Fabric also exposes a dedicated view (“Gluten SQL / DataFrame”) to help spot which queries ran on the native engine vs. fell back.

Fallback is a feature (but you should know the common triggers)

NEE includes an automatic fallback mechanism: if the plan contains unsupported features, Spark will run that portion on the JVM engine.

A few notable limitations called out in Fabric documentation:

  • UDFs aren’t supported (fallback)
  • Structured streaming isn’t supported (fallback)
  • File formats like CSV/JSON/XML aren’t accelerated
  • ANSI mode isn’t supported

There are also some behavioral differences worth remembering (rounding/casting edge cases) if you have strict numeric expectations.

A pragmatic “NEE-first” optimization workflow

  1. Turn NEE on for the environment (or your job) and rerun the workload.
  2. If it’s still slow, open the plan and answer: is the slow part running on the native engine, or did it fall back?
  3. If it fell back, make the smallest possible change to keep the query on the native path (e.g., avoid UDFs; prefer built-in expressions; standardize on Parquet/Delta).
  4. Once the plan stays mostly native, go back to classic Spark tuning: reduce shuffle volume, fix skew, sane partitioning, and confirm broadcast joins.

References

This post was written with help from ChatGPT 5.2