Fabric Spark’s Native Execution Engine: What Speeds Up, What Falls Back, and What to Watch

The Production Migration Checklist for Fabric's Native Execution Engine

You have been running Spark on the JVM for years. It works. Your pipelines finish before the SLA alarm fires, your data scientists get their DataFrames, and you have learned to live with the garbage collector the way one learns to coexist with a roommate who occasionally rearranges all the furniture at 3 AM.

Then Microsoft shipped the Native Execution Engine for Fabric Spark, and the pitch is seductive: swap the JVM’s row-at-a-time processing for a vectorized C++ execution layer built on Meta’s Velox and Apache Gluten, get up to 6x faster query performance on compute-heavy workloads, change zero lines of code, pay nothing extra. Microsoft’s TPC-DS benchmarks at 1 TB scale show roughly 4x improvement over vanilla open-source Spark. Internal Fabric workloads have hit 6x.

Those are real numbers. But “flip the switch and go faster” is a marketing sentence, not an engineering plan. What follows is the checklist your team needs to move production Spark workloads onto the Native Execution Engine without discovering exciting new failure modes at 2 AM on a Tuesday.

Prerequisite Zero: Understand What You Are Opting Into

The Native Execution Engine does not replace Spark. It replaces Spark’s JVM-based physical execution operators — the actual computation — with native C++ equivalents for supported operations. Everything above the physical plan remains untouched: SQL parsing, logical optimization, cost-based rewrites, adaptive query execution, predicate pushdown, column pruning. None of that moves.

Here is the handoff in concrete terms. Spark produces its optimized physical plan as it always has. Apache Gluten intercepts that plan, identifies which operators have native C++ implementations in Velox, and swaps those nodes out. Velox executes them using columnar batches and SIMD instructions, processing 8, 16, or 32 values per CPU instruction instead of iterating row by row through JVM objects.

For operators Velox does not yet support, the engine falls back to standard Spark execution. The transition at the native/JVM boundary requires columnar-to-row and row-to-columnar conversions. These conversions cost real time. A workload that triggers frequent fallbacks can run slower with the engine enabled than without it.

That last sentence matters more than the benchmark numbers. The Native Execution Engine is a selective replacement of physical operators, not a uniform accelerator. Your performance outcome depends on how much of your workload stays in native territory.

Step 1: Confirm You Are on Runtime 1.3

The engine requires Fabric Runtime 1.3 (Apache Spark 3.5, Delta Lake 3.2). Runtime 1.2 support has been discontinued — and here is the dangerous part — silently. If you are still on 1.2, native acceleration is disabled without warning. You will not get an error. You will get no speedup. You will blame the engine rather than your runtime version. Check this first.

Action items:
– Open each Fabric workspace running production Spark workloads
– Navigate to Workspace Settings → Data Engineering/Science → Spark Settings
– Confirm Runtime 1.3 is selected
– If you are on Runtime 1.2, plan the runtime upgrade as a separate migration with its own validation cycle. Spark 3.4 to 3.5 brings behavioral changes unrelated to the native engine, and you do not want to debug two migrations at once
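A guardrail worth dropping into the first cell of critical notebooks: a minimal sketch that only assumes the Spark session's version string. The workspace setting remains the source of truth; this just catches notebooks running in the wrong workspace.

```python
# Sketch: infer the Fabric runtime from the Spark version string.
# Runtime 1.3 ships Apache Spark 3.5.x; anything else means native
# acceleration is silently disabled.
def looks_like_runtime_13(spark_version: str) -> bool:
    """Return True when the version matches Runtime 1.3 (Spark 3.5.x)."""
    return spark_version.startswith("3.5")

# In a Fabric notebook (requires an active Spark session):
# assert looks_like_runtime_13(spark.version), (
#     f"Expected Spark 3.5 (Runtime 1.3), got {spark.version}"
# )
```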

Step 2: Audit Your Workloads

Not every job benefits equally. The engine does its best work on compute-intensive analytical queries — aggregations, joins, filters, projections, complex expressions — over Parquet and Delta data. It adds less to I/O-bound workloads or jobs dominated by Python UDFs that run outside the Spark execution engine entirely.

Build a four-tier inventory:

  • Tier 1 — High-value candidates: Long-running batch ETL with heavy aggregations and joins over Delta tables. These are your biggest CU consumers and your biggest potential beneficiaries. Think: the nightly pipeline that computes vendor aggregates across three years of transaction data, currently consuming 45 minutes on a large cluster.
  • Tier 2 — Likely beneficiaries: Interactive notebooks running analytical queries. Data science feature engineering pipelines that stack transformations before model training.
  • Tier 3 — Uncertain: Workloads using exotic operators, deeply nested struct types, or heavy UDF logic. These need individual testing because you cannot predict fallback behavior from the code alone.
  • Tier 4 — Skip for now: Streaming workloads, jobs dominated by external API calls, or workloads where Python UDF processing accounts for most of the wall-clock time.

Migrate Tier 1 first. You need evidence that the engine delivers measurable wins on your actual workloads before you spend political capital rolling it out everywhere.

Step 3: Create a Non-Production Test Environment

Do not enable the engine on production and hope. Create a dedicated Fabric environment:

  1. In the Fabric portal, create a new Environment item
  2. Navigate to the Acceleration tab
  3. Check Enable native execution engine
  4. Save and Publish

Attach this environment to a non-production workspace. Run your Tier 1 workloads against it using production-scale data. This matters: performance characteristics at 10 GB do not predict behavior at 10 TB, because operator fallback patterns depend on data distributions, not just query structure.

For quick per-notebook testing without a full environment, drop this in your first cell:

%%configure
{
  "conf": {
    "spark.native.enabled": "true"
  }
}


This takes effect immediately — no session restart required — which makes A/B comparisons trivial.
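Because the toggle is instant, an A/B harness can be as small as two helpers. A sketch, where `run_pipeline` is a hypothetical stand-in for your actual workload:

```python
import time

def timed(fn) -> float:
    """Wall-clock a single call; returns elapsed seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def speedup(jvm_seconds: float, native_seconds: float) -> float:
    """How many times faster the native run finished."""
    return jvm_seconds / native_seconds

# In a notebook (run_pipeline is a placeholder, not a real API):
# spark.conf.set("spark.native.enabled", "false")
# jvm_s = timed(lambda: run_pipeline(spark))
# spark.conf.set("spark.native.enabled", "true")
# native_s = timed(lambda: run_pipeline(spark))
# print(f"native speedup: {speedup(jvm_s, native_s):.2f}x")
```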

Step 4: Measure Baselines

You cannot prove improvement without a baseline. For each Tier 1 workload, capture:

  • Wall-clock duration from the Spark UI (total job time, not stage time — stage time ignores scheduling and shuffle overhead)
  • CU consumption from Fabric monitoring (this is what your budget cares about)
  • Spark Advisor warnings in the current state, so you can distinguish new warnings from pre-existing noise after enabling native execution
  • Row counts and checksums on output tables — correctness verification requires a pre-migration snapshot

Store these in a Delta table. You will reference them for weeks.
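A sketch of what one baseline record might look like. The table name and column names here are this post's invention, not a Fabric convention; adapt them to your own standards:

```python
from datetime import datetime, timezone

def baseline_row(workload, wall_clock_s, cu, rows, checksum, native):
    """One baseline record per workload run; schema is illustrative."""
    return {
        "workload": workload,                  # hypothetical job name
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "wall_clock_seconds": float(wall_clock_s),  # from the Spark UI
        "cu_consumed": float(cu),                   # from Fabric monitoring
        "output_rows": int(rows),                   # from a post-run count
        "output_checksum": checksum,                # hash over key columns
        "native_enabled": bool(native),
    }

# row = baseline_row("nightly_vendor_aggregates", 2700, 180, 41_250_993, "c0ffee", False)
# spark.createDataFrame([row]).write.format("delta") \
#     .mode("append").saveAsTable("migration_baselines")
```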

Step 5: Run Native and Watch for Fallbacks

Enable the engine on your test environment and run each Tier 1 workload. Then check two things.

Performance delta: Compare wall-clock time and CU consumption against your baselines. On a genuinely compute-heavy workload, you should see at least 1.5x improvement. If you do not, something is triggering fallbacks and you are paying the columnar-to-row conversion tax without getting the native execution benefit.

Fallback alerts: The Spark Advisor now reports real-time warnings during notebook execution when operators fall back from native to JVM execution. Each alert names the specific operator that could not run natively.

The most common fallback trigger, and the most easily fixed: .show(). This call invokes collectLimit and toprettystring, neither of which has a native implementation. Replace .show() with .collect() or .toPandas() in production code. In a notebook cell you run manually for debugging, it does not matter — but inside a scheduled pipeline, every fallback is a boundary crossing.

Other triggers to watch: unsupported expression types, complex nested struct operations, and certain window function variants. For each one, ask three questions:

  1. Can I rewrite the query to avoid it? Sometimes this is a one-line change. Sometimes it means restructuring a transformation.
  2. Is the fallback on a critical path? A fallback in a logging cell is noise. A fallback inside your core join-and-aggregate chain is a problem.
  3. Is the net performance still positive? If the workload runs 3x faster overall despite two fallback warnings on minor operations, accept the win and move on.

Step 6: Validate Data Correctness

Faster means nothing if the answers change. For each migrated workload:

  • Compare output row counts between native and non-native runs on identical input data
  • Run hash comparisons on key output columns
  • For financial or compliance-sensitive pipelines, do a full row-level diff on a representative partition

The Native Execution Engine preserves Spark semantics, but floating-point arithmetic at boundary conditions, null handling in edge cases, and row ordering in non-deterministic operations all deserve explicit verification on your actual data. Do not skip this step because the TPC-DS numbers looked good. TPC-DS does not have your data shapes.
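Row counts and hash comparisons can share one helper. A sketch, with placeholder table names; XOR-folding the per-row digests makes the result order-independent, which matters because native and JVM runs may emit rows in different orders:

```python
import hashlib

def fingerprint(rows, key_cols):
    """Row count plus an order-independent digest over key columns.
    Caveat: XOR lets duplicate rows cancel in pairs, so always pair
    the digest with the row count."""
    digest = 0
    for row in rows:
        key = "|".join(str(row[c]) for c in key_cols)
        digest ^= int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return len(rows), digest

# jvm = fingerprint(spark.table("orders_out_baseline").collect(), ["order_id", "total"])
# nat = fingerprint(spark.table("orders_out_native").collect(), ["order_id", "total"])
# assert jvm == nat, f"Output mismatch: {jvm} vs {nat}"
```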

Step 7: Plan Your Rollback

The best operational property of the Native Execution Engine: it can be disabled per cell, per notebook, per environment, instantly. No restarts. No redeployments.

In PySpark:

spark.conf.set('spark.native.enabled', 'false')


In Spark SQL:

SET spark.native.enabled=FALSE;


Your rollback plan is one line of configuration. But that line only helps if your on-call engineers know it exists. Document it. Add it to your runbook. Add it to the incident response template. The worst production regression is one where the fix takes ten seconds but nobody knows about it for two hours.

Step 8: Roll Out Incrementally

With validation complete, enable the engine in production using one of three strategies, ordered from most cautious to broadest:

Option A — Per-job enablement: Add spark.native.enabled=true to individual Spark Job Definitions or notebook configure blocks. You control exactly which workloads get native execution.

Option B — Environment-level: Navigate to your production Environment → Acceleration tab → enable. All notebooks and Spark Job Definitions using this environment inherit the setting.

Option C — Workspace default: Set your native-enabled environment as the workspace default via Workspace Settings → Data Engineering/Science → Environment. Everything in the workspace picks it up.

Start with Option A on your validated Tier 1 workloads. After a week of stable production runs, graduate to Option B. Option C is for teams that have fully validated their workspace and want blanket coverage.

Step 9: Monitor the First Week

Post-migration monitoring matters because production data is not test data. In the first week:

  • Watch CU consumption trends in Fabric monitoring. Compute-heavy workloads should show measurable drops.
  • Check the Spark Advisor for fallback warnings that did not appear during testing. Different data distributions or code paths in production can trigger different operators.
  • Set alerts on job duration. A sudden increase means a new fallback or regression appeared.
  • Pay attention to any jobs that were borderline in testing. Production-scale data volume can push a workload from “mostly native” to “mostly fallback” if it exercises operators that were uncommon in test data.

Step 10: Optimize for Maximum Native Coverage

Once stable, push further:

  • Replace all .show() calls with .collect() or .toPandas() in scheduled notebook workflows
  • Refactor deeply nested struct operations into flat columnar operations where the query logic allows it
  • Consult the Apache Gluten documentation for the current supported operator list and avoid unsupported expressions in hot paths
  • Keep data in Parquet or Delta format — the engine processes these natively, and other formats require conversion that erases the acceleration
  • For write-heavy workloads, leverage the GA-release native Delta write acceleration, which extends native execution into the output path rather than just the read and transform stages

What Does Not Change

Several things remain identical and need no migration planning:

  • Spark APIs: Your PySpark, Scala, and SQL code is unchanged. The engine operates below the API surface.
  • Delta Lake semantics: ACID transactions, time travel, schema enforcement — all handled by the same Delta Lake 3.2 layer on Runtime 1.3.
  • Cost model: No additional CU charges. Your jobs finish faster, so you consume fewer CUs for the same work. The pricing advantage is indirect but real.
  • Fault tolerance: Spark still manages task retries, stage recovery, and speculative execution. The native engine handles computation; Spark handles resilience.

The Bottom Line

The Native Execution Engine is GA. It runs on the standard Fabric runtime. The performance gains are backed by reproducible benchmarks — up to 4x on TPC-DS at 1 TB, with real-world analytical workloads frequently reaching 6x. It costs nothing to enable and one line of configuration to revert.

But there is a gap between “we turned it on and things got faster” and “we know exactly which workloads got faster, by how much, what fell back, and what to do when something breaks.” The checklist above bridges that gap.

Runtime 1.3. Audit. Baselines. Test. Fallbacks. Correctness. Rollback. Incremental rollout. Monitor. Optimize.

Ten steps. Zero heroics. Measurably faster Spark.

This post was written with help from anthropic/claude-opus-4-6

Open Mirroring + OneLake: Spark patterns that keep latency from eating your weekends

Dev is clean. Prod is chaos. In dev, your mirrored table has a cute little dataset and Spark tears through it. In prod, that same notebook starts wheezing like it ran a marathon in wet jeans.

If that sounds familiar, good. You’re not cursed. You’re running into architecture debt that Open Mirroring does not solve for you.

Open Mirroring in Microsoft Fabric does exactly what it says on the tin: it replicates data from external systems into OneLake as Delta tables, and schema changes in the source can flow through. That’s huge. It cuts out a pile of ingestion plumbing.

But mirroring only lands data. It does not guarantee your Spark reads will be fast, stable, or predictable. That’s your job.

This post is the practical playbook: what breaks, why it breaks, and the patterns that keep your Spark jobs from turning into slow-motion disasters.

first principle: mirrored is a landing zone, not a serving layer

Treat mirrored tables like an airport runway. Planes touch down there. People do not set up a picnic on the tarmac.

When teams read mirrored tables directly in hot-path jobs, they inherit whatever file layout the connector produced. Sometimes that layout is fine. Sometimes it is a junk drawer.

Spark is sensitive to this. Reading many tiny files creates scheduling and metadata overhead. Reading a few huge files kills parallelism. Either way, the cluster burns time doing the wrong work.

The fix is boring and absolutely worth it: add a curated read layer.

  1. Let Open Mirroring write into a dedicated mirror lakehouse.
  2. Run a post-mirror notebook that reshapes data for Spark (partitioning, compaction, cleanup).
  3. Have production notebooks read curated tables only.

One extra hop. Much better nights of sleep.

what actually causes the latency cliff

Two things usually punch you in the face at scale:

  • File layout drift
  • Schema drift

Let’s tackle them in order.

1) file layout drift (the silent killer)

Spark scheduling is roughly file-driven for Parquet/Delta scans. That means file shape becomes execution shape. If your table has wildly uneven files, your job speed is set by the stragglers.

Think of ten checkout lanes where nine customers have one item and one customer has a full garage sale cart. Everyone waits on that last lane.

Start by measuring file distribution, not just row counts.

from pyspark.sql import functions as F

# NOTE: inputFiles() returns a Python list of file paths
df = spark.read.format("delta").load("Tables/raw_mirrored_orders")
paths = df.inputFiles()

# Use Hadoop FS to get file sizes in bytes
jvm = spark._jvm
conf = spark._jsc.hadoopConfiguration()
fs = jvm.org.apache.hadoop.fs.FileSystem.get(conf)

sizes = []
for p in paths:
    size = fs.getFileStatus(jvm.org.apache.hadoop.fs.Path(p)).getLen()
    sizes.append((p, size))

size_df = spark.createDataFrame(sizes, ["path", "size_bytes"])

size_df.select(
    F.count("*").alias("file_count"),
    F.round(F.avg("size_bytes")/1024/1024, 2).alias("avg_mb"),
    F.round(F.expr("percentile_approx(size_bytes, 0.5)")/1024/1024, 2).alias("p50_mb"),
    F.round(F.expr("percentile_approx(size_bytes, 0.9)")/1024/1024, 2).alias("p90_mb"),
    F.round(F.max("size_bytes")/1024/1024, 2).alias("max_mb")
).show(truncate=False)


You want a tight-ish band, not chaos. A common rule of thumb is targeting roughly 128 MB to 512 MB Parquet files for balanced throughput and parallelism. Rule of thumb, not religion. Your workload decides final tuning.

Then enforce a sane shape in curated tables:

raw = spark.read.format("delta").load("Tables/raw_mirrored_orders")

(raw.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")         # choose columns your queries actually filter on
    .option("maxRecordsPerFile", 500000)
    .save("Tables/curated_orders"))

spark.sql("OPTIMIZE delta.`Tables/curated_orders`")


If your queries filter by date and region, but you partition by customer_id because it “felt right,” you built a latency trap with your own hands.

2) schema drift (the 3 a.m. pager)

Open Mirroring can propagate source schema changes. That’s useful and dangerous.

Useful because your lake stays aligned. Dangerous because downstream logic often assumes a fixed shape.

A nullable column addition is usually fine. A type shift on a key metric column can quietly corrupt aggregations or explode at runtime.

Do not “notice this later.” Gate on it.

from pyspark.sql.types import StructType
import json

# Store baseline schema as JSON in Files/schemas/orders_baseline.json
with open("/lakehouse/default/Files/schemas/orders_baseline.json", "r") as f:
    baseline = StructType.fromJson(json.load(f))

current = spark.read.format("delta").load("Tables/raw_mirrored_orders").schema

base = {f.name: str(f.dataType) for f in baseline.fields}
curr = {f.name: str(f.dataType) for f in current.fields}

type_changes = [
    f"{name}: {base[name]} -> {curr[name]}"
    for name in curr
    if name in base and base[name] != curr[name]
]

new_cols = [name for name in curr if name not in base]

if type_changes:
    raise ValueError(f"Schema type changes detected: {type_changes}")

# Optional policy: allow new nullable columns but log them
if new_cols:
    print(f"New columns detected: {new_cols}")


Policy matters more than code here. Decide in advance what is auto-accepted versus what blocks the pipeline. Write it down. Enforce it every run.
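One way to make that policy executable is to encode it as data with a single decision function. The drift categories and actions below are assumptions for illustration, not Fabric features:

```python
# Sketch: the drift policy as data, not tribal knowledge.
POLICY = {
    "new_nullable_column": "log",   # accept, but leave an audit trail
    "type_change": "block",         # fail the run, require sign-off
    "dropped_column": "block",
}

def decide(change_kind: str) -> str:
    """Map a detected drift kind to an action. Unknown kinds block
    by default: safe beats convenient at 3 a.m."""
    return POLICY.get(change_kind, "block")
```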

lag is real, even when everything is healthy

Mirroring pipelines are replication systems, not teleportation devices. There is always some delay between source commit and mirrored availability. Sometimes tiny. Sometimes not.

If your job blindly processes “last hour” windows without checking mirror freshness, you’ll create holes and call them “data quality issues” three weeks later.

Add a freshness gate before transformations. The metadata source is connector-specific, but the pattern is universal:

from datetime import datetime, timedelta, timezone

# Example only: use the metadata table/view exposed by your mirroring setup
last_mirror_ts = spark.sql("""
  SELECT max(replication_commit_ts) as ts
  FROM mirror_metadata.orders_status
""").collect()[0]["ts"]

required_freshness = datetime.now(timezone.utc) - timedelta(minutes=15)

if last_mirror_ts is None or last_mirror_ts < required_freshness:
    raise RuntimeError(
        f"Mirror not fresh enough. Last commit: {last_mirror_ts}, required after: {required_freshness}"
    )


No freshness, no run. That gate saves you from publishing confident nonsense.

the production checklist (use this before go-live)

Before promoting any mirrored-data Spark pipeline, run this checklist in the same capacity and schedule window as production:

  • File shape check
    – Measure file count and distribution (p50, p90, max).
    – If distribution is ugly, compact and rewrite in curated.
  • Partition sanity check
    – Confirm partitions match real filter predicates.
    – Use df.explain(True) and verify PartitionFilters is not empty for common queries.
  • Schema gate check
    – Compare current schema to baseline.
    – Fail on type changes unless explicitly approved.
  • Freshness gate check
    – Validate mirrored data is fresh enough for your downstream SLA.
    – Fail fast if not.
  • Throughput reality check
    – Time representative full and filtered scans from curated tables.
    – If runtime misses SLA, fix layout first, then tune compute.
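The PartitionFilters check in the partition sanity step can be scripted. A sketch: explain(True) prints rather than returns, so capture stdout and string-match. Crude, but effective as a go-live gate:

```python
import io
import contextlib

def plan_text(df) -> str:
    """Capture df.explain(True) output, which prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain(True)
    return buf.getvalue()

def has_partition_pruning(plan: str) -> bool:
    """True when the file scan carries non-empty PartitionFilters.
    String matching is crude: a multi-scan plan with one pruned and
    one unpruned scan still fails, which is the safe direction."""
    return "PartitionFilters: [" in plan and "PartitionFilters: []" not in plan

# q = spark.table("curated_orders").filter("order_date = '2026-01-01'")
# assert has_partition_pruning(plan_text(q)), "query is not pruning partitions"
```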

If you only do one thing from this list, do the curated layer. Direct reads from mirrored tables are the root of most performance horror stories.

architecture that holds up when volume gets ugly

Keep it simple:

  1. Mirror layer
    Open Mirroring lands source data in OneLake Delta tables. This is your raw replica.

  2. Curation job
    A scheduled Spark notebook validates schema, reshapes partitions, and compacts files.

  3. Curated layer
    Downstream Spark notebooks and SQL consumers read here, not from mirror tables.

  4. Freshness gate
    Every downstream run checks replication freshness before processing.

That’s it. No heroics. No mystery knobs. Just a clean boundary between “data landed” and “data is ready to serve.”

Open Mirroring is genuinely powerful, but it is not magic. If you treat mirrored tables as production-ready serving tables, latency will eventually kneecap you. If you treat them as a landing zone and curate aggressively, Spark behaves, stakeholders stay calm, and your weekends stay yours.

This post was written with help from anthropic/claude-opus-4-6

Optimizing Spark Performance with the Native Execution Engine (NEE) in Microsoft Fabric

Spark tuning often starts with the usual suspects (shuffle volume, skew, join strategy, caching)… but sometimes the biggest win is simply executing the same logical plan on a faster engine.

Microsoft Fabric’s Native Execution Engine (NEE) does exactly that: it keeps Spark’s APIs and control plane, but runs a large portion of Spark SQL / DataFrame execution on a vectorized C++ engine.

What NEE is (and why it’s fast)

NEE is a vectorized native engine that integrates into Fabric Spark and can accelerate many SQL/DataFrame operators without you rewriting your code.

  • You still write Spark SQL / DataFrames.
  • Spark still handles distributed execution and scheduling.
  • For supported operators, compute is offloaded to a native engine (reducing JVM overhead and using columnar/vectorized execution).

Fabric documentation calls out NEE as being based on Apache Gluten (the Spark-to-native glue layer) and Velox (the native execution library).

When NEE tends to help the most

NEE shines when your workload is:

  • SQL-heavy (joins, aggregates, projections, filters)
  • CPU-bound (compute dominates I/O)
  • Primarily on Parquet / Delta

You’ll see less benefit (or fallback) when you rely on features NEE doesn’t support yet.

How to enable NEE (3 practical options)

1) Environment-level toggle (recommended for teams)

In your Fabric Environment settings, go to Acceleration and enable the native execution engine, then Save + Publish.

Benefit: notebooks and Spark Job Definitions that use that environment inherit the setting automatically.

2) Enable for a single notebook / job via Spark config

In a notebook cell:

%%configure
{
  "conf": {
    "spark.native.enabled": "true"
  }
}

For Spark Job Definitions, add the same Spark property.

3) Disable/enable per-query when you hit unsupported features

If a specific query uses an unsupported operator/expression and you want to force JVM Spark for that query:

SET spark.native.enabled=FALSE;
-- run the query
SET spark.native.enabled=TRUE;

How to confirm NEE is actually being used

Two low-friction checks:

  1. Spark UI / History Server: look for plan nodes ending with Transformer or nodes like *NativeFileScan / VeloxColumnarToRowExec.
  2. df.explain(): the same Transformer / NativeFileScan / Velox… hints should appear in the plan.

Fabric also exposes a dedicated view (“Gluten SQL / DataFrame”) to help spot which queries ran on the native engine vs. fell back.
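If you want to eyeball coverage programmatically, a tiny helper over the captured explain() string goes a long way. A sketch using the marker names above:

```python
def count_native_markers(plan: str) -> dict:
    """Count native-engine markers in an explain() string. Many
    VeloxColumnarToRowExec nodes relative to Transformer nodes suggest
    frequent native/JVM boundary crossings."""
    markers = ("Transformer", "NativeFileScan", "VeloxColumnarToRowExec")
    return {m: plan.count(m) for m in markers}
```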

Fallback is a feature (but you should know the common triggers)

NEE includes an automatic fallback mechanism: if the plan contains unsupported features, Spark will run that portion on the JVM engine.

A few notable limitations called out in Fabric documentation:

  • UDFs aren’t supported (fallback)
  • Structured streaming isn’t supported (fallback)
  • File formats like CSV/JSON/XML aren’t accelerated
  • ANSI mode isn’t supported

There are also some behavioral differences worth remembering (rounding/casting edge cases) if you have strict numeric expectations.

A pragmatic “NEE-first” optimization workflow

  1. Turn NEE on for the environment (or your job) and rerun the workload.
  2. If it’s still slow, open the plan and answer: is the slow part running on the native engine, or did it fall back?
  3. If it fell back, make the smallest possible change to keep the query on the native path (e.g., avoid UDFs; prefer built-in expressions; standardize on Parquet/Delta).
  4. Once the plan stays mostly native, go back to classic Spark tuning: reduce shuffle volume, fix skew, sane partitioning, and confirm broadcast joins.

This post was written with help from ChatGPT 5.2

Understanding Spark Execution in Microsoft Fabric

Spark performance work is mostly execution work: understanding where the DAG splits into stages, where shuffles happen, and why a handful of tasks can dominate runtime.

This post is a quick, practical refresher on the Spark execution model — with Fabric-specific pointers on where to observe jobs, stages, and tasks.

1) The execution hierarchy: Application → Job → Stage → Task

In Spark, your code runs as a Spark application. When you run an action (for example, count(), collect(), or writing a table), Spark submits a job. Each job is broken into stages, and each stage runs a set of tasks (often one task per partition).

A useful mental model:

  • Tasks are the unit of parallel work.
  • Stages group tasks that can run together without needing data from another stage.
  • Stage boundaries often show up where a shuffle is required (wide dependencies like joins and aggregations).

2) Lazy evaluation: why “nothing happens” until an action

Most DataFrame / Spark SQL transformations are lazy. Spark builds a plan and only executes when an action forces it.

Example (PySpark):

from pyspark.sql.functions import col

df = spark.read.table("fact_sales")
# Transformations (lazy)
filtered = df.filter(col("sale_date") >= "2026-01-01")

# Action (executes)
print(filtered.count())


This matters in Fabric notebooks because a single cell can trigger multiple jobs (for example, one job to materialize a cache and another to write output).

3) Shuffles: the moment your DAG turns expensive

A shuffle is when data must be redistributed across executors (typically by key). Shuffles introduce:

  • network transfer
  • disk I/O (shuffle files)
  • spill risk (memory pressure)
  • skew/stragglers (a few hot partitions dominate)

If you’re diagnosing a slow pipeline, assume a shuffle is the culprit until proven otherwise.

4) What to check in Fabric: jobs, stages, tasks

Fabric gives you multiple ways to see execution progress:

  • Notebook contextual monitoring: a progress indicator for notebook cells, with stage/task progress.
  • Spark monitoring / detail monitoring: drill into a Spark application and see jobs, stages, tasks, and duration breakdowns.

When looking at a slow run, focus on:

  • stages with large shuffle read/write
  • long-tail tasks (stragglers)
  • spill metrics (memory → disk)
  • skew indicators (a few tasks far slower than the median)
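The skew indicator can be made concrete with one ratio over task durations pulled from the monitoring view. A sketch, assuming positive durations:

```python
def straggler_ratio(task_seconds):
    """Max task duration over the median. A ratio well above ~2 usually
    means skew; a ratio near 1 with slow overall times points at volume
    or compute instead."""
    s = sorted(task_seconds)
    median = s[len(s) // 2]
    return s[-1] / median
```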

5) A repeatable debugging workflow (that scales)

  1. Start with the plan
    • df.explain(True) for DataFrame/Spark SQL
    • Look for Exchange operators (shuffle) and join strategies (broadcast vs shuffle join)
  2. Run once, then open monitoring
    • Identify the longest stage(s)
    • Confirm whether it’s CPU-bound, shuffle-bound, or spill-bound
  3. Apply the common fixes in order
    • Avoid the shuffle (broadcast small dims)
    • Reduce shuffle volume (filter early, select only needed columns)
    • Fix partitioning (repartition by join keys; avoid extreme partition counts)
    • Turn on AQE (spark.sql.adaptive.enabled=true) to let Spark coalesce shuffle partitions and mitigate skew
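For the AQE step, note that partition coalescing needs two flags, both of which default to true in Spark 3.5 but can be overridden by an environment. A sketch of the check; the helper name is this post's invention:

```python
def aqe_effective(conf: dict) -> bool:
    """AQE partition coalescing needs both flags on. Both default to
    true in Spark 3.5, but environments can override them."""
    return (conf.get("spark.sql.adaptive.enabled") == "true"
            and conf.get("spark.sql.adaptive.coalescePartitions.enabled") == "true")

# In a notebook:
# conf = {k: spark.conf.get(k, "true") for k in (
#     "spark.sql.adaptive.enabled",
#     "spark.sql.adaptive.coalescePartitions.enabled",
# )}
# print("AQE effective:", aqe_effective(conf))
```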

Quick checklist

  • Do I know which stage is dominating runtime?
  • Is there an Exchange / shuffle boundary causing it?
  • Are a few tasks straggling (skew), or are all tasks uniformly slow?
  • Am I broadcasting what should be broadcast?
  • Is AQE enabled, and is it actually taking effect?

This post was written with help from ChatGPT 5.2