
Spark performance work is mostly execution work: understanding where the DAG splits into stages, where shuffles happen, and why a handful of tasks can dominate runtime.
This post is a quick, practical refresher on the Spark execution model — with Fabric-specific pointers on where to observe jobs, stages, and tasks.
1) The execution hierarchy: Application → Job → Stage → Task
In Spark, your code runs as a Spark application. When you run an action (for example, count(), collect(), or writing a table), Spark submits a job. Each job is broken into stages, and each stage runs a set of tasks (often one task per partition).
A useful mental model:
- Tasks are the unit of parallel work.
- Stages group tasks that can run together without needing data from another stage.
- Stage boundaries often show up where a shuffle is required (wide dependencies like joins and aggregations).
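To make this concrete, here is a minimal PySpark sketch (assuming a `fact_sales` table with `store_id` and `amount` columns — illustrative names, not part of the original post): a single `show()` action submits one job, and the wide `groupBy` aggregation splits that job into two stages.

```python
from pyspark.sql import functions as F

df = spark.read.table("fact_sales")  # assumed table name

# groupBy is a wide dependency, so the resulting job has two stages:
#   stage 1: scan + partial aggregation (roughly one task per input partition)
#   stage 2: final aggregation after the shuffle
agg = df.groupBy("store_id").agg(F.sum("amount").alias("total_amount"))

agg.show()  # the action that actually submits the job
```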
2) Lazy evaluation: why “nothing happens” until an action
Most DataFrame / Spark SQL transformations are lazy. Spark builds a plan and only executes when an action forces it.
Example (PySpark):
```python
from pyspark.sql.functions import col

df = spark.read.table("fact_sales")

# Transformations (lazy)
filtered = df.filter(col("sale_date") >= "2026-01-01")

# Action (executes)
print(filtered.count())
```
This matters in Fabric notebooks because a single cell can trigger multiple jobs (for example, one job to materialize a cache and another to write output).
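A rough sketch of that multi-job behavior (the table and column names are illustrative):

```python
# One notebook cell, more than one Spark job.
enriched = (
    spark.read.table("fact_sales")
         .filter("sale_date >= '2026-01-01'")
         .cache()
)

# First job: scan the source and materialize the cache.
print(enriched.count())

# Second job: write the (now cached) data out as a table.
enriched.write.mode("overwrite").saveAsTable("fact_sales_2026")
```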
3) Shuffles: the moment your DAG turns expensive
A shuffle is when data must be redistributed across executors (typically by key). Shuffles introduce:
- network transfer
- disk I/O (shuffle files)
- spill risk (memory pressure)
- skew/stragglers (a few hot partitions dominate)
If you’re diagnosing a slow pipeline, assume a shuffle is the culprit until proven otherwise.
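A quick way to see where a shuffle will land is to look for `Exchange` operators in the physical plan. A minimal sketch, assuming `fact_sales` and `dim_store` tables that share a `store_id` column:

```python
sales = spark.read.table("fact_sales")
stores = spark.read.table("dim_store")

joined = sales.join(stores, "store_id")

# Look for "Exchange hashpartitioning(store_id, ...)" in the physical plan;
# that's the shuffle boundary (unless Spark chooses a broadcast join instead).
joined.explain()
```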
4) What to check in Fabric: jobs, stages, tasks
Fabric gives you multiple ways to see execution progress:
- Notebook contextual monitoring: a progress indicator for notebook cells, with stage/task progress.
- Spark monitoring / detail monitoring: drill into a Spark application and see jobs, stages, tasks, and duration breakdowns.
When looking at a slow run, focus on:
- stages with large shuffle read/write
- long-tail tasks (stragglers)
- spill metrics (memory → disk)
- skew indicators (a few tasks far slower than the median)
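One habit that makes these views easier to read: label your jobs with `setJobDescription` (a standard SparkContext API), so each job in the monitoring UI maps back to a step in your notebook. The description text and table names below are illustrative:

```python
# Jobs submitted after this call carry the description in the monitoring views.
spark.sparkContext.setJobDescription("sales-by-store aggregation + write")

result = spark.read.table("fact_sales").groupBy("store_id").count()
result.write.mode("overwrite").saveAsTable("sales_by_store")
```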
5) A repeatable debugging workflow (that scales)
- Start with the plan
  - `df.explain(True)` for DataFrame/Spark SQL
  - Look for `Exchange` operators (shuffle) and join strategies (broadcast vs shuffle join)
- Run once, then open monitoring
  - Identify the longest stage(s)
  - Confirm whether it’s CPU-bound, shuffle-bound, or spill-bound
- Apply the common fixes in order (see the sketch after this list)
  - Avoid the shuffle (broadcast small dims)
  - Reduce shuffle volume (filter early, select only needed columns)
  - Fix partitioning (repartition by join keys; avoid extreme partition counts)
  - Turn on AQE (`spark.sql.adaptive.enabled=true`) to let Spark coalesce shuffle partitions and mitigate skew
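Here is a compact sketch of those fixes applied together, assuming `fact_sales` / `dim_store` tables joined on `store_id` (names and columns are illustrative, not a definitive recipe):

```python
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

# Let AQE coalesce shuffle partitions and mitigate skew
# (enabled by default on recent Spark runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")

sales = (
    spark.read.table("fact_sales")
         .filter(F.col("sale_date") >= "2026-01-01")   # filter early
         .select("store_id", "amount")                 # keep only needed columns
)

# Broadcast the small dimension so the large fact table is not shuffled for the join.
stores = spark.read.table("dim_store").select("store_id", "region")
joined = sales.join(broadcast(stores), "store_id")

result = joined.groupBy("region").agg(F.sum("amount").alias("total_amount"))
result.explain()  # verify the plan uses BroadcastHashJoin rather than SortMergeJoin
```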
Quick checklist
- Do I know which stage is dominating runtime?
- Is there an `Exchange` / shuffle boundary causing it?
- Are a few tasks straggling (skew), or are all tasks uniformly slow?
- Am I broadcasting what should be broadcast?
- Is AQE enabled, and is it actually taking effect?
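For the AQE check in particular, two quick probes (the config keys are standard Spark SQL settings; exact plan text varies by Spark version):

```python
# Is AQE on in this session?
print(spark.conf.get("spark.sql.adaptive.enabled"))
print(spark.conf.get("spark.sql.adaptive.coalescePartitions.enabled"))

# If AQE is taking effect, the physical plan is wrapped in an AdaptiveSparkPlan node.
spark.read.table("fact_sales").groupBy("store_id").count().explain()
```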
References
- Microsoft Fabric — Notebook contextual monitoring and debugging
- Microsoft Fabric — Apache Spark application detail monitoring
- Apache Spark — Job Scheduling
- Apache Spark — SQL Performance Tuning (AQE + shuffle partitions)
This post was written with help from ChatGPT 5.2
