
Spark performance work is mostly execution work: understanding where the DAG splits into stages, where shuffles happen, and why a handful of tasks can dominate runtime.
This post is a quick, practical refresher on the Spark execution model — with Fabric-specific pointers on where to observe jobs, stages, and tasks.
1) The execution hierarchy: Application → Job → Stage → Task
In Spark, your code runs as a Spark application. When you run an action (for example, count(), collect(), or writing a table), Spark submits a job. Each job is broken into stages, and each stage runs a set of tasks (often one task per partition).
A useful mental model:
- Tasks are the unit of parallel work.
- Stages group tasks that can run together without needing data from another stage.
- Stage boundaries often show up where a shuffle is required (wide dependencies like joins and aggregations).
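To make this concrete, here is a minimal PySpark sketch (assuming a `fact_sales` table with `store_id` and `amount` columns — illustrative names, not part of the original post): a single `show()` action submits one job, and the wide `groupBy` aggregation splits that job into two stages.

```python
from pyspark.sql import functions as F

df = spark.read.table("fact_sales")  # assumed table name

# groupBy is a wide dependency, so the resulting job has two stages:
#   stage 1: scan + partial aggregation (roughly one task per input partition)
#   stage 2: final aggregation after the shuffle
agg = df.groupBy("store_id").agg(F.sum("amount").alias("total_amount"))

agg.show()  # the action that actually submits the job
```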
2) Lazy evaluation: why “nothing happens” until an action
Most DataFrame / Spark SQL transformations are lazy. Spark builds a plan and only executes when an action forces it.
Example (PySpark):
```python
from pyspark.sql.functions import col

df = spark.read.table("fact_sales")

# Transformations (lazy)
filtered = df.filter(col("sale_date") >= "2026-01-01")

# Action (executes)
print(filtered.count())
```
This matters in Fabric notebooks because a single cell can trigger multiple jobs (for example, one job to materialize a cache and another to write output).
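A rough sketch of that multi-job behavior (the table and column names are illustrative):

```python
# One notebook cell, more than one Spark job.
enriched = (
    spark.read.table("fact_sales")
         .filter("sale_date >= '2026-01-01'")
         .cache()
)

# First job: scan the source and materialize the cache.
print(enriched.count())

# Second job: write the (now cached) data out as a table.
enriched.write.mode("overwrite").saveAsTable("fact_sales_2026")
```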
3) Shuffles: the moment your DAG turns expensive
A shuffle is when data must be redistributed across executors (typically by key). Shuffles introduce:
- network transfer
- disk I/O (shuffle files)
- spill risk (memory pressure)
- skew/stragglers (a few hot partitions dominate)
If you’re diagnosing a slow pipeline, assume a shuffle is the culprit until proven otherwise.
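A quick way to see where a shuffle will land is to look for `Exchange` operators in the physical plan. A minimal sketch, assuming `fact_sales` and `dim_store` tables that share a `store_id` column:

```python
sales = spark.read.table("fact_sales")
stores = spark.read.table("dim_store")

joined = sales.join(stores, "store_id")

# Look for "Exchange hashpartitioning(store_id, ...)" in the physical plan;
# that's the shuffle boundary (unless Spark chooses a broadcast join instead).
joined.explain()
```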
4) What to check in Fabric: jobs, stages, tasks
Fabric gives you multiple ways to see execution progress:
- Notebook contextual monitoring: a progress indicator for notebook cells, with stage/task progress.
- Spark monitoring / detail monitoring: drill into a Spark application and see jobs, stages, tasks, and duration breakdowns.
When looking at a slow run, focus on:
- stages with large shuffle read/write
- long-tail tasks (stragglers)
- spill metrics (memory → disk)
- skew indicators (a few tasks far slower than the median)
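One habit that makes these views easier to read: label your jobs with `setJobDescription` (a standard SparkContext API), so each job in the monitoring UI maps back to a step in your notebook. The description text and table names below are illustrative:

```python
# Jobs submitted after this call carry the description in the monitoring views.
spark.sparkContext.setJobDescription("sales-by-store aggregation + write")

result = spark.read.table("fact_sales").groupBy("store_id").count()
result.write.mode("overwrite").saveAsTable("sales_by_store")
```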
5) A repeatable debugging workflow (that scales)
- Start with the plan
  - `df.explain(True)` for DataFrame/Spark SQL
  - Look for `Exchange` operators (shuffle) and join strategies (broadcast vs shuffle join)
- Run once, then open monitoring
  - Identify the longest stage(s)
  - Confirm whether it’s CPU-bound, shuffle-bound, or spill-bound
- Apply the common fixes in order (see the sketch after this list)
  - Avoid the shuffle (broadcast small dims)
  - Reduce shuffle volume (filter early, select only needed columns)
  - Fix partitioning (repartition by join keys; avoid extreme partition counts)
  - Turn on AQE (`spark.sql.adaptive.enabled=true`) to let Spark coalesce shuffle partitions and mitigate skew
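Here is a compact sketch of those fixes applied together, assuming `fact_sales` / `dim_store` tables joined on `store_id` (names and columns are illustrative, not a definitive recipe):

```python
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

# Let AQE coalesce shuffle partitions and mitigate skew
# (enabled by default on recent Spark runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")

sales = (
    spark.read.table("fact_sales")
         .filter(F.col("sale_date") >= "2026-01-01")   # filter early
         .select("store_id", "amount")                 # keep only needed columns
)

# Broadcast the small dimension so the large fact table is not shuffled for the join.
stores = spark.read.table("dim_store").select("store_id", "region")
joined = sales.join(broadcast(stores), "store_id")

result = joined.groupBy("region").agg(F.sum("amount").alias("total_amount"))
result.explain()  # verify the plan uses BroadcastHashJoin rather than SortMergeJoin
```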
Quick checklist
- Do I know which stage is dominating runtime?
- Is there an `Exchange` / shuffle boundary causing it?
- Are a few tasks straggling (skew), or are all tasks uniformly slow?
- Am I broadcasting what should be broadcast?
- Is AQE enabled, and is it actually taking effect?
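For the AQE check in particular, two quick probes (the config keys are standard Spark SQL settings; exact plan text varies by Spark version):

```python
# Is AQE on in this session?
print(spark.conf.get("spark.sql.adaptive.enabled"))
print(spark.conf.get("spark.sql.adaptive.coalescePartitions.enabled"))

# If AQE is taking effect, the physical plan is wrapped in an AdaptiveSparkPlan node.
spark.read.table("fact_sales").groupBy("store_id").count().explain()
```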
References
- Microsoft Fabric — Notebook contextual monitoring and debugging
- Microsoft Fabric — Apache Spark application detail monitoring
- Apache Spark — Job Scheduling
- Apache Spark — SQL Performance Tuning (AQE + shuffle partitions)
This post was written with help from ChatGPT 5.2
