Understanding Spark Execution in Microsoft Fabric

Spark performance work is mostly execution work: understanding where the DAG splits into stages, where shuffles happen, and why a handful of tasks can dominate runtime.

This post is a quick, practical refresher on the Spark execution model — with Fabric-specific pointers on where to observe jobs, stages, and tasks.

1) The execution hierarchy: Application → Job → Stage → Task

In Spark, your code runs as a Spark application. When you run an action (for example, count(), collect(), or writing a table), Spark submits a job. Each job is broken into stages, and each stage runs a set of tasks (often one task per partition).

A useful mental model:

  • Tasks are the unit of parallel work.
  • Stages group tasks that can run together without needing data from another stage.
  • Stage boundaries often show up where a shuffle is required (wide dependencies like joins and aggregations).
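For example, a minimal cell like the sketch below submits one job when the action runs; because the groupBy needs a shuffle, that job splits into at least two stages, each running roughly one task per partition. This assumes a Fabric notebook with a spark session and a fact_sales table; store_id is an illustrative column name.

df = spark.read.table("fact_sales")

# Roughly one task per partition within a stage
print(df.rdd.getNumPartitions())

# The action submits a job; the groupBy forces a shuffle, so the job
# splits into (at least) two stages. "store_id" is an illustrative column.
df.groupBy("store_id").count().collect()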

2) Lazy evaluation: why “nothing happens” until an action

Most DataFrame / Spark SQL transformations are lazy. Spark builds a plan and only executes when an action forces it.

Example (PySpark):

from pyspark.sql.functions import col

df = spark.read.table("fact_sales")
# Transformations (lazy)
filtered = df.filter(col("sale_date") >= "2026-01-01")

# Action (executes)
print(filtered.count())


This matters in Fabric notebooks because a single cell can trigger multiple jobs (for example, one job to materialize a cache and another to write output).
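As a rough illustration (same fact_sales table as above; the output table name fact_sales_2026 is made up), a single cell like this typically shows up as more than one job in monitoring:

from pyspark.sql.functions import col

filtered = spark.read.table("fact_sales").filter(col("sale_date") >= "2026-01-01")
filtered.cache()

filtered.count()  # job(s) to scan the source and materialize the cache
filtered.write.mode("overwrite").saveAsTable("fact_sales_2026")  # separate job(s) to write the output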

3) Shuffles: the moment your DAG turns expensive

A shuffle happens when data must be redistributed across executors (typically by key) so that rows with the same key land in the same partition. Shuffles introduce:

  • network transfer
  • disk I/O (shuffle files)
  • spill risk (memory pressure)
  • skew/stragglers (a few hot partitions dominate)

If you’re diagnosing a slow pipeline, assume a shuffle is the culprit until proven otherwise.
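To make the shuffle concrete, here is a sketch of the same join written two ways (dim_store and store_id are assumed names; whether Spark broadcasts on its own also depends on spark.sql.autoBroadcastJoinThreshold and AQE):

from pyspark.sql.functions import broadcast

fact = spark.read.table("fact_sales")
dim = spark.read.table("dim_store")   # assumed small dimension table

# Typically a shuffle (sort-merge) join: both sides get repartitioned by store_id
shuffled = fact.join(dim, "store_id")
shuffled.explain()       # look for Exchange hashpartitioning(store_id, ...)

# Broadcast hash join: the small side is shipped to every executor, no shuffle of fact
broadcasted = fact.join(broadcast(dim), "store_id")
broadcasted.explain()    # look for BroadcastExchange / BroadcastHashJoin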

4) What to check in Fabric: jobs, stages, tasks

Fabric gives you multiple ways to see execution progress:

  • Notebook contextual monitoring: a progress indicator for notebook cells, with stage/task progress.
  • Spark monitoring / detail monitoring: drill into a Spark application and see jobs, stages, tasks, and duration breakdowns.

When looking at a slow run, focus on:

  • stages with large shuffle read/write
  • long-tail tasks (stragglers)
  • spill metrics (memory → disk)
  • skew indicators (a few tasks far slower than the median)
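One small habit that helps here (a sketch; the description text is arbitrary): label the work before running it, so the run is easy to find in the monitoring views.

# Tag subsequent jobs so they are easy to identify in monitoring
spark.sparkContext.setJobDescription("daily fact_sales aggregation")

result = spark.read.table("fact_sales").groupBy("store_id").count()
result.collect()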

5) A repeatable debugging workflow (that scales)

  1. Start with the plan: df.explain(True) for DataFrame/Spark SQL.
    • Look for Exchange operators (shuffles) and join strategies (broadcast vs. shuffle join).
  2. Run once, then open monitoring: identify the longest stage(s).
    • Confirm whether it’s CPU-bound, shuffle-bound, or spill-bound.
  3. Apply the common fixes in order (see the sketch after this list):
    • Avoid the shuffle (broadcast small dimension tables).
    • Reduce shuffle volume (filter early, select only needed columns).
    • Fix partitioning (repartition by join keys; avoid extreme partition counts).
    • Turn on AQE (spark.sql.adaptive.enabled=true) to let Spark coalesce shuffle partitions and mitigate skew.
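Put together, the fixes look roughly like this (a sketch; fact_sales, dim_store, store_id, sale_date, and amount are illustrative names, and the AQE default depends on your Fabric runtime):

from pyspark.sql.functions import broadcast, col

# Let AQE coalesce shuffle partitions and mitigate skew
spark.conf.set("spark.sql.adaptive.enabled", "true")

fact = (
    spark.read.table("fact_sales")
    .filter(col("sale_date") >= "2026-01-01")    # filter early: less data enters the shuffle
    .select("store_id", "sale_date", "amount")   # select only the columns you need
)

dim = spark.read.table("dim_store")

# Avoid shuffling the fact table by broadcasting the small dimension
result = fact.join(broadcast(dim), "store_id")

# Inspect the plan before running: Exchange nodes mark shuffle boundaries
result.explain(True)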

Quick checklist

  • Do I know which stage is dominating runtime?
  • Is there an Exchange / shuffle boundary causing it?
  • Are a few tasks straggling (skew), or are all tasks uniformly slow?
  • Am I broadcasting what should be broadcast?
  • Is AQE enabled, and is it actually taking effect?
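For that last item, a quick check from a notebook cell (a sketch; the default setting varies by runtime):

# Is AQE on for this session?
print(spark.conf.get("spark.sql.adaptive.enabled"))

# If AQE is taking effect, physical plans show an AdaptiveSparkPlan node
# (isFinalPlan=true once the query has actually run).
spark.read.table("fact_sales").groupBy("store_id").count().explain()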
