Sparkwise: an “automated data engineering specialist” for Fabric Spark tuning

Spark tuning has a way of chewing up time: you start with something that “should be fine,” performance is off, costs creep up, and suddenly you’re deep in configs, Spark UI, and tribal knowledge trying to figure out what actually matters.

That’s why I’m excited to highlight sparkwise, an open-source Python package created by Santhosh Kumar Ravindran, one of my direct reports here at Microsoft. Santhosh built sparkwise to make Spark optimization in Microsoft Fabric less like folklore and more like a repeatable workflow: automated diagnostics, session profiling, and actionable recommendations to help teams drive better price-performance without turning every run into an investigation.

If you’ve ever thought, “I know something’s wrong, but I can’t quickly prove what to change,” sparkwise is aimed squarely at that gap. (PyPI)

As of January 5, 2026, the latest release is sparkwise 1.4.2 on PyPI. (PyPI)


The core idea: stop guessing, start diagnosing

Spark tuning often fails for two reasons:

  1. Too many knobs (Spark, Delta, Fabric-specific settings, runtime behavior).
  2. Not enough feedback (it’s hard to translate symptoms into the few changes that actually matter).

sparkwise attacks both.

It positions itself as an “automated Data Engineering specialist for Apache Spark on Microsoft Fabric,” offering:

  • Intelligent diagnostics
  • Configuration recommendations
  • Comprehensive session profiling
…so you can get to the best price/performance outcome without turning every notebook run into a science project. (PyPI)

Why sparkwise exists (and the problems it explicitly targets)

From the project description, sparkwise focuses on the stuff that reliably burns time and money in real Fabric Spark work:

  • Cost optimization: detect configurations that waste capacity and extend runtime (PyPI)
  • Performance optimization: validate and enable Fabric-specific acceleration paths like Native Engine and resource profiles (PyPI)
  • Faster iteration: detect Starter Pool blockers that force slower cold starts (3–5 minutes is called out directly) (PyPI)
  • Learning & clarity: interactive Q&A across 133 Spark/Delta/Fabric configurations (PyPI)
  • Workload understanding: profiling across sessions, executors, jobs, and resources (PyPI)
  • Decision support: priority-ranked recommendations with impact analysis (PyPI)

In short, sparkwise targets the gap between noticing that something is off and proving which change actually matters.


What you get: a feature tour that maps to real-world Spark pain

sparkwise’s feature set is broad, but it’s not random. It clusters nicely into a few “jobs to be done.”

1) Automated diagnostics (the fast “what’s wrong?” pass)

The diagnostics layer checks a bunch of high-impact areas, including:

  • Native Execution Engine: verifies Velox usage and detects fallbacks to row-based processing (PyPI)
  • Spark compute: analyzes Starter vs Custom Pool usage and flags immutable configs (PyPI)
  • Data skew detection: identifies imbalanced task distributions (PyPI)
  • Delta optimizations: checks V-Order, deletion vectors, optimize write, auto compaction (PyPI)
  • Runtime tuning: validates AQE, partition sizing, scheduler mode (PyPI)

These are the settings that tend to produce outsized wins when they're misconfigured.
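
To make that concrete, here's the manual version of just one of those checks; a minimal sketch assuming a Fabric notebook session, using the standard spark.native.enabled and spark.sql.adaptive.enabled settings rather than anything sparkwise-specific:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hand-rolled spot-check of two settings the diagnostics layer inspects.
for key in ["spark.native.enabled", "spark.sql.adaptive.enabled"]:
    print(key, "=", spark.conf.get(key, "<not set>"))

diagnose.analyze() runs checks like this across all of the areas above in one pass, and adds the interpretation you'd otherwise have to supply yourself.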

2) Comprehensive profiling (the “what actually happened?” pass)

Once you’re past basic correctness, the next level is: where did time and resources go?

sparkwise includes profiling across:

  • session metadata and resource allocation
  • executor status and memory utilization
  • job/stage/task metrics and bottleneck detection
  • resource efficiency scoring and utilization analysis (PyPI)

3) Advanced performance analysis (built on real metrics)

One of the most interesting “newer” directions in sparkwise is leaning into actual observed execution metrics:

  • “Real metrics collection” using Spark stage/task data (vs estimates) (PyPI)
  • scalability prediction comparing Starter vs Custom Pool with vCore-hour calculations (PyPI)
  • stage timeline visualization (parallel vs sequential patterns) (PyPI)
  • efficiency analysis that quantifies wasted compute in vCore-hours (PyPI)

That’s the bridge between “it feels slow” and “here’s the measurable waste + the fix.”
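
To ground the vCore-hour framing, the arithmetic itself is simple; this is my own back-of-the-envelope illustration of the idea, not sparkwise's exact formula:

# Hypothetical session: 8 executors x 4 vCores each, alive for 1.5 hours.
allocated_vcore_hours = 8 * 4 * 1.5              # 48 vCore-hours paid for
# Suppose profiling shows executors were busy ~60% of that time.
used_vcore_hours = allocated_vcore_hours * 0.60  # 28.8 vCore-hours of real work
wasted_vcore_hours = allocated_vcore_hours - used_vcore_hours
print(f"wasted compute: {wasted_vcore_hours:.1f} vCore-hours")  # 19.2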

4) Advanced skew detection (because skew kills Spark)

Skew is one of those problems that can hide behind averages and ruin everything anyway.

sparkwise’s skew tooling includes:

  • straggler detection via task duration variance (PyPI)
  • partition-level analysis with statistical metrics (PyPI)
  • skewed join detection with mitigation suggestions (broadcast vs salting strategies) (PyPI)
  • automatic mitigation guidance with code examples (salting, AQE, broadcast) (PyPI)
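
For context, the mitigations it points you toward map to well-known PySpark patterns. Here is a generic sketch of the three strategies (AQE skew handling, broadcast, salting), using hypothetical big_df/small_df DataFrames rather than anything sparkwise generates:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1) Let AQE split skewed partitions at join time.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# 2) Broadcast the small side so the hot key never shuffles.
# joined = big_df.join(F.broadcast(small_df), "customer_id")

# 3) Salt the hot key to spread it over more partitions.
# salted_big = big_df.withColumn("salt", (F.rand() * 16).cast("int"))
# salted_small = small_df.crossJoin(spark.range(16).withColumnRenamed("id", "salt"))
# joined = salted_big.join(salted_small, ["customer_id", "salt"])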

5) SQL query plan analysis (spotting anti-patterns early)

For teams living in Spark SQL / DataFrames, this is huge:

  • anti-pattern detection (cartesian products, full scans, excessive shuffles) (PyPI)
  • Native Engine compatibility checks (PyPI)
  • Z-Order recommendations based on cardinality (PyPI)
  • caching opportunity detection for repeated scans/subqueries (PyPI)
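
As a concrete example of the first bullet: a join with a missing key quietly becomes a cartesian product, which is exactly what plan analysis should surface. A generic PySpark illustration with hypothetical orders/customers tables, not sparkwise API:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame([(1, 100), (2, 200)], ["customer_id", "amount"])
customers = spark.createDataFrame([(1, "Ada"), (2, "Lin")], ["customer_id", "name"])

# Anti-pattern: no join key, so the plan degrades to a nested-loop/cartesian join.
orders.crossJoin(customers).explain()

# Better: join on the key and broadcast the small dimension table.
orders.join(F.broadcast(customers), "customer_id").explain()  # BroadcastHashJoin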

6) Storage optimization suite (new in v1.4.0+)

This is one of the clearest “practical ops” expansions:

  • small-file detection for Delta tables (the threshold is configurable; the docs use <10 MB as the example) (PyPI)
  • VACUUM ROI calculator using OneLake pricing assumptions in the project docs (PyPI)
  • partition effectiveness analysis and over/under-partitioning detection (PyPI)
  • “run all storage checks in one command” workflows (PyPI)

In other words: not just “your table is messy,” but “here’s why it costs you, and what to do.”
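
The fixes behind those checks are the standard Delta maintenance commands, sketched here for orientation; 'Tables/mytable' mirrors the placeholder path in the CLI examples further down and assumes a default lakehouse is attached, and 168 hours is the usual 7-day retention window:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files (the docs' example threshold is files under ~10 MB).
spark.sql("OPTIMIZE delta.`Tables/mytable`")

# Reclaim storage from unreferenced files, keeping 7 days of history.
spark.sql("VACUUM delta.`Tables/mytable` RETAIN 168 HOURS")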

7) Interactive configuration assistant (the “what does this do?” superpower)

This is deceptively valuable. sparkwise provides:

  • Q&A for 133 documented configurations spanning Spark, Delta, and Fabric-specific settings (Runtime 1.2 configs are called out explicitly) (PyPI)
  • context-aware guidance with workload-specific recommendations (PyPI)
  • explicit support for Fabric resource profiles (writeHeavy, readHeavyForSpark, readHeavyForPBI) (PyPI)
  • keyword search across config knowledge (PyPI)

This is the difference between “go read 9 docs” and “ask one question and move on.”


Quick start: the 3 fastest ways to get value

Install

pip install sparkwise

(PyPI)

1) Run a full diagnostic on your current session

from sparkwise import diagnose

diagnose.analyze()

(PyPI)

2) Ask about a specific Spark/Fabric config

from sparkwise import ask

ask.config("spark.native.enabled")
ask.search("optimize")

(PyPI)

3) Profile your run (and pinpoint bottlenecks)

from sparkwise import (
    profile, profile_executors, profile_jobs, profile_resources,
    predict_scalability, show_timeline, analyze_efficiency
)

profile()
profile_executors()
profile_jobs()
profile_resources()

predict_scalability()
show_timeline()
analyze_efficiency()

(PyPI)


CLI workflows (especially useful for storage optimization)

If you prefer CLIs (or want repeatable checks in scripts), sparkwise includes commands like:

sparkwise storage analyze Tables/mytable
sparkwise storage small-files Tables/mytable --threshold 10
sparkwise storage vacuum-roi Tables/mytable --retention-hours 168
sparkwise storage partitions Tables/mytable

(PyPI)

That’s a clean “ops loop” for keeping Delta tables healthy.
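
And if you want that loop running from a scheduled notebook or pipeline step instead of a terminal, a thin wrapper is enough; a sketch assuming sparkwise is installed in that environment, with hypothetical table paths:

import subprocess

# Nightly health check over a handful of Delta tables.
for table in ["Tables/orders", "Tables/customers"]:
    subprocess.run(["sparkwise", "storage", "analyze", table], check=True)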


A realistic “first hour” workflow I’d recommend

If you’re trying sparkwise on a real Fabric notebook today, here’s a practical order of operations:

  1. Run diagnose.analyze() first
    Use it as your “triage” to catch the high-impact misconfigs (Native Engine fallback, AQE off, Starter Pool blockers). (PyPI)
  2. Use ask.config() for any red/yellow item you don’t fully understand
    The point is speed: read the explanation in context and decide. (PyPI)
  3. Profile the session
    If the job is still slow/expensive after obvious fixes, profile and look for the real culprit: skew, shuffle pressure, poor parallelism, memory pressure. (PyPI)
  4. If the job smells like skew, use advanced skew detection
    Especially for joins and wide aggregations. (PyPI)
  5. If your tables are growing, run storage analysis early
    Small files and weak partitioning quietly tax everything downstream. (PyPI)

That flow is how you turn “tuning” from an art project into a checklist.
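
If it helps, that flow maps directly onto the calls from the quick start (the skew and storage checks in steps 4 and 5 have their own entry points covered earlier):

from sparkwise import diagnose, ask, profile, analyze_efficiency

diagnose.analyze()                   # step 1: triage the session
ask.config("spark.native.enabled")   # step 2: understand a flagged setting
profile()                            # step 3: see where time and resources went
analyze_efficiency()                 # step 3: quantify wasted vCore-hours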


Closing: why this matters for Fabric teams

I’m amplifying sparkwise because it’s the kind of contribution that scales beyond the person who wrote it. Santhosh took hard-earned, real-world Fabric Spark tuning experience and turned it into something other engineers can use immediately — a practical way to spot waste, unblock faster iteration, and make smarter performance tradeoffs.

If your team runs Fabric Spark workloads regularly, treat sparkwise like a lightweight tuning partner:

  1. install it,
  2. run the diagnostics,
  3. act on one recommendation,
  4. measure the improvement,
  5. repeat.
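
Step 4 can be as simple as timing the same job before and after the one change; a minimal sketch, assuming a DataFrame df you care about and using AQE as the example recommendation:

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def timed_run(df):
    start = time.perf_counter()
    df.write.format("noop").mode("overwrite").save()  # forces full execution, writes nothing
    return time.perf_counter() - start

# baseline = timed_run(df)
# spark.conf.set("spark.sql.adaptive.enabled", "true")  # the single change under test
# print(f"{baseline:.1f}s -> {timed_run(df):.1f}s")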

And if you end up with feedback or feature ideas, even better — that’s how tools like this get sharper and more broadly useful.

This post was written with help from ChatGPT 5.2.
