If your Lakehouse tables are getting slower (or more expensive) over time, it’s often not “Spark is slow.” It’s usually table layout drift: too many small files, suboptimal clustering, and old files piling up.
In Fabric Lakehouse, the three table-maintenance levers you’ll reach for most are:
OPTIMIZE: compacts many small files into fewer, larger files (and can apply clustering)
Z-ORDER: co-locates related values to improve data skipping for common filters
VACUUM: deletes old files that are no longer referenced by the Delta transaction log (after a retention window)
Practical note: in Fabric, run these as Spark SQL in a notebook or Spark job definition (or use the Lakehouse maintenance UI). Don’t try to run them in the SQL Analytics Endpoint.
1) Start with the symptom: “small files” vs “bad clustering”
Before you reach for maintenance, quickly sanity-check what you’re fighting:
Many small files → queries spend time opening/reading lots of tiny Parquet files.
Poor clustering for your most common predicates (date, tenantId, customerId, region, etc.) → queries scan more data than they need.
Heavy UPDATE/DELETE/MERGE patterns → lots of new files + tombstones + time travel files.
If small files are your main problem, OPTIMIZE is usually your first win.
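If you want a quick, data-driven check first, Delta's DESCRIBE DETAIL output is enough to tell whether small files are the issue. A minimal sketch (my_table is a placeholder for your Lakehouse table; run it in a Fabric notebook):
# Quick layout check: file count and total size for a Delta table
detail = spark.sql("DESCRIBE DETAIL my_table").select("numFiles", "sizeInBytes").first()
avg_file_mb = (detail.sizeInBytes / detail.numFiles) / (1024 ** 2) if detail.numFiles else 0
print(f"files={detail.numFiles}, avg_file_mb={avg_file_mb:.1f}")
# Many files with a small average size (a few MB each) usually means compaction is the first lever.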
2) OPTIMIZE: bin-packing for fewer, bigger files
Basic compaction
OPTIMIZE my_table;
Target a subset (example: recent partitions)
OPTIMIZE my_table WHERE date >= date_sub(current_date(), 7);
A useful mental model: OPTIMIZE is rewriting file layout (not changing table results). It’s maintenance, not transformation.
3) Z-ORDER: make your filters cheaper
Z-Ordering is for the case where you frequently query:
WHERE tenantId = ...
WHERE customerId = ...
WHERE deviceId = ... AND eventTime BETWEEN ...
Example:
OPTIMIZE my_table ZORDER BY (tenantId, eventDate);
Pick 1–3 columns that dominate your interactive workloads. If you try to z-order on everything, you’ll mostly burn compute for little benefit.
4) VACUUM: clean up old, unreferenced files (carefully)
VACUUM is about storage hygiene. Delta keeps old files around to support time travel and concurrent readers. VACUUM deletes files that are no longer referenced and older than the configured retention threshold.
VACUUM my_table;
Two practical rules:
Don’t VACUUM aggressively unless you understand the impact on time travel / rollback.
Treat the retention window as a governance decision (what rollback window do you want?) not just a cost optimization.
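For example, a minimal sketch from a Fabric notebook (168 hours matches the default 7-day retention; DRY RUN previews what would be deleted without removing anything):
# Preview the files VACUUM would delete (no deletion happens with DRY RUN)
spark.sql("VACUUM my_table RETAIN 168 HOURS DRY RUN").show(truncate=False)
# Then run the real cleanup once you're comfortable with the retention window
spark.sql("VACUUM my_table RETAIN 168 HOURS")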
5) Fabric-specific gotchas (the ones that actually bite)
Where you can run these commands
These are Spark SQL maintenance commands. In Fabric, that means notebooks / Spark job definitions (or the Lakehouse maintenance UI), not the SQL Analytics Endpoint.
V-Order and OPTIMIZE
Fabric also has V-Order, which is a Parquet layout optimization aimed at faster reads across Fabric engines. If you’re primarily optimizing for downstream read performance (Power BI/SQL/Spark), it’s worth understanding whether V-Order is enabled for your workspace and table writes.
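As a quick check from a notebook, you can inspect the session-level V-Order property. The property name below is an assumption that varies by Fabric Spark runtime version, so verify it against your runtime's documentation:
# Assumption: older Fabric runtimes expose spark.sql.parquet.vorder.enabled;
# newer runtimes use a differently named default property. Verify for your runtime.
try:
    print("V-Order session setting:", spark.conf.get("spark.sql.parquet.vorder.enabled"))
except Exception:
    print("Not set at the session level; check the environment/runtime defaults.")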
If your Fabric tenant has grown past “a handful of workspaces,” the problem isn’t just storage or compute—it’s finding the right items, understanding what they are, and making governance actionable.
That’s the motivation behind the OneLake catalog: a central hub to discover and manage Fabric content, with dedicated experiences for discovery (Explore), governance posture (Govern), and security administration (Secure).
This post is a practical walk-through of what’s available today, with extra focus on what Fabric admins get in the Govern experience.
What is the OneLake catalog?
Microsoft describes the OneLake catalog as a centralized place to find, explore, and use Fabric items—and to govern the data you own.
You open it from the Fabric navigation pane by selecting the OneLake icon.
Explore tab: tenant-wide discovery without losing context
The Explore tab is the “inventory + details” experience:
An items list of Fabric content you can access (and in some cases, content you can request access to).
An in-context details pane so you can inspect an item without navigating away from your filtered list.
Filters and selectors to narrow scope (for example: workspace, item-type categories, endorsement, and tags).
A key pattern here is fast triage: filter down to a domain/workspace, then click through items to answer:
Who owns this?
Where does it live?
When was it refreshed?
Is it endorsed/certified?
Does it have sensitivity labeling?
Tip for data engineers
If your tenant uses domains, scoping the catalog to a domain/subdomain is often the quickest way to keep the item list meaningful—especially when teams create similar notebooks/pipelines across many workspaces.
Govern tab: governance posture + recommended actions
The Govern tab is where the catalog becomes more than “a directory.” It combines:
Insights (high-level indicators you can drill into)
Recommended actions (cards that suggest concrete improvements you can act on)
The Govern tab behaves differently depending on who you are:
Fabric admins see insights based on tenant metadata (items, workspaces, capacities, domains).
Data owners see insights scoped to items they own (using the My items concept).
The Fabric blog also calls out a preview experience that extends the OneLake catalog governance view for Fabric admins, providing consolidated indicators and deeper drill-down reporting.
What admins see on the Govern tab
From the Fabric admin perspective, the Govern experience is designed to answer:
What does our data estate look like (inventory, distribution, usage)?
Where are we under-labeled or non-compliant (sensitivity coverage, policy posture)?
What content is hard to trust or reuse (freshness, endorsement/description/tag coverage, sharing patterns)?
When admins choose View more, Learn documentation describes an expanded report with three areas:
Manage your data estate (inventory, capacities/domains, feature usage)
Protect, secure & comply (sensitivity label coverage and data loss prevention policy posture)
Discover, trust, and reuse (freshness, curation signals such as endorsement/description coverage, sharing)
A detail worth knowing: refresh cadence differs for admins
Per Microsoft Learn, admin insights and actions are based on Admin Monitoring Storage data and refresh automatically every day, so there can be a lag between changes you make and what the Govern insights reflect.
Secure tab: centralized security role management
The OneLake catalog Secure tab is a security administration surface that centralizes:
Workspace roles and permissions (for auditing access)
OneLake security roles across workspaces and item types
From the Secure tab, admins can create, edit, or delete OneLake security roles from a single location.
A practical workflow to adopt (teams + admins)
Here’s a lightweight approach that scales better than “ask around on Teams”:
Explore: Use domain/workspace scoping + filters to find candidate items.
Inspect: Use the in-context details pane to sanity-check ownership, endorsement, sensitivity, and freshness.
Govern: Use the recommended actions cards to drive a small number of measurable improvements:
increase sensitivity label coverage
improve endorsement/certification where appropriate
standardize descriptions/tags for key assets
Secure: Audit role sprawl and standardize how OneLake security roles are managed across items.
Considerations and limitations to keep in mind
A few constraints called out in Learn documentation (useful when you’re setting expectations):
The Govern tab doesn’t support cross-tenant scenarios or guest users.
The Govern tab isn’t available when Private Link is activated.
Govern insights for admins can be up to a day behind due to daily refresh of admin monitoring storage.
Spark performance work is mostly execution work: understanding where the DAG splits into stages, where shuffles happen, and why a handful of tasks can dominate runtime.
This post is a quick, practical refresher on the Spark execution model — with Fabric-specific pointers on where to observe jobs, stages, and tasks.
1) The execution model: applications, jobs, stages, and tasks
In Spark, your code runs as a Spark application. When you run an action (for example, count(), collect(), or writing a table), Spark submits a job. Each job is broken into stages, and each stage runs a set of tasks (often one task per partition).
A useful mental model:
Tasks are the unit of parallel work.
Stages group tasks that can run together without needing data from another stage.
Stage boundaries often show up where a shuffle is required (wide dependencies like joins and aggregations).
2) Lazy evaluation: why “nothing happens” until an action
Most DataFrame / Spark SQL transformations are lazy. Spark builds a plan and only executes when an action forces it.
Example (PySpark):
from pyspark.sql.functions import col

df = spark.read.table("fact_sales")

# Transformations (lazy)
filtered = df.filter(col("sale_date") >= "2026-01-01")

# Action (executes)
print(filtered.count())
This matters in Fabric notebooks because a single cell can trigger multiple jobs (for example, one job to materialize a cache and another to write output).
3) Shuffles: the moment your DAG turns expensive
A shuffle is when data must be redistributed across executors (typically by key). Shuffles introduce:
network transfer
disk I/O (shuffle files)
spill risk (memory pressure)
skew/stragglers (a few hot partitions dominate)
If you’re diagnosing a slow pipeline, assume a shuffle is the culprit until proven otherwise.
4) What to check in Fabric: jobs, stages, tasks
Fabric gives you multiple ways to see execution progress:
Notebook contextual monitoring: a progress indicator for notebook cells, with stage/task progress.
Spark monitoring / detail monitoring: drill into a Spark application and see jobs, stages, tasks, and duration breakdowns.
When looking at a slow run, focus on:
stages with large shuffle read/write
long-tail tasks (stragglers)
spill metrics (memory → disk)
skew indicators (a few tasks far slower than the median)
5) A repeatable debugging workflow (that scales)
Start with the plan
df.explain(True) for DataFrame/Spark SQL (see the sketch below)
Look for Exchange operators (shuffle) and join strategies (broadcast vs shuffle join)
Run once, then open monitoring
Identify the longest stage(s)
Confirm whether it’s CPU-bound, shuffle-bound, or spill-bound
Apply the common fixes in order
Avoid the shuffle (broadcast small dims)
Reduce shuffle volume (filter early, select only needed columns)
Fix partitioning (repartition by join keys; avoid extreme partition counts)
Turn on AQE (spark.sql.adaptive.enabled=true) to let Spark coalesce shuffle partitions and mitigate skew
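A minimal sketch of step 1, assuming df is a DataFrame from your pipeline:
# Prints parsed, analyzed, optimized, and physical plans.
# In the physical plan, "Exchange" nodes mark shuffle boundaries and join nodes
# show the chosen strategy (BroadcastHashJoin vs SortMergeJoin).
df.explain(True)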
Quick checklist
Do I know which stage is dominating runtime?
Is there an Exchange / shuffle boundary causing it?
Are a few tasks straggling (skew), or are all tasks uniformly slow?
Shuffles are where Spark jobs go to get expensive: a wide join or aggregation forces data to move across the network, materialize shuffle files, and often spill when memory pressure spikes.
In Microsoft Fabric Spark workloads, the fastest optimization is usually the boring one: avoid the shuffle when you can, and when you can’t, make it smaller and better balanced.
This post lays out a practical, repeatable approach you can apply in Fabric notebooks and Spark job definitions.
1) Start with the simplest win: avoid the shuffle
If one side of your join is genuinely small (think lookup/dimension tables), use a broadcast join so Spark ships the small table to executors and avoids a full shuffle.
In Fabric’s Spark best practices, Microsoft explicitly calls out broadcast joins for small lookup tables as a way to avoid shuffles entirely.
Example (PySpark):
from pyspark.sql.functions import broadcast
fact = spark.read.table("fact_sales")
dim = spark.read.table("dim_product")
# If dim_product is small enough, broadcast it
joined = fact.join(broadcast(dim), on="product_id", how="left")
If you can’t broadcast safely, move to the next lever.
2) Make the shuffle less painful: tune shuffle parallelism
Spark controls the number of shuffle partitions for joins and aggregations with spark.sql.shuffle.partitions (default: 200 in Spark SQL).
Too few partitions → huge partitions → long tasks, spills, and stragglers.
Too many partitions → tiny tasks → scheduling overhead and excess shuffle metadata.
A decent heuristic is to start with something proportional to total executor cores and then iterate using the Spark UI (watch stage task durations, shuffle read/write sizes, and spill metrics).
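A minimal sketch of that starting point (the 2x multiplier is an assumption to iterate on, not a rule):
# Size shuffle parallelism relative to the cores available to the session
cores = spark.sparkContext.defaultParallelism  # roughly total executor cores
spark.conf.set("spark.sql.shuffle.partitions", str(cores * 2))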
3) Let Spark fix itself (when it can): enable AQE
Adaptive Query Execution (AQE) uses runtime statistics to optimize a query as it runs.
Fabric’s Spark best practices recommend enabling AQE to dynamically optimize shuffle partitions and handle skewed data automatically.
AQE is particularly helpful when:
Your input data distribution changes day-to-day
A static spark.sql.shuffle.partitions value is right for some workloads but wrong for others
You hit skew where a small number of partitions do most of the work
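A minimal sketch of enabling AQE at the session level (the same keys can also be set as Spark properties in a Fabric environment):
# Let Spark re-optimize at runtime: coalesce small shuffle partitions and split skewed ones
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")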
If you’ve adopted Microsoft Fabric, there’s a good chance you’re trying to reduce the number of ‘copies’ of data that exist just so different teams and engines can access it.
OneLake shortcuts are one of the core primitives Fabric provides to unify data across domains, clouds, and accounts by making OneLake a single virtual data lake namespace.
For Spark users specifically, the big win is that shortcuts appear as folders in OneLake—so Spark can read them like any other folder—and Delta-format shortcuts in the Lakehouse Tables area can be surfaced as tables.
What a OneLake shortcut is (and isn’t)
A shortcut is an object in OneLake that points to another storage location (internal or external to OneLake).
Shortcuts appear as folders and behave like symbolic links: deleting a shortcut doesn’t delete the target, but moving/renaming/deleting the target can break the shortcut.
From an engineering standpoint, that means you should treat shortcuts as a namespace mapping layer—not as a durability mechanism.
Where you can create shortcuts: Lakehouse Tables vs Files
In a Lakehouse, you create shortcuts either under the top-level Tables folder or anywhere under the Files folder.
Tables has constraints: OneLake doesn’t support shortcuts in subdirectories of the Tables folder, and shortcuts in Tables are typically meant for targets that conform to the Delta table format.
Files is flexible: there are no restrictions on where you can create shortcuts in the Files hierarchy, and table discovery does not happen there.
If a shortcut in the Tables area points to Delta-format data, the lakehouse can synchronize metadata and recognize the folder as a table.
One documented gotcha: the Delta format doesn’t support table names with space characters, and OneLake won’t recognize any shortcut containing a space in the name as a Delta table.
How Spark reads from shortcuts
In notebooks and Spark jobs, shortcuts appear as folders in OneLake, and Spark can read them like any other folder.
For table-shaped data, Fabric automatically recognizes shortcuts in the Tables section of the lakehouse that have Delta/Parquet data as tables—so you can reference them directly from Spark.
Microsoft Learn also notes you can use relative file paths to read data directly from shortcuts, and Delta shortcuts in Tables can be read via Spark SQL syntax.
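A minimal sketch of both cases (the table and folder names are hypothetical, and the lakehouse containing the shortcuts is assumed to be the notebook's default lakehouse):
# Delta shortcut under Tables: discovered as a table, so normal table reads work
orders = spark.read.table("orders_shortcut")  # or spark.sql("SELECT * FROM orders_shortcut LIMIT 10")
# Shortcut under Files: read it like any other folder, here via a relative path
raw_events = spark.read.format("csv").option("header", "true").load("Files/landing_shortcut/events/")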
Practical patterns (what I recommend in real projects)
Pattern 1: Use Tables shortcuts for shared Delta tables you want to show up consistently across Fabric engines (Spark + SQL + Direct Lake scenarios via semantic models reading from shortcuts).
Pattern 2: Use Files shortcuts when you need arbitrary formats or hierarchical layouts (CSV/JSON/images, nested partitions, etc.) and you’re fine treating it as file access.
Pattern 3: Prefer shortcuts over copying/staging when your primary goal is to eliminate edge copies and reduce latency from data duplication workflows.
Pattern 4: When you’re operationalizing Spark notebooks, make the access path explicit and stable by using the shortcut path (the place it appears) rather than hard-coding a target path that might change.
Operational gotchas and guardrails
Because moving/renaming/deleting a target path can break a shortcut, add lightweight monitoring for “broken shortcut” failures in your pipelines (and treat them like dependency failures).
For debugging, the lakehouse UI can show the ABFS path or URL for a shortcut in its Properties pane, which you can copy for inspection or troubleshooting.
Outside of Fabric, services can access OneLake through the OneLake API, which supports a subset of ADLS Gen2 and Blob storage APIs.
Summary
Shortcuts give Spark a clean way to treat OneLake like a unified namespace: read shortcuts as folders, surface Delta/Parquet data in Tables as tables, and keep your project’s logical paths stable even when physical storage locations vary.
If you’re treating Microsoft Fabric workspaces as source-controlled assets, you’ve probably started leaning on code-first deployment tooling (either Fabric’s built-in Git integration or community tooling layered on top).
One popular option is the open-source fabric-cicd Python library, which is designed to help implement CI/CD automations for Fabric workspaces without having to interact directly with the underlying Fabric APIs.
For most Fabric items, a ‘deploy what’s in Git’ model works well—until you hit a configuration that looks like it’s in source control, appears in deployment logs, but still doesn’t land in the target workspace.
This post walks through a real example from fabric-cicd issue #776: an Environment artifact where the “Enable native execution engine” toggle does not end up enabled after deployment, even though the configuration appears present and the PATCH call returns HTTP 200.
Why this setting matters: environments are the contract for Spark compute
A Fabric environment contains a collection of configurations, including Spark compute properties, that you can attach to notebooks and Spark jobs.
That makes environments a natural CI/CD unit: you can standardize driver/executor sizing, dynamic executor allocation, and Spark properties across many workloads.
Environments are also where Fabric exposes the Native Execution Engine (NEE) toggle under Spark compute → Acceleration.
Microsoft documents that enabling NEE at the environment level causes subsequent jobs and notebooks associated with that environment to inherit the setting.
NEE reads as enabled in source, but ends up disabled in the target
In the report, the Environment’s source-controlled Sparkcompute.yml includes enable_native_execution_engine: true along with driver/executor cores and memory, dynamic executor allocation, Spark properties, and a runtime version.
The user then deploys to a downstream workspace (PPE) using fabric-cicd and expects the deployed Environment to show the Acceleration checkbox enabled.
Instead, the target Environment shows the checkbox unchecked (false), even though the deployment logs indicate that Spark settings were updated.
A key signal in the debug log: PATCH request includes the field, response omits it
The issue includes a fabric-cicd debug snippet showing a PATCH to an environments .../sparkcompute endpoint where the request body contains enableNativeExecutionEngine set to true.
However, the response body shown in the issue includes driver/executor sizing and Spark properties but does not include enableNativeExecutionEngine.
The user further validates the discrepancy by exporting/syncing the PPE workspace back to Git: the resulting Sparkcompute.yml shows enable_native_execution_engine: false.
What to do today: treat NEE as a “verify after deploy” setting
Until the underlying behavior is fixed, assume this flag can drift across environments even when other Spark compute properties deploy correctly.
Practically, that means adding a post-deploy verification step for downstream workspaces—especially if you rely on NEE for predictable performance or cost.
Checklist: a lightweight deployment guardrail
Here’s a low-friction way to catch this class of issue early (even if you don’t have an automated API read-back step yet):
Ensure the source-controlled Sparkcompute.yml includes enable_native_execution_engine: true.
Deploy with verbose/debug logging and confirm the PATCH body contains enableNativeExecutionEngine: true.
After deployment, open the target Environment → Spark compute → Acceleration and verify the checkbox state.
Optionally: export/sync the target workspace back to Git and confirm the exported Sparkcompute.yml matches your intent.
Workarounds (choose your tradeoff)
If you’re blocked, the simplest workaround is operational: enable NEE in the target environment via the UI after deployment and treat it as a manual step until the bug is resolved.
If you need full automation, a more robust approach is to add a post-deploy validation/remediation step that checks the environment setting and re-applies it if it’s not set.
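A sketch of what that read-back could look like. The endpoint path mirrors the sparkcompute endpoint visible in the issue's debug log, and the IDs/token are placeholders; confirm the exact route against the current Fabric REST API reference before relying on it:
import requests

WORKSPACE_ID = "<target-workspace-id>"      # placeholder
ENVIRONMENT_ID = "<environment-id>"         # placeholder
TOKEN = "<aad-access-token>"                # placeholder: token with Environment read permission

url = (
    "https://api.fabric.microsoft.com/v1/workspaces/"
    f"{WORKSPACE_ID}/environments/{ENVIRONMENT_ID}/sparkcompute"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

# Fail the deployment (or trigger remediation) if the flag didn't land
if resp.json().get("enableNativeExecutionEngine") is not True:
    raise RuntimeError("NEE is not enabled in the target environment; re-apply or enable it manually.")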
Reporting and tracking
If you’re affected, add reproducibility details (runtime version, library version, auth mode) and any additional debug traces to issue #776 so maintainers can confirm whether the API ignores the field, expects a different contract, or requires a different endpoint/query parameter.
Even if you don’t use fabric-cicd, the pattern is broadly relevant: CI/CD is only reliable when you can round-trip configuration (write, then read-back to verify) for each control surface you’re treating as ‘source of truth.’
Closing thoughts
Native Execution Engine is positioned as a straightforward acceleration you can enable at the environment level to benefit subsequent Spark workloads.
When that toggle doesn’t deploy as expected, the pragmatic response is to verify after deploy, document the drift, and keep your CI/CD pipeline honest by validating the settings you care about—not just the HTTP status code.
Spark tuning has a way of chewing up time: you start with something that “should be fine,” performance is off, costs creep up, and suddenly you’re deep in configs, Spark UI, and tribal knowledge trying to figure out what actually matters.
That’s why I’m excited to highlight sparkwise, an open-source Python package created by Santhosh Kumar Ravindran, one of my direct reports here at Microsoft. Santhosh built sparkwise to make Spark optimization in Microsoft Fabric less like folklore and more like a repeatable workflow: automated diagnostics, session profiling, and actionable recommendations to help teams drive better price-performance without turning every run into an investigation.
If you’ve ever thought, “I know something’s wrong, but I can’t quickly prove what to change,” sparkwise is aimed squarely at that gap. (PyPI)
As of January 5, 2026, the latest release is sparkwise 1.4.2 on PyPI. (PyPI)
The core idea: stop guessing, start diagnosing
Spark tuning often fails for two reasons:
Too many knobs (Spark, Delta, Fabric-specific settings, runtime behavior).
Not enough feedback (it’s hard to translate symptoms into the few changes that actually matter).
sparkwise attacks both.
It positions itself as an “automated Data Engineering specialist for Apache Spark on Microsoft Fabric,” offering:
Intelligent diagnostics
Configuration recommendations
Comprehensive session profiling
…so you can get to the best price/performance outcome without turning every notebook run into a science project. (PyPI)
Why sparkwise exists (and the problems it explicitly targets)
From the project description, sparkwise focuses on the stuff that reliably burns time and money in real Fabric Spark work:
Cost optimization: detect configurations that waste capacity and extend runtime (PyPI)
Performance optimization: validate and enable Fabric-specific acceleration paths like Native Engine and resource profiles (PyPI)
Faster iteration: detect Starter Pool blockers that force slower cold starts (3–5 minutes is called out directly) (PyPI)
Learning & clarity: interactive Q&A across 133 Spark/Delta/Fabric configurations (PyPI)
Workload understanding: profiling across sessions, executors, jobs, and resources (PyPI)
Decision support: priority-ranked recommendations with impact analysis (PyPI)
If you’ve ever thought “I know something is off, but I can’t prove which change matters,” this is aimed squarely at you.
What you get: a feature tour that maps to real-world Spark pain
sparkwise’s feature set is broad, but it’s not random. It clusters nicely into a few “jobs to be done.”
1) Automated diagnostics (the fast “what’s wrong?” pass)
The diagnostics layer checks a bunch of high-impact areas, including:
Native Execution Engine: verifies Velox usage and detects fallbacks to row-based processing (PyPI)
Spark compute: analyzes Starter vs Custom Pool usage and flags immutable configs (PyPI)
Data skew detection: identifies imbalanced task distributions (PyPI)
That’s a clean “ops loop” for keeping Delta tables healthy.
A realistic “first hour” workflow I’d recommend
If you’re trying sparkwise on a real Fabric notebook today, here’s a practical order of operations:
Run diagnose.analyze() first Use it as your “triage” to catch the high-impact misconfigs (Native Engine fallback, AQE off, Starter Pool blockers). (PyPI)
Use ask.config() for any red/yellow item you don’t fully understand The point is speed: read the explanation in context and decide. (PyPI)
Profile the session If the job is still slow/expensive after obvious fixes, profile and look for the real culprit: skew, shuffle pressure, poor parallelism, memory pressure. (PyPI)
If the job smells like skew, use advanced skew detection Especially for joins and wide aggregations. (PyPI)
If your tables are growing, run storage analysis early Small files and weak partitioning quietly tax everything downstream. (PyPI)
That flow is how you turn “tuning” from an art project into a checklist.
Closing: why this matters for Fabric teams
I’m amplifying sparkwise because it’s the kind of contribution that scales beyond the person who wrote it. Santhosh took hard-earned, real-world Fabric Spark tuning experience and turned it into something other engineers can use immediately — a practical way to spot waste, unblock faster iteration, and make smarter performance tradeoffs.
If your team runs Fabric Spark workloads regularly, treat sparkwise like a lightweight tuning partner:
install it,
run the diagnostics,
act on one recommendation,
measure the improvement,
repeat.
And if you end up with feedback or feature ideas, even better — that’s how tools like this get sharper and more broadly useful.
There are celebrity deaths that feel distant in a way that’s hard to explain without sounding cold. You see the headline, you register it, you think that’s sad, and then the day keeps moving. The world has trained us to process loss at scroll-speed.
But every once in a while, one lands different. It doesn’t feel like news. It feels like someone quietly turned a key inside you and opened a door you forgot existed.
Maybe that sounds ridiculous to people who didn’t grow up with Buck Rogers in the 25th Century in their bloodstream. Maybe it sounds like nostalgia doing what nostalgia does. But this wasn’t just “an actor I liked.” This was a particular piece of childhood—one of those warm, bright anchors—suddenly becoming something you can only visit, not live alongside.
And it’s December, which makes everything heavier.
The holidays have a way of putting your life on a loop. The same music. The same lights. The same half-remembered rituals you didn’t realize you’d been collecting for decades. This time of year doesn’t just bring memories back; it drags them in by the collar and sets them down in front of you like, Look. Pay attention.
So when I saw the news, it didn’t feel like losing a celebrity.
It felt like losing a doorway.
Buck Rogers wasn’t a show I watched. It was a place I went.
Some shows are entertainment. Some are comfort. And some become the background radiation of your childhood — you don’t even remember the first time you saw them, because they feel like they were always there.
That’s what Buck Rogers was for me.
It was shiny, goofy, sincere, and somehow confident enough to be all three without apologizing. It was the future as imagined by a world that still believed the future could be fun. It had that late-70s/early-80s optimism baked into the sets and the pacing — like even the danger had a little wink in it.
And in the middle of all of that was Gil Gerard.
His Buck wasn’t “perfect hero” energy. He was cocky in a way that felt survivable. He was charming without being smug. He had that specific kind of grin that said: Yeah, this is insane — but we’re gonna be fine. As a kid, that matters more than you realize. A character like that doesn’t just entertain you; he teaches your nervous system what “okay” can feel like.
When you grow up, you start to understand why you clung to that.
Princess Ardala, obviously
Pamela Hensley as Princess Ardala
And yes — Princess Ardala.
I’ve written about my love for her plenty, and I’m not stopping now. Ardala wasn’t just a villain. She was glamour with teeth. She was command presence and mischievous desire and that intoxicating confidence that makes you root for someone even when you know better.
She was also part of why the show stuck in my brain the way it did. Ardala made Buck Rogers feel like it had adult electricity under the hood — like it understood that charm and danger can share the same room.
But here’s the thing I don’t think I appreciated until now: Ardala worked because Buck worked.
You need the center to make the orbit matter. You need someone steady enough to make the outrageous feel real. Gil Gerard was that steady. He didn’t overplay it. He didn’t flinch from the camp. He just stood there in the middle of it — smirking, sincere, game for the ride — and that’s what made the whole thing click.
So when he goes, it isn’t just “Buck is gone.” It’s like the whole little universe loses its gravity.
Why it hurts more in the holidays
Because December is already full of ghosts.
It’s the month where you catch yourself standing in a familiar room and realizing time has been moving faster than you’ve wanted to admit. It’s the month where you see an ornament and suddenly remember a person’s laugh. It’s the month where a song can knock the wind out of you in a grocery store aisle.
Holiday nostalgia is sneaky. It doesn’t feel like sadness until it does.
And Gil Gerard’s death—right now, right in the middle of the season that already has you looking backward—feels like a confirmation of something you spend most of the year successfully ignoring:
That childhood is not a place you can go back to. It’s a place you carry. And sometimes, someone you associated with that place disappears, and the weight of it finally shows up.
Not because you knew him.
Because you knew you, back then.
And you miss that kid more than you expected.
What I’m doing with it
I’m not trying to turn this into a big philosophical thing. I’m just being honest about the shape of the grief.
It’s not the grief of losing a family member. It’s not the grief of losing a friend. It’s its own strange category: the grief of realizing another thread connecting you to your early life has been cut.
So I’m going to do the only thing that makes sense.
I’m going to watch an episode.
Not in the “content consumption” way. In the ritual way. The way you replay something not because it’s new, but because it reminds you that you’ve been here before — you’ve felt wonder before, you’ve felt comfort before, you’ve felt the world get a little lighter for an hour before.
I’ll let the show be what it always was: a bright, weird little pocket of imagination that helped shape me.
And I’ll feel the sting of knowing that time only moves one direction.
Rest in peace, Gil Gerard.
Thanks for being a part of the version of the world where the future felt fun — and where I did, too.
This post was written with assistance from ChatGPT 5.2
Microsoft Fabric makes it incredibly easy to spin up Spark workloads: notebooks, Lakehouse pipelines, dataflows, SQL + Spark hybrid architectures—the whole buffet.
What’s still hard? Knowing why a given Spark job is slow, expensive, or flaky.
A Lakehouse pipeline starts timing out.
A notebook that used to finish in 5 minutes is now taking 25.
Costs spike because one model training job is shuffling half the lake.
You open the Spark UI, click around a few stages, stare at shuffle graphs, and say the traditional words of Spark debugging:
“Huh.”
This is where an AI assistant should exist.
In this post, we’ll walk through how to build exactly that for Fabric Spark: a Job Doctor that:
Reads Spark telemetry from your Fabric environment
Detects issues like skew, large shuffles, spill, and bad configuration
Uses a large language model (LLM) to explain what went wrong
Produces copy-pasteable fixes in Fabric notebooks / pipelines
Runs inside Fabric using Lakehouses, notebooks, and Azure AI models
This is not a fake product announcement. This is a blueprint you can actually build.
What Is the Fabric “Job Doctor”?
At a high level, the Job Doctor is:
A Fabric-native analytics + AI layer that continuously reads Spark job history, detects common performance anti-patterns, and generates human-readable, prescriptive recommendations.
Or an EventLog record with a payload that looks like the Spark listener event.
To build a Job Doctor, you’ll:
Read the JSON lines into Fabric Spark
Explode / parse the properties payload
Aggregate per-task metrics into per-stage metrics for each application
We’ll skip the exact parsing details (they depend on how you set up the emitter and which events/metrics you enable) and assume that after a normalization job, you have a table with one row per (applicationId, stageId, taskId).
That’s what the next sections use.
3. Capturing Query Plans in Fabric (Optional, but Powerful)
Spark query plans are gold when you’re trying to answer why a stage created a huge shuffle or why a broadcast join didn’t happen.
There isn’t yet a first-class “export query plan as JSON” API in PySpark, but in Fabric notebooks you can use a (semi-internal) trick that works today:
import json
df = ... # some DataFrame you care about
# Advanced / internal: works today but isn't a public, stable API
plan_json = json.loads(df._jdf.queryExecution().optimizedPlan().toJSON())  # optimized logical plan as JSON
You can also log the human-readable plan:
df.explain(mode="formatted") # documented mode, prints a detailed plan
To persist the JSON plan for the Job Doctor, tie it to the Spark application ID:
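A minimal sketch of that persistence step (the job_doctor.query_plans table name matches what the later sections read; adjust to your Lakehouse layout):
from pyspark.sql import Row

app_id = spark.sparkContext.applicationId

spark.createDataFrame(
    [Row(applicationId=app_id, plan_json=json.dumps(plan_json))]
).write.mode("append").saveAsTable("job_doctor.query_plans")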
The raw emitted records carry a properties payload (nested JSON with stage/task/metric detail).
The normalization step (which you can run as a scheduled pipeline) should:
Filter down to metrics/events relevant for performance (e.g. task / stage metrics)
Extract stageId, taskId, executorRunTime, shuffleReadBytes, etc., into top-level columns
Persist the result as job_doctor.task_metrics (or similar)
For the rest of this post, we’ll assume you’ve already done that and have a table with columns:
applicationId
stageId
taskId
executorRunTime
shuffleReadBytes
shuffleWriteBytes
memoryBytesSpilled
diskBytesSpilled
Aggregating Stage Metrics in Fabric
Now we want to collapse per-task metrics into per-stage metrics per application.
In a Fabric notebook:
from pyspark.sql import functions as F
task_metrics = spark.table("job_doctor.task_metrics")
stage_metrics = (
task_metrics
.groupBy("applicationId", "stageId")
.agg(
F.countDistinct("taskId").alias("num_tasks"),
F.sum("executorRunTime").alias("total_task_runtime_ms"),
# Depending on Spark version, you may need percentile_approx instead
F.expr("percentile(executorRunTime, 0.95)").alias("p95_task_runtime_ms"),
F.max("executorRunTime").alias("max_task_runtime_ms"),
F.sum("shuffleReadBytes").alias("shuffle_read_bytes"),
F.sum("shuffleWriteBytes").alias("shuffle_write_bytes"),
F.sum("memoryBytesSpilled").alias("memory_spill_bytes"),
F.sum("diskBytesSpilled").alias("disk_spill_bytes"),
)
.withColumn(
"skew_ratio",
F.col("max_task_runtime_ms") /
F.when(F.col("p95_task_runtime_ms") == 0, 1).otherwise(F.col("p95_task_runtime_ms"))
)
.withColumn("shuffle_read_mb", F.col("shuffle_read_bytes") / (1024**2))
.withColumn("shuffle_write_mb", F.col("shuffle_write_bytes") / (1024**2))
.withColumn(
"spill_mb",
(F.col("memory_spill_bytes") + F.col("disk_spill_bytes")) / (1024**2)
)
)
stage_metrics.write.mode("overwrite").saveAsTable("job_doctor.stage_metrics")
This gives you a Fabric Lakehouse table with:
skew_ratio
shuffle_read_mb
shuffle_write_mb
spill_mb
p95_task_runtime_ms
num_tasks, total_task_runtime_ms, etc.
You can run this notebook:
On a schedule via a Data Pipeline
Or as a Data Engineering job configured in the workspace
Part 3: Adding a Rule Engine Inside Fabric
Now that the metrics are in a Lakehouse table, let’s add a simple rule engine in Python.
This will run in a Fabric notebook (or job) and write out issues per stage.
from pyspark.sql import Row, functions as F
stage_metrics = spark.table("job_doctor.stage_metrics")
# For simplicity, we'll collect to the driver here.
# This is fine if you don't have thousands of stages.
# For very large workloads, you'd instead do this via a UDF / mapInPandas / explode.
stage_rows = stage_metrics.collect()
Define some basic rules:
def detect_issues(stage_row):
    issues = []

    # 1. Skew detection
    if stage_row.skew_ratio and stage_row.skew_ratio > 5:
        issues.append({
            "issue_id": "SKEWED_STAGE",
            "severity": "High",
            "details": f"Skew ratio {stage_row.skew_ratio:.1f}"
        })

    # 2. Large shuffle
    total_shuffle_mb = (stage_row.shuffle_read_mb or 0) + (stage_row.shuffle_write_mb or 0)
    if total_shuffle_mb > 10_000:  # > 10 GB
        issues.append({
            "issue_id": "LARGE_SHUFFLE",
            "severity": "High",
            "details": f"Total shuffle {total_shuffle_mb:.1f} MB"
        })

    # 3. Excessive spill
    if (stage_row.spill_mb or 0) > 1_000:  # > 1 GB
        issues.append({
            "issue_id": "EXCESSIVE_SPILL",
            "severity": "Medium",
            "details": f"Spill {stage_row.spill_mb:.1f} MB"
        })

    return issues
Apply the rules and persist the output:
issue_rows = []

for r in stage_rows:
    for issue in detect_issues(r):
        issue_rows.append(Row(
            applicationId=r.applicationId,
            stageId=r.stageId,
            issue_id=issue["issue_id"],
            severity=issue["severity"],
            details=issue["details"]
        ))
issues_df = spark.createDataFrame(issue_rows)
issues_df.write.mode("overwrite").saveAsTable("job_doctor.stage_issues")
Now you have a table of Spark issues detected per run inside your Lakehouse.
Later, the LLM will use these as structured hints.
Part 4: Bringing in the LLM — Turning Metrics into Diagnosis
So far, everything has been pure Spark in Fabric.
Now we want a model (e.g., Azure AI “Models as a Service” endpoint or Azure OpenAI) to turn:
job_doctor.stage_metrics
job_doctor.stage_issues
job_doctor.spark_conf
job_doctor.query_plans
into an actual diagnosis sheet a human can act on.
In Fabric, this is simplest from a Spark notebook using a Python HTTP client.
Below, I’ll show the pattern using an Azure AI serverless model endpoint (the one that uses model: "gpt-4.1" in the body).
1. Prepare the Prompt Payload
First, fetch the data for a single Spark application:
import json
from pyspark.sql import functions as F
app_id = "app-20240501123456-0001" # however you pick which run to diagnose
stages_df = spark.table("job_doctor.stage_metrics").where(F.col("applicationId") == app_id)
issues_df = spark.table("job_doctor.stage_issues").where(F.col("applicationId") == app_id)
conf_df = spark.table("job_doctor.spark_conf").where(F.col("applicationId") == app_id)
plans_df = spark.table("job_doctor.query_plans").where(F.col("applicationId") == app_id)
stages_json = stages_df.toPandas().to_dict(orient="records")
issues_json = issues_df.toPandas().to_dict(orient="records")
conf_json = conf_df.toPandas().to_dict(orient="records")
plans_json = plans_df.toPandas().to_dict(orient="records") # likely 0 or 1 row
Then build a compact but informative prompt:
prompt = f"""
You are an expert in optimizing Apache Spark jobs running on Microsoft Fabric.
Here is summarized telemetry for one Spark application (applicationId={app_id}):
Stage metrics (JSON):
{json.dumps(stages_json, indent=2)}
Detected issues (JSON):
{json.dumps(issues_json, indent=2)}
Spark configuration (key/value list):
{json.dumps(conf_json, indent=2)}
Query plans (optional, may be empty):
{json.dumps(plans_json, indent=2)}
Your tasks:
1. Identify the top 3–5 performance issues for this run.
2. For each, explain the root cause in plain language.
3. Provide concrete fixes tailored for Fabric Spark, including:
- spark.conf settings (for notebooks/jobs)
- suggestions for pipeline settings where relevant
- SQL/DataFrame code snippets
4. Estimate likely performance impact (e.g., "30–50% reduction in runtime").
5. Call out any risky or unsafe changes that should be tested carefully.
Return your answer as markdown.
"""
2. Call an Azure AI Model from Fabric Spark
For the serverless “Models as a Service” endpoint, the pattern looks like this:
import os
import requests
# Example: using Azure AI Models as a Service
# AZURE_AI_ENDPOINT might look like: https://models.inference.ai.azure.com
AZURE_AI_ENDPOINT = os.environ["AZURE_AI_ENDPOINT"]
AZURE_AI_KEY = os.environ["AZURE_AI_KEY"]
MODEL = "gpt-4.1" # or whatever model you've enabled
headers = {
"Content-Type": "application/json",
"api-key": AZURE_AI_KEY,
}
body = {
"model": MODEL,
"messages": [
{"role": "system", "content": "You are a helpful assistant for optimizing Spark jobs on Microsoft Fabric."},
{"role": "user", "content": prompt},
],
}
resp = requests.post(
f"{AZURE_AI_ENDPOINT}/openai/chat/completions",
headers=headers,
json=body,
)
resp.raise_for_status()
diagnosis = resp.json()["choices"][0]["message"]["content"]
If you instead use a provisioned Azure OpenAI resource, the URL shape is slightly different (you call /openai/deployments/<deploymentName>/chat/completions and omit the model field), but the rest of the logic is identical.
At this point, diagnosis is markdown you can:
Render inline in the notebook with displayHTML
Save into a Lakehouse table
Feed into a Fabric semantic model for reporting
Part 5: What the Job Doctor’s Output Looks Like in Fabric
A good Job Doctor output for Fabric Spark might look like this (simplified):
🔎 Issue 1: Skewed Stage 4 (skew ratio 12.3)
What I see
Stage 4 has a skew ratio of 12.3 (max task runtime vs. p95).
This stage also reads ~18.2 GB via shuffle, which amplifies the imbalance.
Likely root cause
A join or aggregation keyed on a column where a few values dominate (e.g. a “default” ID, nulls, or a small set of hot keys). One partition ends up doing far more work than the others.
Fabric-specific fixes
In your notebook or job settings, enable Adaptive Query Execution and skew join handling:
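For example (standard Spark AQE properties, shown as an illustrative sketch):
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")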
If the query is in SQL (Lakehouse SQL endpoint), enable AQE at the session/job level through Spark configuration.
If one side of the join is a small dimension table, add a broadcast hint:
SELECT /*+ BROADCAST(dim) */ f.*
FROM fact f
JOIN dim
ON f.key = dim.key;
Estimated impact: 30–50% reduction in total job runtime, depending on how skewed the key distribution is.
📦 Issue 2: Large Shuffle in Stage 2 (~19.7 GB)
What I see
Stage 2 reads ~19.7 GB via shuffle.
Shuffle partitions are set to 200 (Spark default).
Likely root cause
A join or aggregation is shuffling nearly the full dataset, but parallelism is low given the data volume. That leads to heavy tasks and increased risk of spill.
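Fabric-specific fixes
In a notebook, raise shuffle parallelism for this stage (the value is an illustrative sketch; size it to your data volume and available cores):
spark.conf.set("spark.sql.shuffle.partitions", "400")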
For pipelines, set this at the Spark activity level under Spark configuration, or through your Fabric environment’s resource profile if you want a new default.
Also consider partitioning by the join key earlier in the pipeline:
df = df.repartition("customer_id")
Estimated impact: More stable runtimes and reduced likelihood of spill; wall-clock improvements if your underlying capacity has enough cores.
💾 Issue 3: Spill to Disk (~1.8 GB) in Stage 3
What I see
Stage 3 spills ~1.8 GB to disk.
This correlates with under-parallelism or memory pressure.
Fabric-specific fixes
Adjust cluster sizing via Fabric capacity / resource profiles (enough cores + memory per core).
Increase spark.sql.shuffle.partitions as above.
Avoid wide transformations producing huge intermediate rows early in the job; materialize smaller, more selective intermediates first.
You can persist the diagnosis text into a table:
from pyspark.sql import Row
spark.createDataFrame(
[Row(applicationId=app_id, diagnosis_markdown=diagnosis)]
).write.mode("append").saveAsTable("job_doctor.diagnoses")
Then you can build a Power BI report in Fabric bound to:
job_doctor.diagnoses
job_doctor.stage_metrics
job_doctor.stage_issues
to create a “Spark Job Health” dashboard where:
Rows = recent Spark runs
Columns = severity, duration, shuffle size, spill, etc.
A click opens the AI-generated diagnosis for that run
All inside the same workspace.
Part 6: Stitching It All Together in Fabric
Let’s recap the full Fabric-native architecture.
1. Telemetry Ingestion (Environment / Emitter)
Configure a Fabric environment for your Spark workloads.
Add a Fabric Apache Spark diagnostic emitter to send logs/metrics to:
Azure Storage (for Lakehouse shortcuts), or
Log Analytics / Event Hubs if you prefer KQL or streaming paths.
(Optional) From notebooks/pipelines, capture:
Spark configs → job_doctor.spark_conf
Query plans → job_doctor.query_plans
2. Normalization Job (Spark / Data Pipeline)
Read raw diagnostics from Storage via a Lakehouse shortcut.
Parse and flatten the records into per-task metrics.
3. Diagnosis Job (Notebook + Azure AI)
For each new (or most expensive / slowest) application:
Pull stage metrics, issues, configs, and query plans from Lakehouse.
Construct a structured prompt.
Call your Azure AI / Azure OpenAI endpoint from a Fabric Spark notebook.
Store the markdown diagnosis in job_doctor.diagnoses.
4. User Experience
Fabric Notebook
A “Run Job Doctor” cell or button that takes applicationId, calls the model, and displays the markdown inline.
Data Pipeline / Job
Scheduled daily to scan all runs from yesterday and generate diagnoses automatically.
Power BI Report in Fabric
“Spark Job Health” dashboard showing:
Top slowest/most expensive jobs
Detected issues (skew, large shuffle, spill, config problems)
AI recommendations, side-by-side with raw metrics
Everything lives in one Fabric workspace, using:
Lakehouses for data
Spark notebooks / pipelines for processing
Azure AI models for reasoning
Power BI for visualization
Why a Fabric-Specific Job Doctor Is Worth Building
Spark is Spark, but in Fabric the story is different:
Spark jobs are tied closely to Lakehouses, Pipelines, Dataflows, and Power BI.
You already have a single control plane for capacity, governance, cost, and monitoring.
Logs, metrics, and reports can live right next to the workloads they describe.
That makes Fabric an ideal home for a Job Doctor:
No extra infrastructure to stand up
No random side services to glue together
The telemetry you need is already flowing; you just have to catch and shape it
AI can sit directly on top of your Lakehouse + monitoring data
With some Spark, a few Lakehouse tables, and an LLM, you can give every data engineer and analyst in your organization a “Spark performance expert” that’s always on call.
I’ve included a sample notebook you can use to get started on your Job Doctor today!
This post was created with help from (and suggested to me) by ChatGPT Pro using the 5.1 Thinking Model
We’re told to “follow your passion” like it’s a career cheat code.
Love what you do and you’ll never work a day in your life.
Find your calling.
Do what you’d do for free.
It sounds inspiring. And sometimes, it is true: passion can make work feel meaningful, energizing, and deeply satisfying.
But there’s a shadow side that doesn’t get talked about enough.
Passion at work is a double-edged sword. Held correctly, it can cut through apathy, fear, and mediocrity. Held wrong, it cuts you—your health, your relationships, your boundaries, and even your performance.
This isn’t a call to care less. It’s a call to care wiser.
The Bright Edge: Why Passion Is Powerful
Let’s start with the good news: passion is not the enemy.
1. Passion keeps you going when things are hard
When you actually care about what you’re building, you can push through the boring parts: the documentation, the messy legacy systems, the political nonsense. Passion creates stamina. It’s why some people can do deep work for hours and others are clock-watching at 2:17 p.m.
2. Passion improves the quality of your work
When you’re invested, you notice details other people miss. You think more about edge cases, customer impact, long-term consequences. Passion often shows up as craftsmanship: “this isn’t just done, it’s done right.”
3. Passion makes you more resilient to setbacks
Passionate people bounce back faster from failure. A bad launch, a tough review, a missed promotion hurts—but if you care about the mission, it’s easier to treat it as a data point instead of a verdict on your worth.
4. Passion is contagious
When someone genuinely cares, people feel it. It can pull a team forward. Customers trust you more. Leaders notice your ownership. Passion, when grounded, is a quiet magnet.
All of that is real.
And yet.
The Dark Edge: When Passion Starts Cutting You
Passion becomes dangerous when it slips from “I care a lot” into “I am my work.”
Here’s how that shows up.
1. Your identity fuses with your job
If you’re passionate, it’s easy to start thinking:
“If this project fails, I am a failure.” “If my manager is unhappy, I am not good enough.” “If this company doesn’t appreciate me, maybe I’m not valuable.”
Passion can blur the line between what you do and who you are. Then criticism isn’t feedback on work; it’s an attack on your identity. That’s emotionally exhausting and makes you defensive instead of curious.
2. You become easy to exploit
Harsh truth: workplaces love passionate people—sometimes for the wrong reasons.
If you’re the “I’ll do whatever it takes” person:
You get the late-night emergencies. You pick up slack from weaker teammates. You “volunteer” for stretch work no one else wants. You feel guilty saying no because “this matters.”
The line between commitment and self-betrayal gets blurry. Passion, unmanaged, can turn you into free overtime wrapped in a nice attitude.
3. Burnout hides in plain sight
Passion can mask burnout for a long time because you like the work. You tell yourself:
“I’m just busy right now.” “It’ll calm down after this release / quarter / crisis.” “I don’t need a break; I just need to be more efficient.”
Meanwhile, the signals are there:
You’re always tired, even after weekends. Small setbacks feel like huge emotional blows. You resent people who seem more “chill.” You’re working more but enjoying it less.
By the time you admit you’re burned out, you’re far past the “fix it with a vacation” stage.
4. Passion narrows your vision
When you really care about a project or idea, you can get tunnel vision:
You dismiss risks because “we’ll figure it out.” You take feedback as an attack, not input. You see other teams as blockers, not partners. You overestimate how much others care about your problem.
Passion can make you worse at strategy if it stops you from seeing tradeoffs clearly. Being too attached to a specific solution can blind you to better ones.
5. Emotional volatility becomes the norm
The more passionate you are, the bigger the emotional swings:
Feature shipped? You’re high for a week. Leadership cancels it? You’re crushed for a month. Good performance review? You’re invincible. Reorg? You’re spiraling.
Your nervous system never stabilizes. Work becomes a rollercoaster controlled by people who don’t live inside your head.
The Subtle Trap: Passion as Justification
One of the most dangerous patterns is this:
“I’m exhausted, anxious, and on edge—but that’s the price of caring.”
No. That’s not the price of caring. That’s the price of caring without boundaries.
Passion is not supposed to destroy your sleep, wreck your relationships, or make you hate yourself when something slips. That’s not noble. That’s mismanagement.
You wouldn’t let a junior teammate run production unmonitored with no guardrails. But most passionate people let their emotions do exactly that.
Holding the Sword by the Handle: Healthier Ways to Be Passionate
So what does healthy passion at work look like?
It’s not about caring less. It’s about caring in a way that doesn’t consume you.
Here are some practical shifts.
1. Separate “me” from “my output”
Mentally, you want this frame:
“This work matters to me.” “I’m proud of the effort, decisions, and integrity I bring.” “The outcome is influenced by many factors, some outside my control.”
You can care deeply about quality and impact while still treating outcomes as feedback, not final judgment.
A useful self-check:
“If this project got canceled tomorrow, would I still believe I’m capable and valuable?”
If the honest answer is no, your identity is too fused to the work.
2. Define your own success metrics
When you’re passionate, it’s easy to adopt everyone else’s scoreboard: exec praise, promotion velocity, launch glamour.
Build a second scoreboard that’s yours:
Did I learn something hard this month? Did I push for a decision that needed to be made? Did I support my team in a way I’m proud of? Did I hold a boundary that protected my health?
Those are wins too. They just don’t show up on the OKR dashboard.
3. Make a “portfolio of meaning”
If work is your only source of meaning, every wobble at work feels like an earthquake.
Create a portfolio:
Relationships (family, partners, close friends) Health (sleep, movement, mental hygiene) Personal interests (hobbies, side projects, learning) Contribution outside work (mentoring, community, parenting, etc.)
Passion at work is safest when it’s one important part of your life, not the entire scaffolding holding your self-worth up.
4. Put boundaries on the calendar, not in your head
“I should have better boundaries” is useless if your calendar is a disaster.
Concrete examples:
Block “no meeting” focus time and defend it. Choose 1–2 late nights a week max and keep the rest sacred. Decide in advance when you’ll check email/Slack after hours (if at all). Put workouts, therapy, or walks in your calendar as real appointments.
If it doesn’t exist in time and space, it’s just a wish.
5. Watch your internal narrative
Passion often comes with spicy self-talk:
“If I don’t fix this, everything will fall apart.” “They have no idea how much I’m carrying.” “I can’t slow down; people are counting on me.”
Sometimes that’s true. A lot of times, it’s your brain cosplaying as the lone hero.
Try swapping narratives:
From “I’m the only one who cares” → to “I care a lot, and it’s my job to bring others along, not martyr myself.” From “If I don’t say yes, I’m letting the team down” → to “If I say yes to everything, I’m guaranteeing lower quality for everyone.”
6. Be transparent with your manager (to a point)
You don’t need to pour your entire soul out, but you can say:
“I care a lot about this space and tend to over-extend. I want to stay sustainable. Can we align on where you most want me to go above and beyond, and where ‘good enough’ is genuinely good enough?” “Here’s what I’m currently carrying. If we add X, what do you want me to drop or downgrade?”
Good managers want passionate people to last. If your manager doesn’t… that’s useful information about whether this is the right place to invest your energy.
7. Build a small “reality check” circle
Have 1–3 people who know you well and can tell when your passion is tipping into self-harm. Give them permission to say:
“You’re over-owning this. This isn’t all on you.” “You’re talking like the job is your entire worth.” “You haven’t talked about anything but work in weeks. What’s going on?”
Passion distorts perspective from the inside. You need outside eyes.