
Author’s note – I have enjoyed playing around with the Deep Research capabilities of ChatGPT, and I had it put together what it felt was the definitive whitepaper on Capacity Management for Microsoft Fabric. It basically just used the Microsoft documentation (plus a couple of community posts) to pull it together, so I’m curious what you think. I’ll leave a link to download the PDF copy of this at the end of the post.
Executive Summary
Microsoft Fabric capacities provide the foundational compute resources that power the Fabric analytics platform. They are essentially dedicated pools of compute (measured in Capacity Units or CUs) allocated to an organization’s Microsoft Fabric tenant. Proper capacity management is crucial for ensuring reliable performance, supporting all Fabric workloads (Power BI, Data Engineering, Data Science, Real-Time Analytics, etc.), and optimizing costs. This white paper introduces capacity and tenant administrators to the full spectrum of Fabric capacity management – from basic concepts to advanced strategies.
Key takeaways:
- Fabric offers multiple capacity SKUs (F, P, A, EM, Trial) with differing capabilities and licensing models. Understanding these SKU types and how to provision them is the first step.
- Once a capacity is in place, administrators must plan and size it appropriately to meet workload demands without over-provisioning.
- All Fabric experiences share capacity resources, so effective workload management and governance are needed to prevent any one workload from overwhelming others.
- Fabric’s capacity model introduces bursting and smoothing to handle short-term peaks, while throttling mechanisms protect the system during sustained overloads.
- Tools like the Fabric Capacity Metrics App provide visibility into utilization and help with monitoring performance and identifying bottlenecks.
- Administrators should leverage features such as autoscale options (manual or scripted scaling and Spark auto-scaling), notifications, and the new surge protection to manage peak loads and maintain service levels.
Effective capacity management also involves governance practices: assigning workspaces to capacities in a thoughtful way, isolating critical workloads, and controlling who can create or consume capacity resources. Cost optimization is a continuous concern – this paper discusses strategies like pausing capacities during idle periods, choosing the right SKU size (and switching to reserved pricing for savings), and using per-user licensing (Premium Per User) when appropriate to minimize costs. Finally, we present real-world scenarios with recommendations to illustrate how organizations can mix and match these approaches. By following the guidance in this document, new administrators will be equipped to manage Microsoft Fabric capacities confidently and get the most value from their analytics investment.
Introduction to Microsoft Fabric Capacities
Microsoft Fabric is a unified analytics platform that spans data integration, data engineering, data warehousing, data science, real-time analytics, and business intelligence (Power BI). A Microsoft Fabric capacity is a dedicated set of cloud resources (CPU and memory) allocated to a tenant to run these analytics workloads. In essence, a capacity represents a chunk of “always-on” compute power measured in Capacity Units (CUs) that your organization owns or subscribes to. The capacity’s size (number of CUs) determines how much computational load it can handle at any given time.
Why capacities matter: Certain Fabric features and collaborative capabilities are only available when content is hosted in a capacity. For example, to share Power BI reports broadly without requiring per-user licenses, or to use advanced Fabric services like Spark notebooks, data warehouses, and real-time analytics, you must use a Fabric capacity. Capacities enable organization-wide sharing, collaboration, and performance guarantees beyond the limits of individual workstations or ad-hoc cloud resources. They act as containers for workspaces – any workspace assigned to a capacity will run all of its workloads (reports, datasets, pipelines, notebooks, etc.) on that capacity’s resources. This provides predictable performance and isolation: one team’s heavy data science experiment in their capacity won’t consume resources needed by another team’s dashboards on a different capacity. It also simplifies administration – instead of managing separate compute for each project, admins manage pools of capacity that can host many projects.
In summary, Fabric capacities are the backbone of a Fabric deployment, combining compute isolation, performance scaling, and licensing benefits. With a capacity, your organization can create and share Fabric content (from Power BI reports to AI models) with the assurance of dedicated resources and without every user needing a premium license. The rest of this document will explore how to choose the right capacity, configure it for various workloads, keep it running optimally, and do so cost-effectively.
Capacity SKU Types and Differences (F, P, A, EM, Trial)
Microsoft Fabric builds on the legacy of Power BI’s capacity-based licensing, introducing new Fabric (F) SKUs alongside existing Premium (P) and Embedded SKUs. It’s important for admins to understand the types of capacity SKUs available and their differences:
- F-SKUs (Fabric SKUs): These are the new capacity SKUs introduced with Microsoft Fabric. They are purchased through Azure and measured in Capacity Units (CUs). F-SKUs range from small to very large (F2 up to F2048), each providing a set number of CUs (e.g. F2 = 2 CUs, F64 = 64 CUs, etc.). F-SKUs support all Fabric workloads (Power BI content and the new Fabric experiences like Lakehouse, Warehouse, Spark, etc.). They offer flexible cloud purchasing (hourly pay-as-you-go billing with the ability to pause when not in use) and scaling options. Microsoft is encouraging customers to adopt F-SKUs for Fabric due to their flexibility in scaling and billing.
- P-SKUs (Power BI Premium per Capacity): These were the traditional Power BI Premium capacities (P1 through P5) bought via the Microsoft 365 admin center with an annual subscription commitment. P-SKUs also support the full Fabric feature set (they have been migrated onto the Fabric backend). However, as of mid-2024, Microsoft has deprecated new purchases of P-SKUs in favor of F-SKUs. Organizations with existing P capacities can use Fabric on them, but new capacity purchases should be F-SKUs going forward. One distinction is that P-SKUs cannot be paused and were billed as fixed annual licenses (less flexible, but previously lower cost for constant use).
- A-SKUs (Azure Power BI Embedded): These are Azure-purchased capacities originally meant for Power BI embedded analytics scenarios. They correspond to the same resource levels as some F-SKUs (for example, A4 is equivalent to an F64 in compute power) but only support Power BI workloads – they do not support the new Fabric experiences like Spark or data engineering. A-SKUs can still be used if you only need Power BI (for example, for embedding reports in a web app), but if any Fabric features are needed, you must use an F or P SKU.
- EM-SKUs (Power BI Embedded for organization): Another variant of embedded capacity (EM1, EM2, EM3) which are lower-tier and were used for internal “Embedded” scenarios (like embedding Power BI content in SharePoint or Teams without full Premium). Like A-SKUs, EM SKUs are limited to Power BI content only and correspond to smaller capacity sizes (EM3 ~ F32). They cannot run Fabric workloads.
- Trial SKU: Microsoft Fabric offers a free trial capacity to let organizations try Fabric for a limited time. The trial capacity provides 64 CUs (equivalent to an F64 SKU) and supports all Fabric features, but lasts for 60 days. This is a fixed-size capacity (roughly equal to a P1 in power) that can be activated without cost. It’s ideal for initial evaluations and proof-of-concept work. After 60 days, the trial expires (though Microsoft has allowed extensions in some cases). Administrators cannot change the size of a trial capacity – it’s pre-set – and there may be limits on the number of trials per tenant.
The table below summarizes the Fabric SKU sizes and their approximate equivalence to Power BI Premium for context:
| SKU | Capacity Units (CUs) | Equivalent P-SKU / A-SKU | Power BI v-cores |
|---|---|---|---|
| F2 | 2 CUs | (no P-SKU; smallest) | 0.25 v-core |
| F4 | 4 CUs | (no P-SKU) | 0.5 v-core |
| F8 | 8 CUs | EM1 / A1 | 1 v-core |
| F16 | 16 CUs | EM2 / A2 | 2 v-cores |
| F32 | 32 CUs | EM3 / A3 | 4 v-cores |
| F64 | 64 CUs | P1 / A4 | 8 v-cores |
| Trial | 64 CUs | (no P-SKU; free trial) | 8 v-cores |
| F128 | 128 CUs | P2 / A5 | 16 v-cores |
| F256 | 256 CUs | P3 / A6 | 32 v-cores |
| F512 | 512 CUs | P4 / A7 | 64 v-cores |
| F1024 | 1024 CUs | P5 / A8 | 128 v-cores |
| F2048 | 2048 CUs | (no direct P-SKU) | 256 v-cores |
Table: Fabric capacity SKU sizes in Capacity Units (CU) with equivalent legacy SKUs. Note: P-SKUs P1–P5 correspond to F64–F1024. A-SKUs and EM-SKUs only support Power BI content and roughly map to F8–F32 sizes.
In practical terms, F64 (64 CU) is the threshold where a capacity is considered “Premium” in the Power BI sense – it has the same 8 v-cores as a P1. Indeed, content in workspaces on an F64 or larger can be consumed by viewers with a free Fabric license (no Pro license needed). By contrast, the smaller F2–F32 capacities, while useful for light workloads or development, do not remove the need for Power BI Pro licenses for content consumers. Administrators should be aware of this distinction: if your goal is to enable broad internal report sharing to free users, you will need at least an F64 capacity.
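The F64 threshold above lends itself to a simple rule-of-thumb check. The sketch below is illustrative only (the helper names are our own, and real Power BI licensing has nuances this does not model); the 64 CU cutoff matches the table above:

```python
# Sketch: determine whether report viewers need Pro/PPU licenses for
# content hosted on a given Fabric F-SKU. Assumes the F64 free-viewer
# threshold described above; other licensing nuances are not modeled.

def sku_capacity_units(sku: str) -> int:
    """Parse the CU count from an F-SKU name like 'F64'."""
    if not sku.upper().startswith("F"):
        raise ValueError(f"Expected an F-SKU name, got {sku!r}")
    return int(sku[1:])

def free_viewers_allowed(sku: str) -> bool:
    """Content on F64 or larger can be viewed with a free Fabric license."""
    return sku_capacity_units(sku) >= 64

for sku in ["F2", "F32", "F64", "F128"]:
    status = "free viewers OK" if free_viewers_allowed(sku) else "viewers need Pro/PPU"
    print(sku, status)
```

Running this prints that F2 and F32 still require Pro/PPU for viewers, while F64 and F128 allow free-license viewing.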
To recap SKU differences: F-SKUs are the modern, Azure-based Fabric capacities that cover all workloads and offer flexibility (pause/resume, hourly billing). P-SKUs (legacy Premium) also cover all workloads but are being phased out for new purchases, and they require an annual subscription (though existing ones can continue to be used for Fabric). A/EM SKUs are limited to Power BI content only and primarily used for embedding scenarios; they might still be relevant if your organization only cares about Power BI and wants a smaller or lower-cost option. And the trial capacity is a temporary F64 equivalent provided free for evaluation purposes.
Licensing and Provisioning
Before you can use a Fabric capacity, you must license and provision it for your tenant. This involves understanding how to acquire the capacity (through Azure or Microsoft 365), what user licenses are needed, and how to set up the capacity in the admin portal.
Purchasing a capacity: For F-SKUs and A/EM SKUs, capacities are purchased via an Azure subscription. You (or your Azure admin) will create a Microsoft Fabric capacity resource in Azure, selecting the SKU size (e.g. F64) and region. The capacity resource is billed to your Azure account. For P-SKUs (if you already have one), they were purchased through the Microsoft 365 admin center (as a SaaS license commitment). As noted, new P-SKU purchases are no longer available after July 2024. If you have existing P capacities, they will show up in the Fabric admin portal automatically. Otherwise, new capacity needs will be fulfilled by creating F-SKUs in Azure.
Provisioning and setup: Once purchased, the capacity must be provisioned in your Fabric tenant. For Azure-based capacities (F, A, EM), this happens automatically when you create the resource – you will see the new capacity listed in the Fabric Admin Portal under Capacity settings. You need to be a Fabric admin or capacity admin to access this. In the Fabric Admin Portal (accessible via the gear icon in the Fabric UI), under Capacity Settings, you will find tabs for Power BI Premium, Power BI Embedded, Fabric capacity, and Trial. Your capacity will appear in the appropriate section (e.g., an F-SKU under “Fabric capacity”). From there, you can manage its settings (more on that later) and assign workspaces to it.
When creating an F capacity in Azure, you will choose a region (datacenter location) for the capacity. This determines where the compute resources live and typically where the data for Fabric items in that capacity is stored. For example, if you create an F64 in West Europe, a Fabric Warehouse or Lakehouse created in a workspace on that capacity will reside in the West Europe region (useful for data residency requirements). Organizations with global presence might provision capacities in multiple regions to keep data and computation local to users or comply with regulations.
Per-user licensing requirements: Even with capacities, Microsoft Fabric uses a mix of capacity licensing and per-user licenses:
- Every user who authors content or needs access to Power BI features beyond viewing must have a Power BI Pro license (or Premium Per User) unless the content is in a capacity that allows free-user access. In Fabric, a Free user license lets you create and use non-Power BI Fabric items (like Lakehouses, notebooks, etc.) in a capacity workspace, but it does not allow creating standard Power BI content in shared workspaces or sharing those with others. To publish Power BI reports to a workspace (other than your personal My Workspace) and share them, you still need a Pro license or PPU. Essentially, capacity removes license requirements for viewing content (if the capacity is sufficiently large), but content creators typically need Pro/PPU licenses for Power BI work.
- For viewers of content: If the workspace is on a capacity smaller than F64, all viewers need Pro licenses as if it were a normal shared workspace. If the workspace is on an F64 or larger capacity (or a P-SKU capacity), then free licensed users can view the content (they just need the basic Fabric free license and viewer role). This is analogous to Power BI Premium capacity behavior. So an admin must plan license needs accordingly – for true wide audience distribution, ensure the capacity is at least F64, otherwise you won’t realize the “free user view” benefit.
- Premium Per User (PPU): PPU is a per-user licensing option that provides most Premium features to individual users on shared capacity. While not a capacity, it’s relevant in capacity planning: if you have a small number of users that need premium features, PPU can be more cost-effective than buying a whole capacity. Microsoft suggests considering PPU if fewer than ~250 users need Premium capabilities. For example, rather than an F64 which supports unlimited users, 50 users could each get PPU licenses. However, PPU does not support the broader Fabric workloads (it’s mainly a Power BI feature set license), so if you want the Fabric engineering/science features, you need a capacity.
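The PPU-versus-capacity trade-off can be framed as a simple break-even calculation. The sketch below uses placeholder prices that are assumptions, not current list prices; substitute your actual regional pricing before drawing any conclusions:

```python
# Sketch: compare per-user PPU licensing against a dedicated capacity.
# PRICES BELOW ARE ILLUSTRATIVE PLACEHOLDERS, not Microsoft list prices.

PPU_PRICE_PER_USER_MONTH = 20.0   # assumed placeholder
F64_PRICE_PER_MONTH = 8000.0      # assumed placeholder (pay-as-you-go)

def ppu_breakeven_users(capacity_price: float, ppu_price: float) -> int:
    """Number of PPU users at which a dedicated capacity becomes cheaper."""
    return int(capacity_price // ppu_price) + 1

users = ppu_breakeven_users(F64_PRICE_PER_MONTH, PPU_PRICE_PER_USER_MONTH)
print(f"With these assumed prices, a capacity wins at ~{users} PPU users")
```

The exact break-even point depends entirely on the prices you plug in; Microsoft's own rule of thumb (~250 users) reflects its published pricing, not this sketch.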
In summary, to get started you will purchase or activate a capacity and ensure you have at least one user with a Pro (or PPU) license to administer it and publish Power BI content. Many organizations begin with the Fabric trial capacity – any user with admin rights can initiate the trial from the Fabric portal, which creates the 60-day F64 capacity for the tenant. During the trial period, you might allow multiple users to experiment on that capacity. Once ready to move to production, you would purchase an F-SKU of appropriate size. Keep in mind that a trial capacity is time-bound and also fixed in size (you cannot scale a trial up or down). So after gauging usage in trial, you’ll choose a permanent SKU.
Capacity Planning and Sizing Guidance
Choosing the right capacity size is a critical early decision. Capacity planning is the process of estimating how many CUs (or what SKU tier) you need to run your workloads smoothly, both now and in the future. The goal is to avoid performance problems like slow queries or job failures due to insufficient resources, while also not over-paying for idle capacity. This section provides guidance on sizing a capacity and adjusting it as usage evolves.
Understand your workloads and users: Start by profiling the types of workloads and usage patterns you expect on the capacity. Key factors include:
- Data volume and complexity: Large data models (e.g. huge Power BI datasets) or heavy ETL processes (like frequent dataflows or Spark jobs) will consume more compute and memory. If you plan to refresh terabyte-scale datasets or run complex transformations daily, size up accordingly.
- Concurrent users and activities: Power BI workloads with many simultaneous report users or queries (or heavy embedded analytics usage) can drive up CPU and memory usage quickly. A capacity serving 200 concurrent dashboard users needs more CUs than one serving 20 users. Concurrency in Spark jobs or SQL queries similarly affects load.
- Real-time or continuous processing: If you have real-time analytics (such as continuous event ingestion, KQL databases for IoT telemetry, or streaming datasets), your capacity will see constant usage rather than brief spikes. Ongoing processes mean you need enough capacity to sustain a baseline of usage 24/7.
- Advanced analytics and data science: Machine learning model training or large-scale data science experiments can be very computationally intensive (high CPU for extended periods). A few data scientists running complex notebooks might consume more CUs than dozens of basic report users. Also consider if they will run jobs concurrently.
- Number of users/roles: The more users with access, the greater the chance of overlapping activities. A company with 200 Power BI users running reports will likely require more capacity than one with 10 engineers doing data transformations. Even if each individual task isn’t huge, many small tasks add up.
By evaluating these factors, you can get a rough sense of whether you need a small (F2–F16), medium (F32–F64), or large (F128+) capacity.
Start with data and tools: Microsoft recommends a data-driven approach to capacity sizing. One strategy is to begin with a trial capacity or a small pay-as-you-go capacity, run your actual workloads, and measure the utilization. The Fabric Capacity Metrics App can be installed to monitor CPU utilization, memory, etc., and identify peaks. Over a representative period (say a busy week), observe how much of the 64 CU trial is used. If you find that utilization is peaking near 100% and throttling occurs, you likely need a larger SKU. If usage stays low (e.g. under 30% most of the time), you might get by with a smaller SKU in production or keep the same size with headroom.
Microsoft provides guidance to “start small and then gradually increase the size as necessary.” It’s often best to begin with a smaller capacity, see how it performs, and scale up if you approach limits. This avoids overcommitting to an expensive capacity that you might not fully use. With Fabric’s flexibility, scaling up (or down) capacity is relatively easy through Azure, and short-term overuse can be mitigated by bursting (discussed later).
Concretely, you would:
- Measure consumption – perhaps use an F32 or F64 on a trial or month-to-month basis. Use the metrics app to check the CU utilization over time (Fabric measures consumption in 30-second intervals; multiply CUs by 30 to get CU-seconds per interval). Identify peak times and which workloads are driving them (the metrics app breaks down usage by item type, e.g. dataset vs Spark notebook).
- Identify requirements – If your peak 30-second CU use is, say, 1500 CU-seconds, that’s roughly 50 CUs worth of power needed continuously in that peak period (since 30 sec * 50 CU = 1500). That suggests an F64 might be just enough (64 CUs) with some buffer, whereas an F32 (32 CUs) would throttle. On the other hand, if peaks only hit 200 CU-seconds (which is ~7 CUs needed), even an F8 could handle it.
- Scale accordingly – Choose the SKU that covers your typical peak. It’s wise to allow some headroom, as constant 100% usage will lead to throttling. For instance, if your trial F64 shows occasional 80% spikes, moving to a permanent F64 could be fine thanks to bursting, but if you often hit 120%+ (bursting into future capacity), you should consider F128 or splitting workloads.
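The sizing arithmetic in the steps above can be sketched as a small helper. This is a simplified model (it ignores bursting and smoothing, and the 20% headroom factor is our own assumption); real sizing decisions should be driven by the Capacity Metrics App:

```python
# Sketch: translate a peak CU-second reading (per 30-second interval,
# as reported by the Capacity Metrics App) into a suggested F-SKU.
# Simplified: ignores bursting/smoothing; headroom factor is assumed.

INTERVAL_SECONDS = 30
F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def required_cus(peak_cu_seconds: float, headroom: float = 1.2) -> float:
    """Sustained CUs needed to cover the peak interval, plus headroom."""
    return (peak_cu_seconds / INTERVAL_SECONDS) * headroom

def suggest_sku(peak_cu_seconds: float) -> str:
    needed = required_cus(peak_cu_seconds)
    for cu in F_SKUS:
        if cu >= needed:
            return f"F{cu}"
    return "F2048 (consider splitting workloads)"

# The examples from the text: 1500 CU-seconds ~ 50 CUs of sustained demand
print(suggest_sku(1500))  # -> F64 (50 CUs needed, plus 20% headroom)
print(suggest_sku(200))   # a small peak, so a small SKU suffices
```

A peak of 1500 CU-seconds in a 30-second window works out to 50 CUs of sustained demand, so with headroom an F64 is the smallest comfortable fit, matching the reasoning above.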
Microsoft has also provided a Fabric Capacity Estimator tool (on the Fabric website) which can help model capacity needs by inputting factors like number of users, dataset sizes, refresh rates, etc. This can be a starting point, but real usage metrics are more reliable.
Planning for growth and variability: Keep in mind future growth – if you expect user counts or data volumes to double in a year, factor that into capacity sizing (you may start at F64 and plan to increase to F128 later). Also consider workload timing. Some capacities experience distinct daily peaks (e.g., heavy ETL jobs at 2 AM, heavy report usage at 9 AM). Thanks to Fabric’s bursting and smoothing, a capacity can handle short peaks above its baseline, but if two peaks overlap or usage grows, you might need a bigger size or to schedule workloads to avoid contention. Where possible, schedule intensive background jobs (data refreshes, scoring runs) during off-peak hours for interactive use, to reduce concurrent strain on the capacity.
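Whether two daily peaks actually collide is easy to check once you have hourly demand profiles. The sketch below uses hypothetical profiles (the 2 AM ETL and 9 AM report pattern described above); real numbers would come from the Capacity Metrics App:

```python
# Sketch: check whether workloads' hourly CU demand ever combines to
# exceed a capacity's baseline size. Profiles below are hypothetical.

def overloaded_hours(profiles: list[list[float]], capacity_cus: float) -> list[int]:
    """Hours (0-23) where summed demand exceeds the capacity baseline."""
    return [h for h in range(24) if sum(p[h] for p in profiles) > capacity_cus]

etl = [50 if h in (2, 3) else 2 for h in range(24)]        # nightly ETL spike
reports = [40 if 9 <= h <= 11 else 5 for h in range(24)]   # morning report rush

print(overloaded_hours([etl, reports], capacity_cus=64))   # -> [] (no overlap)
```

Here the nightly ETL and morning report peaks never overlap, so an F64 covers both; if the ETL job were rescheduled into the 9–11 AM window, the combined demand of 90 CUs would exceed the baseline and rely on bursting or trigger throttling.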
In summary, do your homework with a trial or pilot phase, leverage monitoring tools, and err on the side of starting a bit smaller – you can always scale up. Capacity planning helps you choose the right SKU and avoid slow queries or throttling while optimizing spend. And remember, you can have multiple capacities too; sometimes the answer is not one gigantic capacity, but two or three medium ones splitting different workloads (we’ll discuss this in governance).
Workload Management Across Fabric Experiences
One of the powerful aspects of Microsoft Fabric is that a single capacity can run a diverse set of workloads: Power BI reports, Spark notebooks, data pipelines, real-time KQL databases, AI models, etc. The capacity’s compute is shared by all these workloads. This section explains how to manage and balance different workloads on a capacity.
Unified capacity, multiple workloads: Fabric capacities are shared across all workload types by design – you don’t buy separate capacity for Power BI vs Spark vs SQL. For example, an F64 capacity could simultaneously be handling a Power BI dataset refresh, a SQL warehouse query, and a Spark notebook execution. All consume from the same pool of 64 CUs. This unified model simplifies architecture: “It doesn’t matter if one user is using a Lakehouse, another is running notebooks, and a third is executing SQL – they can all share the same capacity.” All items in workspaces assigned to that capacity draw on its resources.
However, as an admin, you need to be mindful of resource contention: a very heavy job of one type can impact others. Fabric tries to manage this with an intelligent scheduler and the bursting/smoothing mechanism (which prioritizes interactive operations). Still, you should consider the nature of workloads when assigning them to capacities. Some guidance:
- Power BI workloads: These include interactive report queries (DAX queries against datasets), dataset refreshes, dataflows, AI visuals, and paginated reports. In the capacity settings, admins have specific Power BI workload settings (for example, enabling the AI workload for cognitive services, or adjusting memory limits for datasets, similar to Power BI Premium settings). Ensure these are configured as needed – e.g., if you plan on using AI visualizations or AutoML in Power BI, make sure the AI workload is enabled on the capacity. Large semantic models (datasets) can consume a lot of memory; by default Fabric will manage their loading and eviction, but you may want to keep an eye on total model sizes relative to capacity. Paginated reports can be enabled if needed (they can be memory/CPU heavy during execution).
- Data Engineering & Science (Spark): Fabric provides Spark engines for notebooks and job definitions. By default, when a Spark job runs, it uses a portion of the capacity’s cores. In fact, for Spark workloads, Microsoft has defined that each 1 CU = 2 Spark vCores of compute power. For example, an F32 (32 CU) capacity has 64 Spark vCores available to allocate across Spark clusters. These vCores are dynamically allocated to Spark sessions as users run notebooks or Spark jobs. Spark has a built-in concurrency limit per capacity: if all Spark vCores are in use, additional Spark jobs will queue until resources free up. As an admin, you can allow or disallow workspace admins from configuring Spark pool sizes on your capacity. If you enable it, power users might spin up large Spark executors that use many cores – beneficial for performance, but potentially starving other workloads. If Spark usage is causing contention, consider limiting the max Spark nodes or advising users to use moderate sizes. Notably, Fabric capacities support bursting for Spark as well – the system can utilize up to 3× the purchased Spark vCores temporarily to run more Spark tasks in parallel. This helps if you occasionally have many Spark jobs at once, but sustained overuse will still queue or throttle. For heavy Spark/ETL scenarios, you might dedicate a capacity just for that to isolate it from BI users.
- Data Warehousing (SQL) and Real-Time Analytics (KQL): These workloads run SQL queries or KQL (Kusto Query Language) queries against data warehouses or real-time analytics databases. They consume CPU during query execution and memory for caching data. They are treated as background jobs if run via scheduled processes, or interactive if triggered by a user query. Fabric’s smoothing generally spreads out heavy background query loads over time. Nevertheless, a very expensive SQL query can momentarily spike CPU. As admin, ensure your capacity can handle peak query loads or advise your data teams to optimize queries (like proper indexing on warehouses) to avoid excessive load. There are not many specific toggles for SQL/KQL workloads in capacity settings (beyond enabling the Warehouse or Real-Time Analytics features which are on by default for F and P capacities).
- OneLake and data movement: OneLake is the storage foundation for Fabric. While data storage itself doesn’t “consume” capacity CPU (storage is separate), activities like moving data (copying via pipelines), scanning large files, or loading data into a dataframe will use capacity compute. Data integration pipelines (if using Data Factory in Fabric) also run on the capacity. Keep an eye on any heavy data copy or transformation activities, as those are background tasks that could contribute to load.
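The Spark vCore arithmetic described above (1 CU = 2 Spark vCores, with up to 3× bursting) can be sketched directly:

```python
# Sketch of the Spark compute arithmetic described above: each CU
# provides 2 Spark vCores, and bursting can temporarily allow up to
# 3x the base vCores before additional Spark jobs queue.

SPARK_VCORES_PER_CU = 2
SPARK_BURST_FACTOR = 3

def spark_vcores(capacity_cus: int, with_burst: bool = False) -> int:
    """Spark vCores available on a capacity, optionally during bursting."""
    base = capacity_cus * SPARK_VCORES_PER_CU
    return base * SPARK_BURST_FACTOR if with_burst else base

print(spark_vcores(32))                   # F32 -> 64 Spark vCores
print(spark_vcores(32, with_burst=True))  # up to 192 vCores while bursting
```

So an F32 exposes 64 Spark vCores at baseline and can temporarily schedule against as many as 192 during a burst, after which further jobs queue.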
Isolation and splitting workloads: If you find that certain workloads dominate the capacity, you might consider splitting them onto separate capacities. For instance, a common approach is to separate “self-service BI” and “data engineering” onto different capacities so that a big Spark job doesn’t slow down a business report refresh. Microsoft notes that provisioning multiple capacities can isolate compute for high-priority items or different usage patterns. You could have one capacity dedicated to Power BI content for executives (ensuring their reports are always snappy), and a second capacity for experimental data science projects. This kind of workload isolation via capacities is a governance decision (we will cover more in the governance section). The trade-off is cost and utilization – separate capacities ensure no interference, but you might end up with unused capacity in each if peaks happen at different times. A single capacity shared by all can be more cost-efficient if the workloads’ peak times are complementary.
Tenant settings delegation: In Fabric, some tenant-level settings (for example, certain Power BI tenant settings or workload features) can be delegated to the capacity level. This means you can override a global setting for a specific capacity. For instance, you might have a tenant setting that limits the maximum size of Power BI datasets for Pro workspaces, but for a capacity designated to a specific team, you allow larger models. In the capacity management settings, check the Delegated tenant settings section if you need to tweak such options for one capacity without affecting others. This feature allows granular control, such as enabling preview features or higher limits on a capacity used by advanced users while keeping defaults elsewhere.
Monitoring workload mix: Use the Capacity Metrics App or the Fabric Monitoring Hub to see what types of operations are consuming the most resources. The app can break down usage by item type (e.g., dataset vs Spark vs pipeline) to help identify if one category is the culprit for high utilization. If you notice, for example, that Spark jobs are consistently using the majority of CUs (perhaps visible as high background CPU), it may prompt you to adjust Spark configurations or move some Spark-heavy workspaces off to another capacity.
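Conceptually, the per-item-type breakdown is just an aggregation over operation records. The sketch below uses entirely hypothetical records and field names; the Capacity Metrics App provides this breakdown out of the box, so this only illustrates the idea:

```python
# Sketch: aggregate CU-seconds by item type to see which workload
# dominates a capacity. Records and field names are hypothetical;
# the Capacity Metrics App reports this breakdown natively.

from collections import defaultdict

operations = [
    {"item_type": "Dataset", "cu_seconds": 1200},
    {"item_type": "Spark notebook", "cu_seconds": 5400},
    {"item_type": "Pipeline", "cu_seconds": 900},
    {"item_type": "Spark notebook", "cu_seconds": 3000},
]

def usage_by_item_type(ops):
    """Return each item type's share of total CU-seconds, as a percentage."""
    totals = defaultdict(float)
    for op in ops:
        totals[op["item_type"]] += op["cu_seconds"]
    grand = sum(totals.values())
    return {t: round(100 * v / grand, 1) for t, v in totals.items()}

print(usage_by_item_type(operations))
# Spark notebooks dominating (here 80%) would suggest tuning Spark
# settings or isolating Spark-heavy workspaces on their own capacity.
```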
In summary, Fabric capacities are shared across all workload types, which is great for flexibility but requires good management to ensure balance. Leverage capacity settings to tune specific workloads (Power BI workload enabling, Spark pool limits, etc.), monitor the usage by workload type, and consider logical separation of workloads via multiple capacities if needed. Microsoft Fabric is designed so that the platform itself handles a lot of the balancing (through smoothing of background jobs), but administrator insight and control remain important to avoid any single workload overwhelming the rest.
Isolation and Security Boundaries
Microsoft Fabric capacities play a role in isolation at several levels – performance isolation, security isolation, and even geographic isolation. It’s important to understand what a capacity isolates (and what it doesn’t) within a Fabric tenant, and how to leverage capacities for governance or compliance.
Performance and resource isolation: A capacity is a unit of isolation for compute resources. Compute usage on one capacity does not affect other capacities in the tenant. If Capacity A is overloaded and throttling, it will not directly slow down Capacity B, since each has its own quota of CUs and separate throttling counters. This means you can confidently separate critical workloads by placing them in different capacities to ensure that heavy usage in one area (e.g., a dev/test environment) cannot degrade the performance of another (e.g., production reports). The Fabric platform applies throttling at the capacity scope, so even within the same tenant, one capacity “failing” (hitting limits) doesn’t spill over into another. As noted, there is an exception when it comes to cross-capacity data access: if a Fabric item in Capacity B is trying to query data that resides in Capacity A (for example, a dataset in B accessing a Lakehouse in A via OneLake), then the consuming capacity’s state is what matters for throttling that query. Generally, such cross-capacity consumption is not common except through shared storage like OneLake, and the compute to actually retrieve the data will be accounted to the consumer’s capacity.
Security and content isolation: It’s crucial to realize that a capacity is not a security boundary in terms of data access. All Fabric content security is governed by Entra ID (Azure AD) identities, roles, and workspace permissions, not by capacity. For example, just because Workspace X is on Capacity A and Workspace Y is on Capacity B does not mean users of X cannot access Y – if a user has the right permissions, they can access both. Capacities do not define who can see data; they define where it runs. So if you have sensitive data that only certain users should access, you still must rely on workspace-level security or separate Entra tenants, not merely separate capacities.
That said, capacities can assist with administrative isolation. You can delegate capacity admin roles so that different people manage different capacities. For instance, the finance IT team might be given admin rights to the “Finance Capacity” and they can control which workspaces go into it, without affecting other capacities. Additionally, you can control which workspaces are assigned to which capacity. By limiting capacity assignment rights (via the Contributor permissions setting on a capacity, which you can restrict to specific security groups), you ensure that, say, only approved workspaces/projects go into a certain capacity. This can be thought of as a soft isolation: e.g., only the HR team’s workspaces are placed in the HR capacity, keeping that compute “clean” from others.
Geographical and compliance isolation: If your organization has data residency requirements (for example, EU data must stay in EU datacenters, US data in US), capacities are a useful construct. When you create a capacity, you choose an Azure region for it. Workspaces on that capacity will allocate their Fabric resources in that region. This means you can satisfy multi-geo requirements by having separate capacities in each needed region and assigning workspaces accordingly. It isolates the data and compute to that geography. (Note that OneLake presents a single logical namespace, but it physically stores files and objects in the region of the workspace's capacity. Check the Fabric documentation on multi-geo support for details; Microsoft's example scenarios show deploying one capacity per geography.)
Tenant isolation: The ultimate isolation boundary is the Microsoft Entra tenant. Fabric capacities exist within a tenant. If you truly need completely separate environments (different user directories, no possibility of data or admin overlap), you would use separate Entra tenants (as was illustrated by Microsoft with one company using two tenants for different divisions). That, however, is a very high level of isolation usually only used in scenarios like M&A, extreme security separation, or multi-tenant services. Within one tenant, capacities give you isolation of compute but not identity.
Network isolation: As a side note, Fabric is a cloud SaaS, but it does provide features like Managed Virtual Networks for certain services (e.g., Data Factory pipelines or Synapse integration). These features allow you to restrict outbound data access to approved networks. While not directly related to capacity, these network security options can be enabled per workspace or capacity environment to ensure data does not leak to the public internet. If your organization requires network isolation, investigate Fabric’s managed VNet and private link support for the relevant workloads.
In summary, use capacities to create performance and administrative isolation within your tenant. Assign sensitive or mission-critical workloads their own capacity so they are shielded from others’ activity. But remember that all capacities under a tenant still share the same identity and security context; manage access via roles and perhaps use separate tenants if absolute isolation is needed. Also use capacities for geo-separation if needed by creating them in the appropriate regions.
Monitoring and Metrics
Continuous monitoring of capacity health and usage is vital to ensure you are getting the most out of your capacity and to preempt any issues like throttling. Microsoft Fabric provides several tools and metrics for capacity and workload monitoring.
Capacity Utilization Metrics: The primary tool for capacity admins is the Fabric Capacity Metrics App. This is a Power BI app provided by Microsoft that connects to your capacity’s telemetry. It offers dashboards showing CU utilization (%) over time, broken down by workload and item type. You can see, for example, how much compute was used by Spark versus datasets versus queries, and identify the top-consuming activities. The app reports recent usage (last 7 or 30 days) at 30-second granularity. Key visuals include the utilization chart (showing how close to the capacity limit you are) and breakdowns of interactive versus background load. As an admin, you should review these metrics regularly. Spikes to 100% indicate that you’re using all available CUs and likely bursting beyond capacity (which could lead to throttling if sustained). If you notice consistently high usage, it may be time to optimize or scale up.
Throttling indicators: Monitoring helps reveal if throttling is occurring. In Fabric, throttling can manifest as delays or failures of operations when the capacity is overextended. The metrics app can show when throttling events happen (e.g., a drop in throughput or a count of affected operations). Additional signals of throttling include user reports of slowness, refresh jobs taking longer or failing with capacity errors, or explicit error messages. Fabric may surface throttling as an HTTP 429 (Too Many Requests) response, and some workloads use their own codes (for example, Spark jobs return error code 430 when the capacity is at its concurrency limit). As admin, watch for these in logs and user feedback.
Real-time monitoring: For current activity, the Monitoring Hub in the Fabric portal provides a view of running and recent operations across the tenant. You can filter by capacity to see what queries, refreshes, Spark jobs, etc., are happening “now” on a capacity and their status. This is useful if the capacity is suddenly slow – you can quickly check if a particular job is consuming a lot of resources. The Monitoring Hub will show active operations and those queued or delayed due to capacity.
Administrator Monitoring Workspace: Microsoft has an Admin Monitoring workspace (sometimes automatically available in the tenant or downloadable) that contains some pre-built reports showing usage and adoption metrics. This might include things like the most active workspaces, most refreshed datasets, etc., across capacities. It’s more about usage analytics, but it can help identify which teams or projects are heavily using the capacity.
External monitoring (Log Analytics): For more advanced needs, you can connect Fabric (especially Power BI aspects) to Azure Log Analytics to capture certain logs, and also collect logs from the On-premises Data Gateway (if you use one). Log Analytics might collect events like dataset refresh timings, query durations, etc. While not giving direct CPU usage, these can help correlate if failures coincide with high load times.
Key metrics to watch:
- CPU Utilization %: How close to max CUs you are over time. Spikes to 100% sustained for multiple minutes are a red flag.
- Memory: Particularly for Power BI (dataset memory consumption) – if you load multiple large models, ensure they fit in memory. The capacity metrics app shows memory usage per dataset. If near the limits, consider larger capacity or offloading seldom-used models.
- Active operations count: Many concurrent operations (queries, jobs) can hint at saturation. For instance, if dozens of queries run simultaneously, you might hit limits even if each is light.
- Throttle events: If the metrics indicate delayed or dropped operations, or the Fabric admin portal shows notifications of throttling, that’s a clear indicator.
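To make the first metric concrete, here is a minimal sketch of how utilization % relates to raw CU consumption per 30-second reporting window (the granularity the metrics app uses). The function name and inputs are illustrative, not a real Fabric API.

```python
# Sketch: utilization % per 30-second window from raw CU-seconds consumed,
# mirroring how the Capacity Metrics App reports usage. Illustrative only.

INTERVAL_SECONDS = 30

def utilization_pct(cu_seconds_used: float, capacity_cus: int) -> float:
    """Percent of the capacity's CU budget consumed in one 30s window."""
    budget = capacity_cus * INTERVAL_SECONDS  # CU-seconds available per window
    return 100.0 * cu_seconds_used / budget

# Example: an F64 that consumed 1,440 CU-seconds in one window is at 75%.
print(round(utilization_pct(1440, 64), 1))  # 1440 / (64 * 30) -> 75.0
```

Sustained windows at or above 100% mean the capacity is bursting and accumulating smoothed overage.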
Notifications: A best practice is to set up alerts/notifications when capacity usage is high. The Fabric capacity settings allow you to configure email notifications if utilization exceeds a certain threshold for a certain time. For example, you might set a notification if CPU stays over 80% for more than 5 minutes. This proactive alert can prompt you to intervene (perhaps scale up capacity or investigate the cause) before users notice major slowdowns.
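The "over 80% for more than 5 minutes" rule above can be expressed as a simple streak check over 30-second utilization samples. This is purely a sketch of the logic; the real notifications are configured in the Fabric capacity settings, not scripted.

```python
# Sketch of the sustained-threshold alert rule: fire only when utilization
# stays above the threshold for the full duration, not on a single spike.

def should_alert(samples_pct, threshold=80.0, minutes=5, interval_s=30):
    """True if utilization stayed above `threshold` for `minutes` straight."""
    needed = (minutes * 60) // interval_s  # consecutive samples required
    streak = 0
    for pct in samples_pct:
        streak = streak + 1 if pct > threshold else 0
        if streak >= needed:
            return True
    return False

print(should_alert([85] * 10))      # 10 samples = 5 min over 80% -> True
print(should_alert([85, 70] * 10))  # never sustained -> False
```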
SLA and user experience: Ultimately, the reason we monitor is to ensure a good user experience. Identify patterns like time of day spikes (maybe every Monday 9AM there’s a huge hit) and mitigate them (maybe by rescheduling some background tasks). Also track the performance of key reports or jobs over time – if they start slowing down, it could be capacity pressure.
In summary, leverage the available telemetry: Fabric Capacity Metrics App for historical trends, Monitoring Hub for real-time oversight, and set up alerts. By keeping a close eye on capacity metrics, you can catch issues early (such as creeping utilization that approaches limits) and take action – whether optimization, scaling, or spreading out the workload – to maintain smooth operations.
Autoscale and Bursting: Managing Peak Loads
One of the novel features of Microsoft Fabric’s capacity model is how it handles peak demands through bursting and smoothing, effectively providing an “autoscaling” experience within the capacity. In this section, we explain these concepts and how to plan for bursts, as well as other autoscale options (such as manual scale-out and Spark autoscaling).
Bursting and smoothing: Fabric is designed to deliver fast performance, even for short spikes in workload, without requiring you to permanently allocate capacity for the peak. It does this via bursting, which allows the capacity to temporarily use more compute than its provisioned CU limit when needed. In other words, your capacity can “burst” above 100% utilization for a short period so that intensive operations finish quickly. This is complemented by smoothing, which is the system’s way of averaging out that burst usage over time so that you’re not immediately penalized. Smoothing spreads the accounting of the consumed CUs over a longer window (5 minutes for interactive operations, up to 24 hours for background operations).
Put simply: “Bursting lets you use more power than you purchased (within a specific timeframe), and smoothing makes sure this over-use is under control by spreading its impact over time.” For example, if you have an F64 capacity but a particular query needs the equivalent of 128 CUs for a few seconds, Fabric will allow it – the job will complete faster thanks to bursting beyond 64 CUs. Then, the “excess” usage is smoothed into subsequent minutes (meaning for some time after, the capacity’s available headroom is reduced as it pays back that borrowed compute). This mechanism gives an effect similar to short-term autoscaling: the capacity behaves as if it scaled itself up to handle a bursty load, then returns to normal.
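A toy illustration of smoothing's shape, under the simplifying assumption that a burst's CU-seconds are spread evenly across the smoothing window (5 minutes for interactive operations, 24 hours for background). Fabric's real algorithm is internal; this just shows why a large background burst has a small per-minute accounting impact.

```python
# Toy smoothing model: spread a burst's CU-seconds evenly over the
# documented smoothing window. Illustrative assumption, not Fabric's
# actual internal accounting.

def smoothed_load_per_minute(burst_cu_seconds: float, background: bool) -> float:
    """CU-seconds charged against each minute of the smoothing window."""
    window_minutes = 24 * 60 if background else 5
    return burst_cu_seconds / window_minutes

# A background refresh that burned 86,400 CU-seconds adds only
# 60 CU-seconds of accounted load to each of the next 1,440 minutes.
print(smoothed_load_per_minute(86_400, background=True))   # 60.0
# An interactive burst of 600 CU-seconds is repaid much faster.
print(smoothed_load_per_minute(600, background=False))     # 120.0
```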
Throttling and limits: Bursting is not infinite – it’s constrained by how much future capacity you can borrow via smoothing. Fabric has a throttling policy that kicks in if bursts go on too long or too high. The system tolerates using up to 10 minutes of future capacity with no throttling (this is like a built-in grace period). If you consume more than 10 minutes worth of CUs in advance, Fabric will start applying gentle throttling: interactive operations get a small 20-second delay on submission when between 10 and 60 minutes of capacity overage is consumed. This is phase 1 throttling – users might notice a slight delay but operations still run. If the capacity has consumed over an hour of future CUs (meaning it’s been running well above its quota for a sustained period), it enters phase 2 where interactive operations are rejected outright (while background jobs can still start). Finally, if over 24 hours of capacity is consumed (an extreme overload), all operations (interactive and background) are rejected until usage recovers. The table below summarizes these stages:
| Excess Usage (beyond capacity) | System Behavior | Impact |
|---|---|---|
| Up to 10 minutes of future capacity | Overage protection (bursting) | No throttling; operations run normally. |
| 10 – 60 minutes of overuse | Interactive delay | New interactive operations (user queries, etc.) are delayed ~20s in queue. Background jobs still start immediately. |
| 60 minutes – 24 hours of overuse | Interactive rejection | New interactive operations are rejected (fail immediately). Background jobs continue to run/queue. |
| Over 24 hours of overuse | Full rejection | All new operations are rejected (both interactive and background) until the capacity “catches up”. |
Table: Throttling thresholds in Fabric’s capacity model. Fabric bursts up to 10 minutes with no penalty. Beyond that, throttling escalates in stages to protect the system.
For most well-managed capacities, you ideally operate in the safe zone (under 10 minutes overage) most of the time. Occasional dips into the 10-60 minute range are fine (users might not even notice the minor delays). If you ever hit the 60+ minute range, that’s a sign the capacity is under-provisioned for the workload or a particular job is too heavy – it should prompt optimization or scaling.
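The staged policy in the table above can be summarized as a simple lookup from accumulated overage (minutes of future capacity consumed) to throttling behavior. The thresholds come from the documented policy; the function itself is just a sketch.

```python
# Map accumulated "future capacity consumed" (minutes) to the throttling
# stage from the table above. Thresholds per Microsoft's documented policy.

def throttle_stage(overage_minutes: float) -> str:
    if overage_minutes <= 10:
        return "none (burst absorbed by overage protection)"
    if overage_minutes <= 60:
        return "interactive delay (~20s queue on new interactive ops)"
    if overage_minutes <= 24 * 60:
        return "interactive rejection (background jobs still run)"
    return "full rejection (all new operations refused)"

print(throttle_stage(5))      # safe zone
print(throttle_stage(45))     # phase 1
print(throttle_stage(300))    # phase 2
print(throttle_stage(2000))   # extreme overload
```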
Autoscaling options: Unlike some cloud services that spin up new instances automatically, Fabric’s approach to autoscale is primarily through bursting (which is automatic but time-limited). However, you do have some manual or semi-automatic options:
- Manual scale-up/down: Because F-SKUs are purchased via Azure, you can scale the capacity resource to a different SKU on the fly (e.g., from F64 to F128 for a day, then back down). If you have a reserved base (like an F64 reserved instance), you can temporarily scale up using pay-as-you-go to a larger SKU to handle a surge. For instance, an admin might anticipate heavy year-end processing and raise the capacity for that week. Microsoft will bill the overage at the hourly rate for the higher SKU during that period. This is a proactive autoscale you perform as needed. It’s not automatic, but you could script it or use Azure Automation/Logic Apps to trigger scaling based on metrics (there are solutions shared by the community to do exactly this).
- Scale-out via additional capacity: Another approach if facing continual heavy load is to add another capacity and redistribute work. For example, if one capacity is maxed out daily, you could purchase a second capacity and move some workspaces to it (spreading the load). This isn’t “autoscale” per se (since it’s a static split unless you later combine them), but it’s a way to increase total resources. Because Fabric charges by capacity usage, two F64s cost the same as one F128 in pay-go terms, so cost isn’t a downside, and you gain isolation benefits.
- Spark autoscaling within capacity: For Spark jobs, Fabric allows configuration of auto-scaling Spark pools (the number of executors can scale between a min and max) which optimizes resource usage for Spark jobs. This feature, however, operates within the capacity’s limits – it won’t exceed the total cores available unless bursting provides headroom. It simply means a Spark job will request more nodes if needed and free them when done, up to what the capacity can supply. There is also a preview feature called Spark Autoscale Billing which, if enabled, can offload Spark jobs to a completely separate serverless pool billed independently. That effectively bypasses the capacity for Spark (useful if you don’t want Spark competing with your capacity at all), but since it’s a preview and separate billing, most admins will primarily consider it if Spark is a huge part of their usage and they want a truly elastic experience.
- Surge Protection: Microsoft introduced surge protection (currently in preview) for Fabric capacities, which is a setting that limits the total amount of background compute that can run when the capacity is under strain. If enabled, when interactive activities surge, the system will start rejecting background jobs preemptively so that interactive users aren’t as affected. This doesn’t give more capacity, but it triages usage to favor user-driven queries. It’s a protective throttle that helps the capacity recover faster from a spike. As an admin, if you have critical interactive workloads, you might turn this on to ensure responsiveness (at the cost of some background tasks failing and needing retry).
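The manual scale-up option above can be scripted because F-SKU capacities are ordinary ARM resources (`Microsoft.Fabric/capacities`): a PATCH of `sku.name` changes the size. The sketch below only builds the request; sending it (e.g., with `requests` plus an Entra ID token) is omitted, and the `api-version` shown is an assumption — verify against the current Microsoft.Fabric REST reference before use.

```python
# Build (but do not send) the ARM PATCH request that resizes an F-SKU
# capacity. api-version is an assumption; check the REST reference.

def build_scale_request(subscription_id, resource_group, capacity_name, sku):
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Fabric/capacities/{capacity_name}"
        "?api-version=2023-11-01"
    )
    body = {"sku": {"name": sku, "tier": "Fabric"}}
    return "PATCH", url, body

method, url, body = build_scale_request("<sub-id>", "rg-analytics", "prodcap", "F128")
print(method, body["sku"]["name"])  # PATCH F128
```

Wired into Azure Automation or a Logic App on a schedule (or a metrics trigger), this is the community-style "scripted autoscale" mentioned above.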
Clearing overuse: If your capacity does get into a heavily throttled state (e.g., many hours of overuse accumulated), one way to reset is to pause and resume the capacity. Pausing essentially stops the capacity (dropping all running tasks) and when resumed, it starts fresh with no prior overhang – but note, any un-smoothed burst usage gets immediately charged at that point. In effect, pausing is like paying off your debt instantly (since when the capacity is off, you can’t “pay back” with idle time, so you are billed for the overage). This is a drastic action (users will be disrupted by a pause), so it’s not a routine solution, but in extreme cases an admin might do this during off hours to clear a badly throttled capacity. Typically, optimizing the workload or scaling out is preferable to hitting this situation.
Design for bursts: Thanks to bursting, you don’t have to size your capacity for the absolute peak if it’s short-lived. Plan for the daily average or slightly above instead of the worst-case peak. Bursting will handle the occasional spike that is, say, 2-3× your normal usage for a few minutes. For example, if your daily work typically uses ~50 CUs but a big refresh at noon spikes to 150 CUs for 1 minute, an F64 capacity can still handle it by bursting (150/64 ≈ 2.3× for one minute), and because a refresh is a background operation, smoothing amortizes that excess over the following 24 hours. This saves cost because you avoid buying an F128 just for that one minute. However, if those spikes start lasting 30 minutes or happening every hour, then you do effectively need a larger capacity or you’ll degrade performance.
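A back-of-envelope check for the example above: does a short spike stay within the 10-minute no-throttle overage budget? Purely illustrative arithmetic.

```python
# How many minutes of "future capacity" does a spike borrow beyond the
# base SKU? Compare against the 10-minute no-throttle budget.

def overage_minutes(spike_cus, capacity_cus, spike_minutes):
    """Minutes of future capacity a spike borrows beyond the base SKU."""
    excess_cus = max(0, spike_cus - capacity_cus)
    return excess_cus * spike_minutes / capacity_cus

# 150 CUs for 1 minute on an F64 borrows (150-64)/64 ~ 1.34 minutes of
# future capacity - far under the 10-minute budget.
print(round(overage_minutes(150, 64, 1), 2))    # 1.34
# The same spike sustained for 30 minutes borrows ~40 minutes - deep into
# throttling territory, so a larger SKU (or optimization) is warranted.
print(round(overage_minutes(150, 64, 30), 2))   # 40.31
```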
In conclusion, Fabric’s bursting and smoothing provide a built-in cushion for peaks, acting as an automatic short-term autoscale. As an admin, you should still keep an eye on how often and how deeply you burst (via metrics), and use true scaling strategies (manual scale-up or adding capacity) if needed for sustained load. Also take advantage of features like Spark pool autoscaling and surge protection to further tailor how your capacity handles variable workloads. The combination of these tools ensures you can maintain performance without over-provisioning for rare peaks, achieving a cost-effective balance.
Governance and Best Practices for Capacity Assignment
Managing capacities is not just about the hardware and metrics – it also involves governance: deciding how capacities are used within your organization, which workspaces go where, and enforcing policies to ensure efficient and secure usage. Here are best practices and guidelines for capacity and tenant admins when assigning and governing capacities.
1. Organize capacities by function, priority, or domain: It often makes sense to allocate different capacities for different purposes. For example, you might have a capacity dedicated to production BI content (high priority reports for executives) and another for self-service and development work. This way, heavy experimentation in the dev capacity cannot interfere with the polished dashboards in prod. Microsoft gives an example of using separate capacities so that executives’ reports live on their own capacity for guaranteed performance. Some common splits are:
- By department or business unit: e.g., Finance has a capacity, Marketing has another – helpful if departments have very different usage patterns or need cost accountability.
- By workload type: e.g., one capacity for all Power BI reports, another for data engineering pipelines and science projects. This can minimize cross-workload contention.
- By environment: e.g., one for Production, one for Test/QA, one for Development. This aligns with software lifecycle management.
- By geography: as discussed, capacities by region (EMEA vs Americas, etc.) if data residency or local performance is needed.
Having multiple capacities incurs overhead (you must monitor and manage each), so don’t over-segment without reason. But a thoughtful breakdown can improve both performance isolation and clarity in who “owns” the capacity usage.
2. Control workspace assignments: Not every workspace needs to be on a dedicated capacity. Some content can live in the shared (free) capacity if it doesn’t need premium features. As an admin, you should have a process for requesting capacity assignment. You might require that a workspace meet certain criteria (e.g., it’s for a project that requires larger dataset sizes or will have broad distribution) before assigning it to the premium capacity. This prevents trivial or personal projects from consuming expensive capacity resources. In Fabric, you can restrict the ability to assign a workspace to a capacity by using Capacity Contributor permissions. By default, it might allow the whole organization, but you can switch it to specific users or groups. A best practice is to designate a few power users or a governance board that can add workspaces to the capacity, rather than leaving it open to all.
Also consider using the “Preferred capacity for My workspace” setting carefully. Fabric allows you to route user personal workspaces (My Workspaces) to a capacity. While this could utilize capacity for personal analyses, it can also easily overwhelm a capacity if many users start doing heavy work in their My Workspace. Many organizations leave My Workspaces on shared capacity (which requires those users to have Pro licenses for any Power BI content in them) and only put team or app workspaces on the Fabric capacities.
3. Enforce capacity governance policies: There may be tenant-level settings you want to enforce or loosen per capacity. For instance, perhaps in a special capacity for data science you allow higher memory per dataset or allow custom Visualizations that are otherwise disabled. Use the delegated tenant settings feature to override settings on specific capacities as needed. Another example: you might want to disable certain preview features or enforce specific data export rules in a production capacity for security, while allowing them in a dev capacity.
4. Educate workspace owners: Ensure that those who have their workspace on a capacity know the “dos and don’ts.” They should understand that it’s a shared resource – e.g., a badly written query or an extremely large dataset refresh can impact others. Encourage best practices like scheduling heavy refreshes during off-peak times, enabling incremental refresh for large datasets (to reduce refresh load), optimizing DAX and SQL queries, and so on. Capacity admins can provide guidelines or even help review content that will reside on the capacity.
5. Leverage monitoring for governance: Keep track of which workspaces or projects are consuming the most capacity. If one workspace is monopolizing resources (you can see this in metrics, which identify top items), you might decide to move that workspace to its own capacity or address the inefficiencies. You can even implement an internal chargeback or at least show departments how much capacity they consumed to promote accountability.
6. Plan for lifecycle and scaling: Governance also means planning how to scale or reassign as needs change. If a particular capacity is consistently at high load due to growth of a project, have a strategy to either scale that capacity or redistribute workspaces. For example, you might spin up a new capacity and migrate some workspaces to it (admins can change a workspace’s capacity assignment easily in the portal). Microsoft notes you can “scale out” by moving workspaces to spread workload, which is essentially a governance action as much as a performance one. Also, when projects are retired or become inactive, don’t forget to remove their workspaces from capacity (or even delete them) so they don’t unknowingly consume resources with forgotten scheduled operations.
7. Security considerations: While capacity doesn’t enforce security, you can use capacity assignment as part of a trust boundary in some cases. For instance, if you have a workspace with highly sensitive data, you might decide it should run on a capacity that only that team’s admins control (to reduce even the perception of others possibly affecting it). Also, if needed, capacities can be tied to different encryption keys (Power BI allows BYOK for Premium capacities) – check if Fabric supports BYOK per capacity if that’s a requirement.
8. Documentation and communication: Treat your capacities as critical infrastructure. Document which workspaces are on which capacity, what the capacity sizes are, and any rules associated with them. Communicate to your user community about how to request space on a capacity, what the expectations are (like “if you are on the shared capacity, you get only Pro features; if you need Fabric features, request placement on an F SKU” or vice versa). Clear guidelines will reduce ad-hoc and potentially improper use of the capacities.
In essence, governing capacities is about balancing freedom and control. You want teams to benefit from the power of capacities, but with oversight to ensure no one abuses or unknowingly harms the shared environment. Using multiple capacities for natural boundaries (dept, env, workload) and controlling assignments are key techniques. As a best practice, start somewhat centralized (maybe one capacity for the whole org in Fabric’s early days) and then segment as you identify clear needs to do so (such as a particular group needing isolation or a certain region needing its own). This way you keep things manageable and only introduce complexity when justified.
Cost Optimization Strategies
Managing cost is a major part of capacity administration, since dedicated capacity represents a significant investment. Fortunately, Microsoft Fabric offers several ways to optimize costs while meeting performance needs. Here are strategies to consider:
1. Use Pay-as-you-go wisely (pause when idle): F-SKUs on Azure are billed on a per-second basis (with a 1-minute minimum) whenever the capacity is running. This means if you don’t need the capacity 24/7, you can pause it to stop charges. For example, if your analytics workloads are mostly 9am-5pm on weekdays, you could script the capacity to pause at night and on weekends. You only pay for the hours it’s actually on. An F8 capacity left running 24/7 costs roughly $1,200 per month, but if you paused it outside of an 8-hour workday, the cost could drop to a third of that (plus no charge on weekends). Always assess your usage patterns – some organizations run critical reports around the clock, but many could save by pausing during predictable downtime. The Fabric admin portal allows pause/resume, and Azure Automation or Logic Apps can schedule it. Just ensure no important refresh or user query is expected during the paused window.
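The savings from pausing can be estimated with simple proportional arithmetic, since pay-as-you-go bills only while the capacity runs. The ~$1,200/month F8 figure from the text is used as the 24/7 baseline; actual Azure rates vary by region.

```python
# Rough monthly cost for a pay-as-you-go capacity that only runs part of
# the week. Rate figure is the text's ~$1,200/month F8 example.

def monthly_cost(rate_24x7: float, weekly_hours_on: float) -> float:
    """Prorate a 24/7 monthly rate by the fraction of the week running."""
    return rate_24x7 * weekly_hours_on / (7 * 24)

print(round(monthly_cost(1200, 7 * 24)))  # always on  -> 1200
print(round(monthly_cost(1200, 5 * 8)))   # 8h weekdays -> 286 (~a quarter)
```

Weekday business hours are 40 of 168 weekly hours, so pausing nights and weekends cuts the bill to roughly a quarter, consistent with the "a third of that, plus no charge on weekends" estimate above.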
2. Right-size the SKU (avoid over-provisioning): It might be tempting to get a very large capacity “just in case,” but unused capacity is money wasted. Thanks to bursting, you can usually size for slightly above your average load, not the absolute peak. Monitor utilization, and if your capacity is consistently under 30% utilized, that’s a sign you could scale down to a smaller SKU and save costs (unless you’re expecting growth or deliberately keeping headroom). The granular SKU options (F2, F4, F8, etc.) let you fine-tune, but note the sizes double at each step: if F64 is too much and F32 occasionally struggles, there is no official F48. You could approximate one by running two smaller capacities (e.g., F32 + F16) and splitting workspaces between them, or by scheduled scale-ups, though both add complexity. Generally, choose the lowest SKU that meets requirements with some buffer.
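Right-sizing can be reduced to "smallest F-SKU whose CUs cover sustained load plus headroom." The 20% buffer below is an illustrative choice, not official guidance.

```python
# Pick the smallest F-SKU covering sustained load plus a headroom buffer.
# The 20% default buffer is an illustrative assumption.

F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]  # CUs per SKU

def right_size(sustained_cus: float, headroom: float = 0.2) -> str:
    target = sustained_cus * (1 + headroom)
    for cus in F_SKUS:
        if cus >= target:
            return f"F{cus}"
    return f"F{F_SKUS[-1]} (consider scale-out)"

print(right_size(25))  # 25 * 1.2 = 30  -> F32
print(right_size(55))  # 55 * 1.2 = 66  -> F128 (the doubling-step penalty)
```

The second example shows the doubling problem from the text: a load just above an SKU's buffered limit forces the next size up, which is where optimization or a two-capacity split earns its keep.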
3. Reserved capacity (annual commitment) for lower rates: Pay-as-you-go is flexible but at a higher unit price. Microsoft has indicated and demonstrated that reserved instance pricing for F-SKUs brings significant cost savings (on the order of ~40% cheaper for a 1-year commitment). For example, an F8 costs around €1188/month pay-go, but ~€706/month with a 1-year reservation. If you know you will need a capacity continuously for a long period, consider switching to a reserved model to reduce cost. Importantly, when you reserve, you are reserving a certain number of capacity units, not locking into a specific SKU size. So you could reserve 64 CUs (the equivalent of F64) but choose to run two F32 capacities or one F64 – as long as total CUs in use ≤64, it’s covered by your reservation. This allows flexibility in how you deploy those reserved resources (multiple smaller capacities vs one big one). Also, with reservation, you can still scale up beyond your reserved amount and just pay the excess at pay-go rates. For instance, you reserve F8 (8 CUs) but occasionally scale to F16 for a day – you’d pay the 8 extra CUs at pay-go just for that time. This hybrid approach ensures you get savings on your baseline usage and only pay premium for surges.
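The reservation decision is a break-even calculation: a reservation bills regardless of use, so it wins only when the capacity runs enough of the month. Using the text's example rates (F8: ~€1,188/month pay-go vs. ~€706/month reserved), the break-even is about 59% uptime.

```python
# Break-even sketch for reserve-vs-pay-go, using the text's F8 example
# rates. Real prices vary by region and over time.

PAYGO_MONTHLY = 1188.0     # running 24/7 at pay-as-you-go
RESERVED_MONTHLY = 706.0   # 1-year reservation, billed regardless of use

def cheaper_option(fraction_of_month_running: float) -> str:
    paygo_cost = PAYGO_MONTHLY * fraction_of_month_running
    return "reserved" if RESERVED_MONTHLY < paygo_cost else "pay-as-you-go"

print(cheaper_option(1.0))   # always on           -> reserved
print(cheaper_option(0.25))  # business hours only -> pay-as-you-go
```

This is why the pause strategy (#1) and reservations are alternatives more than complements: heavy pausing pushes you below the break-even point where the reservation would have paid off.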
4. Monitor and optimize workload costs: Cost optimization can also mean making workloads more efficient so they consume fewer CUs. Encourage good practices like refreshing datasets less frequently (don’t over-refresh), turning off refresh for datasets not in use, archiving or deleting old large datasets, using incremental refresh, etc. For Spark, make sure jobs are not sitting on unnecessarily large idle clusters (auto-terminate them when done, which Fabric usually handles). If using the serverless Spark billing preview, weigh its cost (it might be cheaper if your Spark usage is sporadic, versus holding capacity for it).
5. Mix license models for end-users: Not everyone in your organization needs to use the capacity. You can have a hybrid of Premium capacity and Premium Per User. For example, perhaps you buy a small capacity for critical shared content, but for many other smaller projects, you let teams use PPU licenses on the shared (free) capacity. This way you’re not putting everything on the capacity. As mentioned, PPU is cost effective up to a point (if many users need it, capacity becomes cheaper). You might say: content intended for large audiences goes on capacity (so free users can consume it), whereas content for small teams stays with PPU. Such a strategy can yield substantial savings. It also provides a path for scaling: as a particular report or solution becomes widely adopted, you can move it from the PPU world to the capacity.
6. Utilize lower-tier SKUs and scale out: If cost is a concern and ultra-high performance isn’t required, you could opt for multiple smaller capacities instead of one large one. For example, two F32 capacities might be cheaper in some scenarios than one F64 if you can pause them independently or if you got a deal on smaller ones. That said, Microsoft’s pricing is generally linear with CUs, so two F32 should cost roughly the same as one F64 in pay-go. The advantage would be if you can pause one of them for periods when not needed. Be mindful though: capacities below F64 won’t allow free user report viewing, which could force Pro licenses and shift cost elsewhere.
7. Keep an eye on OneLake storage costs: Fabric capacity covers compute. Storage in OneLake is billed separately (at a certain rate per GB per month). Microsoft’s current OneLake storage cost (~$0.022 per GB/month in one region example) is relatively low, but if you are landing terabytes of data, it will add up. It usually won’t overshadow compute costs, but from a governance perspective, try to clean up unused data (e.g., old versioned data, intermediate files) to avoid an ever-growing storage bill. Data egress (moving data out of the region) can also incur costs, but if your data stays within Fabric, this is likely not an issue.
8. Periodically review usage and adjust: Cost optimization is not a one-time set-and-forget. Each quarter or so, review your capacity’s utilization and cost. Are you paying for a large capacity that’s mostly idle? Scale it down or share it with more workloads (to get more value out of it). Conversely, if you’re consistently hitting the limits and had to enable frequent autoscale (pay-go overages), maybe committing to a higher base SKU could be more economical. Remember, if you went with a reserved instance, you already paid upfront – ensure you are using what you paid for. If you reserved an F64 but only ever use 30 CUs, you might repurpose some of those CUs to another capacity (e.g., split into F32 + F32) so that more projects can utilize the prepaid capacity.
9. Leverage free/trial features: Make full use of the 60-day Fabric trial capacity before purchasing. It’s free compute time – treat it as such to test heavy scenarios and get sizing estimates without incurring cost. Also, if certain features remain free or included (like some amount of AI functions or some small dataset sizes not counting, etc.), be aware and use them.
10. Watch for Microsoft licensing changes or offers: Microsoft’s cloud services pricing can evolve. For instance, the deprecation of P-SKUs might come with incentives or migration discounts to F-SKUs. There could be offers for multi-year commitments. Stay informed via the Fabric blog or your Microsoft rep for any cost-saving opportunities.
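Two of the estimates above – the F32-pair comparison in item 6 and the OneLake storage rate in item 7 – can be sketched with quick back-of-the-envelope arithmetic. The $0.18/CU-hour compute rate below is an illustrative assumption (check the Azure pricing page for your region); the $0.022/GB/month storage rate is the one-region example cited in item 7.

```python
# Illustrative cost arithmetic for items 6 and 7. Rates are assumptions for
# illustration only -- always confirm against current regional pricing.
CU_RATE = 0.18          # assumed pay-as-you-go $/CU/hour
HOURS_PER_MONTH = 730

# Item 6: one F64 running 24/7 vs. an F32 pair where the second F32 only runs
# 12 hours a day on ~22 business days a month (paused the rest of the time).
f64_always_on = 64 * CU_RATE * HOURS_PER_MONTH
f32_pair = 32 * CU_RATE * (HOURS_PER_MONTH + 12 * 22)
print(f"F64 24/7:          ${f64_always_on:>8,.0f}/month")
print(f"F32 + paused F32:  ${f32_pair:>8,.0f}/month  "
      f"(saves ${f64_always_on - f32_pair:,.0f})")

# Item 7: OneLake storage cost grows linearly with data volume.
GB_RATE = 0.022         # $/GB/month, the one-region example from the text
for tb in (1, 10, 50):
    print(f"{tb:>3} TB in OneLake -> ${tb * 1024 * GB_RATE:>8,.2f}/month")
```

The point of the sketch is the shape, not the exact dollars: pausing turns capacity cost from a fixed monthly fee into a function of runtime hours, while storage stays a small linear term underneath.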
In practice, many organizations find that moving to Fabric F-SKUs saves money compared to the old P-SKUs – provided they manage the capacity actively (pausing when not needed, etc.). One user noted that Fabric capacity is “significantly cheaper than Power BI Premium capacity” if you utilize the flexible billing. But this is only true if you take advantage of that flexibility – otherwise pay-go could actually cost more than an annual P-SKU if left running 24/7 at the full rate. The onus is on the admin to optimize runtime.
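The 24/7 caveat can be made concrete with a quick break-even calculation – a sketch assuming a ~40% reservation discount and an illustrative $0.18/CU-hour pay-go rate (both assumptions here; verify against your actual quote):

```python
# When does pay-as-you-go beat a reservation? With a ~40% reservation
# discount (assumed), pay-go wins only below ~60% monthly uptime.
CU_RATE = 0.18                 # assumed pay-as-you-go $/CU/hour
HOURS_PER_MONTH = 730
RESERVATION_DISCOUNT = 0.40    # assumed annual-reservation discount

paygo_hourly = 64 * CU_RATE    # F64 hourly pay-go cost
reserved_monthly = paygo_hourly * HOURS_PER_MONTH * (1 - RESERVATION_DISCOUNT)

# Break-even: pay-go runtime hours that equal the reserved monthly cost.
break_even_hours = reserved_monthly / paygo_hourly
print(f"Reserved F64:  ${reserved_monthly:,.0f}/month")
print(f"Break-even at {break_even_hours:.0f} h/month "
      f"({break_even_hours / HOURS_PER_MONTH:.0%} uptime)")
```

In other words, under these assumed rates a pay-go F64 that runs more than roughly 60% of the month already costs more than a reserved one – pausing has to be aggressive for pay-go to pay off.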
By combining these strategies – dynamic scaling, reserved discounts, license mixing, and efficient usage – you can achieve an optimal balance of performance and cost. The result should be that your organization pays for exactly the level of analytics power it needs, and not a penny more, while still delivering a good user experience.
Real-World Use Cases and Scenario-Based Recommendations
To tie everything together, let’s consider a few typical scenarios and how one might approach capacity management in each:
Scenario 1: Small Business or Team Starting with Fabric
A 50-person company with a small data team is adopting Fabric primarily for Power BI reports and a few dataflows.
Approach: Begin with the Fabric Trial (F64) to pilot your content. An F64 likely provides ample power for 50 users. During the trial, monitor usage – it might show that even an F32 would suffice if usage is light. Since 50 users is below the ~250 threshold, one option after the trial is to use Premium Per User (PPU) licenses instead of buying capacity (each power user gets PPU for premium features, and content runs on shared capacity), which could be cheaper initially. However, if the plan is to roll out company-wide reports that everyone consumes, a capacity is beneficial so that even free users can view. In that case, purchase a small F SKU on pay-as-you-go – F32 or F64 depending on trial results – and pause it overnight to save money. With an F32 (below the free-viewing threshold), remember that viewers will need Pro licenses; if you want all 50 users (including some without Pro) to access content, go with at least F64. Given cost, you might decide on PPU for all 50 instead of F64, which could be more economical until the user base or needs grow. Keep governance light, but educate the small team not to run extremely heavy tasks that would require a bigger capacity. One capacity is likely enough; no need to split by department since the org is small.
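The “pause it overnight” advice above can be automated. Fabric capacities are suspended and resumed through the Azure Resource Manager API (the Microsoft.Fabric/capacities resource provider). The sketch below only decides on and builds the request URL for a simple business-hours schedule; the subscription/resource-group/capacity values are placeholders, the api-version is an assumption (check the current ARM reference), and actually sending the POST requires an Azure AD token (e.g., via the azure-identity package).

```python
# Sketch: overnight pause schedule for a pay-as-you-go Fabric capacity.
# Placeholders (<sub-id>, <rg>, <capacity-name>) and the api-version are
# assumptions -- substitute your own values and add authentication.
from datetime import datetime

ARM = "https://management.azure.com"
CAPACITY_ID = ("/subscriptions/<sub-id>/resourceGroups/<rg>"
               "/providers/Microsoft.Fabric/capacities/<capacity-name>")
API_VERSION = "2023-11-01"  # assumed; verify against the ARM reference

def desired_action(now: datetime) -> str:
    """Return 'resume' during business hours (Mon-Fri, 07:00-19:00), else 'suspend'."""
    business_hours = now.weekday() < 5 and 7 <= now.hour < 19
    return "resume" if business_hours else "suspend"

def action_url(now: datetime) -> str:
    """Build the ARM POST URL for the action appropriate to this moment."""
    return f"{ARM}{CAPACITY_ID}/{desired_action(now)}?api-version={API_VERSION}"

print(action_url(datetime(2025, 3, 3, 9, 0)))   # Monday 09:00 -> .../resume
print(action_url(datetime(2025, 3, 3, 22, 0)))  # Monday 22:00 -> .../suspend
```

Run on a timer (Azure Automation, a Logic App, or a plain cron job), a script like this keeps a small capacity billed only for the hours people actually use it.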
Scenario 2: Mid-size Enterprise focusing on Enterprise BI
A 1000-person company has a BI Center of Excellence that will use Fabric primarily for Power BI (reports & datasets), replacing a P1 Premium. Minimal use of Spark or advanced workloads initially.
Approach: They likely need a capacity that allows free users to consume reports – so F64 or larger. Given they had a P1, F64 is the equivalent. Use a one-year F64 reservation to save about 40% over monthly pay-as-you-go, since they know they need it continuously. Monitor usage: if adoption grows (more reports, bigger datasets), watch whether utilization nears the limits; they may consider scaling to F128 in the future. In terms of governance, set up one primary capacity for production BI content. Perhaps also spin up a smaller F32 capacity for development and testing of reports, so heavy model refreshes in dev don’t impact prod; the dev capacity could even be paused outside working hours to save cost. For user licensing, since content on F64 can be viewed by free users, all consumers need only Fabric Free licenses – only content creators (maybe ~50 BI developers) need Pro. Enforce that only the BI team can assign workspaces to the production capacity (so random workspaces don’t sneak in). Use the metrics app to ensure no single workspace is hogging resources; if a particular department’s content is too heavy, consider allocating it a dedicated capacity (e.g., buy another F64 for that department if justified).
Scenario 3: Data Science and Engineering Focus
A tech company with 200 data scientists and engineers plans to use Fabric for big data processing, machine learning, and some reporting. They expect heavy Spark usage and big warehouses; less focus on broad report consumption.
Approach: Since their usage is compute-heavy but doesn’t involve thousands of report viewers, they should prioritize raw power over Premium distribution features. They could start with an F128 or F256, especially since many of their users have Pro licenses anyway (so free-viewer capability isn’t the concern – compute capacity is). They might split capacities by function: one “AI/Engineering” capacity and one “BI Reporting” capacity. The AI one might be large (to handle Spark clusters, etc.), and the BI one can be smaller if report usage is limited to internal teams with Pro. If cost is a concern, alternatives include keeping one moderate capacity and using Spark autoscale billing (serverless Spark) for big ML jobs so those jobs bill separately rather than consuming capacity, or offloading the heaviest ML to a Spark service outside of Fabric, such as Azure Databricks. But if they want everything in Fabric, an ample capacity with bursting will handle a lot. They should use Spark pool auto-scaling and set conservative defaults so no single user can grab too many cores. Monitor concurrency – if Spark jobs queue often, increase capacity or encourage pipeline scheduling to queue non-urgent jobs. For cost, they might run the capacity 24/7 if pipelines run round the clock; still, if nights are quiet, pause it then. Because these users are technical, requiring them to have Pro or PPU is fine; they may not need to enable free-user access at all. If they do produce dashboards for a wider audience, those could live on a smaller separate capacity (or those viewers could get PPU licenses). Overall, keep the capacity in a region close to the data lake for performance, and consider enabling private networking since they likely deal with sensitive data.
Scenario 4: Large Enterprise, Multiple Departments
A global enterprise with several divisions, all adopting Fabric for different projects – some heavy BI, some data warehousing, some real-time analytics.
Approach: This calls for a multi-capacity strategy. They might purchase a pool of capacity units (e.g., 500 CUs reserved) and then split it into multiple capacities: an F128 for Division A, an F128 for Division B, an F64 for Division C, and so on, up to the 500-CU total. This way each division manages its own capacity without impacting the others, and the company benefits from a bulk reserved discount across all of them. Designate a capacity admin for each to manage assignments. Be mindful of region, too – perhaps an F128 in the EU for European teams and another in the US for American teams. Use naming conventions for capacities (e.g., “Fabric_CAP_EU_Prod”, “Fabric_CAP_US_Marketing”). They might also keep one smaller capacity as a “sandbox” environment where any employee can try Fabric (a kind of community capacity) – monitored and reset often. Cost-wise, they will want reserved instances at this scale, and possibly three-year commitments if they are confident in long-term usage (which may bring even greater discounts). Regular reviews might reveal that one division isn’t using its full capacity – resize it down and reallocate the CUs to another that needs more (taking advantage of the fact that reserved CUs are not tied to one capacity shape). Governance here is crucial: a central team should set overall policies (what content must live where, uniform compliance and security), while delegating day-to-day administration to local admins.
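The pool-splitting idea above has one mechanical constraint worth sketching: the reserved pool is bought per CU, but each capacity carved out of it must land on a fixed F-SKU size (F2 through F2048, doubling at each step). Division names and sizes below are hypothetical:

```python
# Sketch: splitting a reserved CU pool across per-division capacities.
# Each capacity must be a valid F-SKU size; the pool itself is per-CU.
F_SKU_SIZES = [2 ** n for n in range(1, 12)]   # F2, F4, ..., F2048

pool_cus = 500                                  # reserved pool from the example
allocation = {                                  # hypothetical divisions
    "Div_A_Prod": 128,
    "Div_B_Prod": 128,
    "Div_C_Prod": 64,
    "Sandbox": 32,
}

assert all(cu in F_SKU_SIZES for cu in allocation.values()), "invalid SKU size"
used = sum(allocation.values())
print(f"Allocated {used} of {pool_cus} reserved CUs; "
      f"{pool_cus - used} CUs free to reassign")
```

Left-over CUs in the pool are exactly the flexibility the text describes: they can back a new capacity later, or absorb a resize when one division needs to step up a SKU.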
Scenario 5: External Facing Embedded Analytics
A software vendor wants to use Fabric to embed Power BI reports in their SaaS product for their external customers.
Approach: This scenario historically used A-SKUs or EM-SKUs. With Fabric, they have options: use an F-SKU, which also supports embedding, or stick with an A-SKU if they don’t need Fabric features. If they only care about embedding reports and want to minimize cost, an A4 (equivalent to F64) might be slightly cheaper if they don’t need the rest of Fabric (and A4 can be paused too). However, if they plan to use Fabric’s dataflows or other features to prep data, an F-SKU is more future-proof. Assuming they choose an F-SKU, they likely need at least an F8 or F16 to start (depending on user load), since EM/A SKUs start at roughly that scale for embedding anyway, and they can scale as their customer base grows. They should treat this capacity as dedicated to their application and isolate it from internal corporate capacities. Cost optimization here means scaling with demand: scale up during business hours if that’s when customers use the app, and scale down or pause at night if no one accesses it at 2 AM. But since external users may be worldwide, they might run it constantly, and possibly consider multi-geo capacities to serve different regions for latency. They must also handle licensing properly: external users viewing embedded content do not need Pro licenses – the capacity covers that – so the capacity cost scales directly with the usage the vendor expects (many concurrent external users means a higher SKU). Monitoring usage patterns (peak concurrent users driving CPU) will guide scaling and cost.
These scenarios highlight that capacity management is flexible – you adapt the strategy to your specific needs and usage patterns. There is no one-size-fits-all, but the principles remain consistent: use data to make decisions, isolate where necessary, and take advantage of Fabric’s elasticity to optimize both performance and cost.
Conclusion
Microsoft Fabric capacities are a powerful enabler for organizational analytics at scale. By understanding the different capacity types, how to license and size them, and how Fabric allocates resources across workloads, administrators can ensure their users get a fast, seamless experience. We covered how to plan capacity size (using tools and trial runs), how to manage mixed workloads on a shared capacity, and how Fabric’s unique bursting and smoothing capabilities help handle peaks without constant overspending. We also delved into monitoring techniques to keep an eye on capacity health and discussed governance practices to allocate capacity resources wisely among teams and projects. Finally, we explored ways to optimize costs – from pausing unused capacity to leveraging reserved pricing and choosing the right licensing mix.
In essence, effective capacity management in Fabric requires a balance of technical tuning and organizational policy. Administrators should collaborate with business users and developers alike: optimizing queries and models (to reduce load), scheduling workloads smartly, and scaling infrastructure when needed. With careful management, a Fabric capacity can serve a wide array of analytics needs while maintaining strong performance and staying within budget. We encourage new capacity admins to start small, iterate, and use the rich monitoring data available – over time, you will develop an intuition for your organization’s usage patterns and how to adjust capacity to match. Microsoft Fabric’s capacities, when well-managed, will provide a robust, flexible foundation for your data-driven enterprise, allowing you to unlock insights without worrying that resources will be the bottleneck. Happy capacity managing!
Sources:
- Microsoft Fabric documentation – Concepts and Licenses, Microsoft Learn
- Microsoft Fabric documentation – Plan your capacity size, Microsoft Learn
- Microsoft Fabric documentation – Evaluate and optimize your capacity, Microsoft Learn
- Microsoft Fabric documentation – Capacity throttling policy, Microsoft Learn
- Data – Marc blog – Power BI and Fabric capacities: Cost structure, June 2024
- Microsoft Fabric documentation – Fabric trial license, Microsoft Learn
- Microsoft Fabric documentation – Capacity settings (admin), Microsoft Learn
- Dataroots.io – Fabric pricing, billing, and autoscaling, 2023
- Medium – Adrian B. – Fabric Capacity Management 101, 2023
- Microsoft Fabric documentation – Spark concurrency limits, Microsoft Learn
- Microsoft Fabric community – Fabric trial capacity limits, 2023 (trial is 60 days)
- Microsoft Fabric documentation – Throttling stages, Microsoft Learn
Download PDF copy – Microsoft Fabric Capacity Management_ A Comprehensive Guide for Administrators.pdf