ExtractLabel just changed how your Spark pipelines should handle unstructured data

Every data engineer eventually inherits the same cursed pipeline.

Upstream sends you a blob of human text. Somewhere in that blob are the exact facts your downstream systems need: product name, issue category, requested resolution, timeline, who did what, and when. The facts are there. They are just buried in prose written by sleep-deprived humans, copied from emails, and occasionally typed from a phone in an airport parking lot.

For years, we handled this with a pile of hacks:

  • Regex that works until one user adds a comma
  • Hand-rolled NER that drifts quietly into uselessness
  • LLM prompts that return valid JSON on Monday, improv theater on Tuesday

Then we pretend this is fine by writing 300 lines of “normalization” code downstream, plus defensive checks, plus retry logic, plus enough if statements to make your future self hate your past self.

That is the old world.

ExtractLabel is the first Fabric AI Functions primitive that treats extraction like a contract instead of a vibe. You define the shape once in JSON Schema. The extraction step returns that shape. Your pipeline gets predictable structure instead of model improv.

If you run Spark workloads in Fabric, this matters immediately.

What AI Functions already gave you (and where it fell short)

Before ExtractLabel, the quick path looked like this:

df["text"].ai.extract("name", "profession", "city")


For exploration, that is great. For production, it is a trap.

Prototype extraction asks, “Can the model find useful fields?”
Production extraction asks, “Can every downstream consumer trust type, shape, and vocabulary every single run?”

Those are different questions.

The basic label call is lightweight and convenient, but it leaves the hardest part unsolved: schema discipline. If your routing logic expects one of four categories, free-form output creates entropy. If your analytics expects arrays, and extraction returns comma-separated strings, you are writing cleanup code forever. If optional fields are not explicitly nullable, models tend to fill blanks with plausible nonsense.

Model understanding was never the bottleneck. Contract reliability was.

ExtractLabel: the schema contract your pipeline needs

ExtractLabel gives you an explicit schema boundary between unstructured input and structured output. In pandas you import from synapse.ml.aifunc; in PySpark you import from synapse.ml.spark.aifunc. The core pattern is the same: define one label with object properties, requirements, and constraints.

Concrete example, using warranty claims:

from synapse.ml.aifunc import ExtractLabel

claim_schema = ExtractLabel(
    label="claim",
    max_items=1,
    type="object",
    description="Extract structured warranty claim information",
    properties={
        "type": "object",
        "properties": {
            "product_name": {"type": "string"},
            "problem_category": {
                "type": "string",
                "enum": ["defect", "damage_in_transit", "missing_part", "other"],
                "description": "defect=stopped working or malfunctioning, damage_in_transit=arrived damaged, missing_part=something not included"
            },
            "problem_summary": {
                "type": "string",
                "description": "Max 20 words. Summarize the core issue."
            },
            "time_owned": {"type": ["string", "null"]},
            "troubleshooting_tried": {
                "type": "array",
                "items": {"type": "string"}
            },
            "requested_resolution": {
                "type": "string",
                "enum": ["replacement", "refund", "repair", "other"]
            }
        },
        "required": ["product_name", "problem_category", "problem_summary",
                     "time_owned", "troubleshooting_tried", "requested_resolution"],
        "additionalProperties": False
    }
)

df[["claim"]] = df["text"].ai.extract(claim_schema)


Input text:

“The smart thermostat stopped turning on after 12 days. I tried a reset and new batteries. Please replace it.”

Structured output:

{
    "product_name": "smart thermostat",
    "problem_category": "defect",
    "problem_summary": "Thermostat stopped turning on after 12 days",
    "time_owned": "12 days",
    "troubleshooting_tried": ["reset", "new batteries"],
    "requested_resolution": "replacement"
}


That is the difference: you are no longer extracting “some fields.” You are producing an object your systems can rely on.
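One way to make that reliance concrete is a lightweight post-extraction guard. This is a sketch, not part of any Fabric API: the function and its rules simply mirror the claim schema above, using plain Python type checks.

```python
# A minimal post-extraction guard, assuming the output dict shape shown above.
# Field names and allowed values mirror the claim schema; this does not
# re-implement ai.extract, it only checks what came back.

ALLOWED_CATEGORIES = {"defect", "damage_in_transit", "missing_part", "other"}
ALLOWED_RESOLUTIONS = {"replacement", "refund", "repair", "other"}

def check_claim(claim: dict) -> list[str]:
    """Return a list of contract violations; empty means the row is trustworthy."""
    problems = []
    if not isinstance(claim.get("product_name"), str):
        problems.append("product_name must be a string")
    if claim.get("problem_category") not in ALLOWED_CATEGORIES:
        problems.append("problem_category outside controlled vocabulary")
    if claim.get("requested_resolution") not in ALLOWED_RESOLUTIONS:
        problems.append("requested_resolution outside controlled vocabulary")
    if not isinstance(claim.get("troubleshooting_tried"), list):
        problems.append("troubleshooting_tried must be an array")
    # time_owned is nullable: a string or None are both acceptable
    if not isinstance(claim.get("time_owned"), (str, type(None))):
        problems.append("time_owned must be a string or null")
    return problems

claim = {
    "product_name": "smart thermostat",
    "problem_category": "defect",
    "problem_summary": "Thermostat stopped turning on after 12 days",
    "time_owned": "12 days",
    "troubleshooting_tried": ["reset", "new batteries"],
    "requested_resolution": "replacement",
}
print(check_claim(claim))  # → []
```

In practice you would run a check like this on a sample of rows, not every row, since the schema itself is doing the heavy lifting.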

The five schema features that actually matter

Most teams will over-focus on “LLM extraction” and under-focus on schema design. That is backwards. The model is only half the system. The schema is what makes it production-safe.

1) Nullable types

Use explicit nullable definitions for fields that may not exist in the source text:

"time_owned": {"type": ["string", "null"]}


If you do not allow null, the model is pressured to invent. Nullable fields reduce that pressure.
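The effect is easy to see with plain Python type checks standing in for full JSON Schema validation (the `accepts` helper is illustrative, not a library function):

```python
# Tiny illustration of why ["string", "null"] matters. The helper maps
# JSON Schema type names to Python types; it is a stand-in, not a real validator.

def accepts(value, allowed_types):
    mapping = {"string": str, "null": type(None)}
    return isinstance(value, tuple(mapping[t] for t in allowed_types))

print(accepts(None, ["string"]))          # strict string: null is rejected
print(accepts(None, ["string", "null"]))  # nullable: null is fine
```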

2) Enums for category control

When downstream logic expects bounded values, enforce them with enum.

That turns category assignment from fuzzy language output into controlled vocabulary. If your pipeline routes by problem_category, this is non-negotiable.
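A routing step built on a bounded vocabulary can be a plain lookup. This is a hypothetical downstream consumer; the queue names are invented for illustration:

```python
# Hypothetical routing keyed on the problem_category enum. Queue names are
# illustrative and not part of any Fabric API.

ROUTES = {
    "defect": "quality-engineering-queue",
    "damage_in_transit": "logistics-queue",
    "missing_part": "fulfillment-queue",
    "other": "manual-triage-queue",
}

def route(problem_category: str) -> str:
    # Because the schema enum bounds the input, every value has a route;
    # the fallback only fires if the contract is violated upstream.
    return ROUTES.get(problem_category, "manual-triage-queue")

print(route("defect"))  # → quality-engineering-queue
```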

3) Arrays for true multi-value extraction

If a claim can include multiple troubleshooting actions, represent it as an array. Do not accept packed strings and split later.

Array semantics belong in extraction, not in cleanup jobs.
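The failure mode of packed strings is easy to demonstrate. Counting steps over a true array is unambiguous; splitting a comma-joined string only works until a step itself contains a comma:

```python
# Why true arrays beat packed strings downstream.

array_output = ["reset", "new batteries"]
packed_output = "reset, new batteries"

print(len(array_output))              # 2: unambiguous
print(len(packed_output.split(",")))  # 2 here, but "hard reset, then retried"
                                      # would wrongly split into two steps
```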

4) Descriptions as extraction instructions

Descriptions are not decorative comments. They are guidance for the extraction step.

Use them to define edge behavior, clarify enum intent, and enforce concise summaries. Most quality gains come from this field, not from prompt wording elsewhere.

5) Nested objects for real-world structure

Complex payloads are rarely flat. If your domain includes sub-entities, model them as nested objects now. Flattening everything into top-level strings feels easier in week one and becomes technical debt by week six.
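As a sketch, a warranty claim might carry a purchase sub-entity. The property style mirrors the claim schema above; the `purchase` fields are invented for illustration:

```python
# A nested sub-entity, defined in the same property style as the claim schema.
# Field names under "purchase" are illustrative.

purchase_properties = {
    "purchase": {
        "type": "object",
        "properties": {
            "retailer": {"type": ["string", "null"]},
            "purchase_date": {"type": ["string", "null"]},
            "order_number": {"type": ["string", "null"]},
        },
        "required": ["retailer", "purchase_date", "order_number"],
        "additionalProperties": False,
    }
}
```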

What this means for your Spark pipelines right now

If your team already runs text extraction in Fabric pipelines, ExtractLabel gives you a clean migration path with immediate payback in reliability.

Practical rollout plan:

  1. Find the pain first. Audit extraction steps where downstream code spends time repairing output shape, casing, and categories. Those are your highest-ROI migrations.
  2. Version schemas like code. Store schema definitions in source control with explicit version tags. Treat schema changes as contract changes, not casual edits.
  3. Use one extraction contract per domain task. Do not build one giant universal schema. Warranty claims, support tickets, and contract clauses deserve separate schemas with domain-specific enums and guidance.
  4. Prefer model-based schema authoring as complexity grows. Once schemas get large, hand-editing JSON gets brittle. Define structures in typed Python models and generate JSON Schema from there. You get stronger review discipline and fewer silent mistakes.
  5. Build an evaluation harness before broad rollout. ExtractLabel enforces structure; it does not guarantee semantic correctness. Keep a labeled sample set, score extraction quality regularly, and review drift.
  6. Tune operational settings with real workload telemetry. Concurrency, retry behavior, and throughput limits should be validated in your environment, not assumed from defaults. Measure error columns and latency under realistic load before declaring victory.
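Item 4 above can be sketched with pydantic, which is one common choice for typed schema authoring (an assumption on my part: the article does not prescribe a modeling library). `Literal` fields become enums in the generated JSON Schema:

```python
# Typed model for the claim contract, assuming pydantic v2.
# model_json_schema() generates reviewable JSON Schema from the types.

from typing import Literal, Optional
from pydantic import BaseModel

class Claim(BaseModel):
    product_name: str
    problem_category: Literal["defect", "damage_in_transit", "missing_part", "other"]
    problem_summary: str
    time_owned: Optional[str]          # nullable, matching ["string", "null"]
    troubleshooting_tried: list[str]   # true array semantics
    requested_resolution: Literal["replacement", "refund", "repair", "other"]

schema = Claim.model_json_schema()
print(sorted(schema["properties"]))
```

Reviewing a typed model in a pull request is far easier than diffing raw nested JSON, which is the review-discipline payoff item 4 describes.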

Verify runtime, capacity, and governance prerequisites against current Fabric documentation in your tenant before rollout. Platform details move. Your production runbooks should not rely on stale assumptions.

Migration risks worth thinking about

ExtractLabel is strong, but this is still LLM-powered extraction. You need grown-up operating discipline.

Model behavior drift

Even with stable schema shape, semantic interpretation can shift over time. A phrase that mapped to defect last month might map to other after a model update.

Mitigation: maintain a regression set and run periodic quality checks. Contract shape is necessary. Accuracy monitoring is mandatory.

Cost surprises at volume

Row-wise AI extraction scales linearly with data volume. Teams underestimate this, then panic when ingestion spikes.

Mitigation: test on representative daily volume, not a toy sample. Budget for peak days, not median days.

Schema evolution pain

You will add fields. You will split categories. You will regret one enum name. That is normal.

Mitigation: include schema version metadata in outputs and plan how downstream consumers handle mixed historical versions.
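Stamping the version can be as simple as an extra column next to the extracted struct. A pandas sketch, with illustrative names:

```python
# Carrying schema version metadata alongside extracted rows.
# Column and version names are illustrative, not a Fabric convention.

import pandas as pd

CLAIM_SCHEMA_VERSION = "claim/v2"

df = pd.DataFrame({"claim": [{"product_name": "smart thermostat"}]})
df["schema_version"] = CLAIM_SCHEMA_VERSION

# Downstream consumers can branch on this column when replaying history.
print(df["schema_version"].iloc[0])  # → claim/v2
```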

False confidence from “valid JSON”

Teams see valid typed output and stop questioning semantics. That is how bad extractions get into trusted dashboards.

Mitigation: sample manually, review periodically, and keep humans in the QA loop for high-impact fields.

When to use ExtractLabel vs. other approaches

Use ExtractLabel when all of these are true:

  • Input is unstructured text
  • Output must be typed and schema-conforming
  • You need extraction embedded in Fabric data workflows

Keep regex when the task is deterministic and mechanical (IDs, fixed-format dates, known token patterns).

Keep specialized NER pipelines when domain vocabulary is unusual, latency requirements are strict, or inference cost constraints are severe.

Use document-native extraction tools when layout matters (forms, scans, tables in images/PDFs). Text-column extraction will not recover geometry it never saw.

If your instinct is “we can just prompt harder,” stop. That is how you build a fragile system that passes demos and fails operations.

The bottom line

ExtractLabel moves Fabric extraction from improvisation to contracts.

The shiny part is one line of code:

df[["claim"]] = df["text"].ai.extract(claim_schema)

The valuable part is everything you encode in the schema: allowed values, nullability, nested structure, and descriptive guidance for edge cases.

Do that work once, and your downstream pipeline stops behaving like a cleanup crew.

Less duct tape, more reliable data.


This post was written with help from anthropic/claude-opus-4-6

Fabric Spark billing just got clearer. Here’s how to make the most of it.

Somewhere in a shared Teams channel, a Fabric capacity admin is looking at the Capacity Metrics app and noticing Spark consumption is down 15% overnight. Same notebooks. Same schedules. Same engineers shipping code with the same amount of caffeine.

A quick thread later, the answer is clear: nothing is wrong. Microsoft introduced new billing operations, and AI usage is now visible in its own category.

That’s not a cost increase. That’s better instrumentation.

What actually changed

On February 13, 2026, Microsoft announced two new billing operations for Fabric: AI Functions and AI Services.

Previously, AI-related usage in notebooks was grouped under Spark operations. Calls made through fabric.functions, Azure OpenAI REST API, the Python SDK, and SynapseML were all reported in Spark. Text Analytics and Azure AI Translator calls from notebooks were also reflected there.

Now those costs are separated:

  • AI Functions covers Fabric AI function calls and Azure OpenAI Service usage in notebooks and Dataflows Gen2.
  • AI Services covers Text Analytics and Azure AI Translator usage from notebooks.

Both are billed under the Copilot and AI Capacity Usage CU meter.

Important: consumption rates did not change. You pay the same for the same work. What changed is visibility.

Why this reporting update is a win for operators

If you’ve ever tried to explain Spark trends that include hidden AI consumption, this update helps immediately.

Picture an F64 capacity. You historically allocated 70% of CU budget to Spark because that’s what Capacity Metrics showed. But Spark previously included AI consumption, so the category was doing two jobs at once.

Now Spark and AI can each tell their own story. That’s useful for:

  • more accurate workload attribution
  • cleaner alerting by operation type
  • better planning conversations with finance and platform teams

In other words: same total spend, sharper signal.

The migration checklist

There’s nothing to deploy and no code changes required. The opportunity is operational: update your monitoring and planning so you can benefit from the new detail right away.

1. Audit your AI function usage

Before the new operations appear in your Metrics app, find AI calls in your codebase. Search notebooks for:

  • fabric.functions calls
  • Azure OpenAI REST API calls (look for /openai/deployments/)
  • openai Python SDK usage within Fabric notebooks
  • SynapseML OpenAI transformers
  • Text Analytics API calls
  • Azure AI Translator calls

If there are no hits, this billing split likely won’t affect your current workloads. If there are many hits (common in mature notebook estates), estimate volume now so your post-change analysis is faster.

2. Baseline your current Spark consumption

Export the last 30 days of Capacity Metrics data for Spark operations and save it.

This is your before-state. After rollout, validate that total consumption (Spark + new AI operations) aligns with historical Spark totals. If it aligns, you’ve confirmed a reporting change. If not, you have a clear starting point for investigation.

3. Adjust your alerting thresholds

If you monitor Spark CU consumption via Capacity Metrics, Azure Monitor, or custom API polling, update thresholds after the split.

Recommended approach:

  • take your current Spark threshold
  • subtract estimated AI consumption from step 1
  • set that as the revised Spark threshold
  • add a separate alert for the Copilot and AI meter

If AI estimates are still rough, start with a conservative threshold and tune after a few weeks of separated data.

4. Update your capacity planning models

Add a dedicated row for AI consumption in any spreadsheet, Power BI report, or planning document that allocates CU budget by operation type.

The Copilot and AI Capacity Usage CU meter already existed for Copilot scenarios, but this may be the first time many Spark-first teams see meaningful workload usage there. Adding it now makes future reviews easier.

5. Set up a validation window

Choose a date after March 17 (when the new operations start appearing) and compare pre/post totals:

  • pre-change: Spark total
  • post-change: Spark + AI Functions + AI Services

Expect close alignment (allowing for normal workload variation and rounding). If variance is more than a few percent, open a support ticket. Microsoft described this as a reporting-only change with no rate modifications.
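The comparison itself is one line of arithmetic. The CU figures below are invented for illustration:

```python
# Sketch of the step-5 check: pre-change Spark total versus post-change
# Spark + AI Functions + AI Services. All numbers are illustrative.

pre_change_spark_cu = 10_000.0
post_change = {"spark": 8_500.0, "ai_functions": 1_300.0, "ai_services": 150.0}

post_total = sum(post_change.values())
variance = abs(post_total - pre_change_spark_cu) / pre_change_spark_cu

print(f"{variance:.1%}")  # → 0.5%, comfortably within normal workload variation
```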

6. Share a quick team note before questions start

One short update prevents a lot of confusion:

“Microsoft is separating AI consumption from Spark billing into dedicated operations. Total cost is unchanged. Spark will appear lower, and Copilot and AI will appear higher. This improves visibility and tracking.”

That gives engineers context and helps finance teams interpret new categories correctly on day one.

Post-rollout checks that keep things clean

Consumption variance check. If post-change totals (Spark + AI Functions + AI Services) differ significantly from pre-change Spark trends, compare equivalent workload windows and rule out schedule, code, or capacity changes.

Expected operation visibility. If you confirmed AI usage in step 1 but AI Functions shows zero, check regional rollout timing from the Fabric blog before escalating.

Why separated AI spend is valuable

This platform-side categorization update gives teams a better lens on where capacity is being used.

Once AI usage is measurable independently, you can answer higher-quality questions:

  • Which AI workflows are creating the most value per CU?
  • Which calls are production-critical versus experimental leftovers?
  • Where should you optimize first for performance and cost?

That is exactly the kind of visibility mature platform teams want.

What this signals about Fabric billing

As Fabric workloads evolve, billing categories will continue to become more descriptive. That’s a good thing. Better category design means better operational decisions.

The admin in that Teams thread got clarity quickly: Spark wasn’t shrinking, observability was improving. Once the team updated dashboards and alerts, they had a more useful capacity model than they had the week before.

That’s the real upgrade here.


This post was written with help from anthropic/claude-opus-4-6