Semantic Layer: The Missing Layer Between AI Agents and Enterprise Data

 


The Hallucination Problem in Data Queries

AI agents querying enterprise data warehouses face a fundamental challenge: context collapse. Without business semantics, technically correct queries produce business-incorrect results.

Common Failures:

  • Metric Ambiguity: "Revenue" could mean gross, net, recognized, or billed across different tables
  • Calculation Errors: Agents average ratio columns instead of recalculating from components (e.g., averaging efficiency percentages instead of SUM(output)/SUM(input))
  • Schema Misinterpretation: Joining dimension tables directly to one another, instead of through the fact table, causes query timeouts and incorrect aggregations
  • Missing Thresholds: Without target values, agents cannot tell whether 85% performance is good or concerning
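
The calculation error in particular is easy to reproduce with two rows. Here is a minimal Python sketch; the site names and hour values are made up for illustration and do not come from any real dataset:

```python
# Illustrative rows: (site, output_hours, input_hours). All values are made up.
rows = [
    ("Site_A", 140.0, 100.0),  # row-level efficiency 1.40
    ("Site_B", 10.0, 20.0),    # row-level efficiency 0.50
]

# Wrong: average the per-row ratios, ignoring how much input each row represents.
average_of_ratios = sum(out / inp for _, out, inp in rows) / len(rows)

# Right: recalculate from components, i.e. SUM(output) / SUM(input).
ratio_of_sums = sum(out for _, out, _ in rows) / sum(inp for _, _, inp in rows)

print(f"average of ratios: {average_of_ratios:.2f}")  # 0.95
print(f"ratio of sums:     {ratio_of_sums:.2f}")      # 1.25
```

The small site drags the simple average down to 0.95 even though the combined operation produced 150 output hours from 120 input hours.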

Impact: 40-60% of agent-generated queries return plausible but business-invalid answers.

Semantic Metadata as Ground Truth

Semantic model descriptions embed business logic directly in the data layer, transforming passive schema into active intelligence:

Column-Level Context

efficiency_rate: "Output-to-input ratio. Formula: total_output_hours / total_input_hours.
Target: 1.40 (140%). DO NOT average when aggregating—recalculate as SUM(output)/SUM(input).
Values above target indicate exceeding expectations."
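
Once a target is written into the description, an agent (or a plain script) can use it to interpret a value. The sketch below assumes the description keeps the "Target: <number>" convention shown above. The helper names `extract_target` and `interpret` are hypothetical, not part of any Fabric or Power BI API:

```python
import re
from typing import Optional

# Description text adapted from the efficiency_rate example above.
DESCRIPTION = (
    "Output-to-input ratio. Formula: total_output_hours / total_input_hours. "
    "Target: 1.40 (140%). DO NOT average when aggregating; "
    "recalculate as SUM(output)/SUM(input). "
    "Values above target indicate exceeding expectations."
)

def extract_target(description: str) -> Optional[float]:
    """Return the number that follows 'Target:' in a description, or None."""
    match = re.search(r"Target:\s*([0-9]+(?:\.[0-9]+)?)", description)
    return float(match.group(1)) if match else None

def interpret(value: float, description: str) -> str:
    """Compare a metric value with the target stated in its description."""
    target = extract_target(description)
    if target is None:
        return f"{value:.2f} (no target documented)"
    status = "at or above" if value >= target else "below"
    return f"{value:.2f}, {status} target of {target:.2f}"

print(interpret(1.35, DESCRIPTION))  # 1.35, below target of 1.40
```

Parsing free text with a regular expression is fragile. It only works here because the description follows a fixed convention, which is one argument for keeping description wording consistent across columns.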

Table-Level Guidance

daily_operations_fact: "Primary operations table. Grain: one record per employee per day.
Use for: operational efficiency, labor analysis, daily metrics.
Do not use for: customer revenue (use customer_transaction_fact instead)."

Relationship Patterns

transaction_date: "Join to date_dimension on transaction_date = date_key.
For monthly aggregations, use date_dimension.month_name, not direct date grouping."
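
The join pattern in that description can be tried locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse. It assumes transaction_date lives in customer_transaction_fact, and the amount column and all row values are hypothetical:

```python
import sqlite3

# In-memory database standing in for the warehouse; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dimension (date_key TEXT PRIMARY KEY, month_name TEXT);
    CREATE TABLE customer_transaction_fact (transaction_date TEXT, amount REAL);
    INSERT INTO date_dimension VALUES
        ('2024-01-15', 'January'),
        ('2024-01-20', 'January'),
        ('2024-02-03', 'February');
    INSERT INTO customer_transaction_fact VALUES
        ('2024-01-15', 100.0),
        ('2024-01-20', 50.0),
        ('2024-02-03', 70.0);
""")

# Join on transaction_date = date_key, then group by the dimension's month_name
# rather than grouping on the raw date.
rows = conn.execute("""
    SELECT d.month_name, SUM(t.amount) AS total_amount
    FROM customer_transaction_fact AS t
    JOIN date_dimension AS d ON t.transaction_date = d.date_key
    GROUP BY d.month_name
    ORDER BY MIN(d.date_key)
""").fetchall()

print(rows)  # [('January', 150.0), ('February', 70.0)]
```

With data spanning several years, grouping on month_name alone would merge the same month across years, so a year attribute would be needed as well. The description above does not mention one, so it is left out here.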

Architecture: Fabric Intelligence + AI Foundry Orchestration

Data Layer (Microsoft Fabric)

Semantic Model: Power BI models with enriched column/table descriptions containing:

  • Business definitions and formulas
  • Valid value ranges and examples
  • Calculation rules and aggregation logic
  • Join patterns and table relationships

Lakehouse Foundation: Delta tables with metadata governance enabling:

  • Single source of truth for metrics
  • Version-controlled business logic
  • Centralized definition management

Agent Layer (Azure AI Foundry)

Discovery Phase: Agent queries semantic model metadata before data:

1. Read table descriptions → identify correct fact table
2. Read column descriptions → understand calculation rules
3. Construct DAX with embedded business logic
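
The three steps above can be sketched in Python. This is an illustration under stated assumptions, not the AI Foundry implementation: the `TABLES` dictionary is a hypothetical in-memory stand-in for semantic model metadata, the structured `aggregation` rule is an assumed encoding of the free-text description, and no Fabric or Foundry API is called:

```python
# Hypothetical stand-in for semantic model metadata. A real agent would read
# this from the semantic model; the structure below is an assumption.
TABLES = {
    "daily_operations_fact": {
        "use_for": ["operational efficiency", "labor analysis", "daily metrics"],
        "columns": {
            "efficiency_rate": {
                "aggregation": "ratio_of_sums",
                "numerator": "total_output_hours",
                "denominator": "total_input_hours",
            }
        },
    },
    "customer_transaction_fact": {
        "use_for": ["customer revenue"],
        "columns": {},
    },
}

def find_fact_table(topic: str) -> str:
    """Step 1: pick the table whose description lists the topic."""
    for name, meta in TABLES.items():
        if topic in meta["use_for"]:
            return name
    raise LookupError(f"no table documented for topic {topic!r}")

def build_dax(topic: str, metric: str) -> str:
    """Steps 2 and 3: read the column rule, then emit a DAX expression."""
    table = find_fact_table(topic)
    rule = TABLES[table]["columns"][metric]
    if rule["aggregation"] != "ratio_of_sums":
        raise ValueError(f"unsupported aggregation rule for {metric!r}")
    return (
        f"DIVIDE(SUM({table}[{rule['numerator']}]), "
        f"SUM({table}[{rule['denominator']}]))"
    )

print(build_dax("operational efficiency", "efficiency_rate"))
```

The agent never hardcodes the formula. It derives the expression from whatever rule the metadata carries, so a change to the description changes the generated query.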

Execution Phase: Agent generates semantically correct queries:

CALCULATE(
    -- Recalculate the ratio from component sums; do not average efficiency_rate
    DIVIDE(
        SUM(daily_operations_fact[total_output_hours]),
        SUM(daily_operations_fact[total_input_hours])
    ),
    daily_operations_fact[location] = "Site_A",
    date_dimension[month] = "Current"
)

Response Phase: Grounded output with lineage:

Efficiency at Site A: 1.35, below target of 1.40.
Source: daily_operations_fact.efficiency_rate
Formula: SUM(total_output_hours) / SUM(total_input_hours)

Technical Benefits

Prompt Optimization:

  • Removed 3,000+ characters of schema documentation from agent prompts
  • Semantic model metadata accessed at query runtime, not prompt time
  • Context window freed for additional business logic

Query Accuracy:

  • Before: Agent averaged pre-calculated ratios (mathematically incorrect)
  • After: Agent recalculates ratios from component sums (correct aggregation)

Schema Discovery:

  • Dynamic metadata retrieval eliminates hardcoded table definitions
  • Descriptions update independently of agent configuration
  • Self-documenting schema for human developers and AI agents

Measured Outcomes

Quantitative Gains

  • Query Accuracy: 58% → 94% on business-critical metrics
  • Hallucination Reduction: 65% decrease in incorrect metric interpretations
  • Join Correctness: 92% proper table relationships (vs. 45% baseline)

Qualitative Improvements

Governance: Single source of truth for metric definitions across all consumers (humans, BI tools, AI agents)

Transparency: Every AI response cites source tables and calculation formulas

Maintainability: Business logic lives in data layer, not scattered across agent prompts

Implementation Pattern

Phase 1: Core Metadata (5 days)

  • Document 20% of tables (fact tables, critical dimensions)
  • Include formulas, targets, and calculation rules
  • Add common mistakes and DO NOT warnings

Phase 2: Comprehensive Coverage (10 days)

  • Expand to all production tables
  • Add value examples and query patterns
  • Document join relationships and grain definitions
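
Progress through Phases 1 and 2 can be tracked as description coverage. Here is a minimal sketch, assuming the schema has been exported to a plain dictionary. The export format, the helper names, and the sample descriptions are all hypothetical:

```python
from typing import Dict, List

# Hypothetical schema snapshot: table -> {column: description, "" if missing}.
schema: Dict[str, Dict[str, str]] = {
    "daily_operations_fact": {
        "efficiency_rate": "Output-to-input ratio. Target: 1.40.",
        "location": "",
    },
    "date_dimension": {
        "date_key": "Calendar date key.",
        "month_name": "",
    },
}

def coverage(schema: Dict[str, Dict[str, str]]) -> float:
    """Fraction of columns that have a non-empty description."""
    total = sum(len(columns) for columns in schema.values())
    documented = sum(
        1
        for columns in schema.values()
        for description in columns.values()
        if description.strip()
    )
    return documented / total if total else 0.0

def undocumented(schema: Dict[str, Dict[str, str]]) -> List[str]:
    """Sorted 'table.column' names that still lack a description."""
    return sorted(
        f"{table}.{column}"
        for table, columns in schema.items()
        for column, description in columns.items()
        if not description.strip()
    )

print(f"coverage: {coverage(schema):.0%}")  # coverage: 50%
print(undocumented(schema))
```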

Phase 3: Agent Integration (3 days)

  • Configure AI Foundry Data Agent to query semantic model metadata
  • Remove schema from agent system prompts
  • Validate against 50+ baseline business queries
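
The baseline validation in Phase 3 can be as simple as replaying known questions and comparing answers within a tolerance. In the sketch below, the questions, the expected values, and the `naive_agent` stand-in are hypothetical; a real run would call the deployed agent instead:

```python
import math
from typing import Callable, Dict

# Hypothetical baseline: business question -> expected numeric answer.
BASELINE: Dict[str, float] = {
    "Efficiency across Site_A and Site_B": 1.25,
    "Total output hours across Site_A and Site_B": 150.0,
}

def accuracy(
    agent: Callable[[str], float],
    baseline: Dict[str, float],
    rel_tol: float = 0.01,
) -> float:
    """Share of baseline questions answered within a relative tolerance."""
    if not baseline:
        raise ValueError("baseline must contain at least one question")
    correct = sum(
        1
        for question, expected in baseline.items()
        if math.isclose(agent(question), expected, rel_tol=rel_tol)
    )
    return correct / len(baseline)

def naive_agent(question: str) -> float:
    """Stand-in agent that averages ratios, so it misses the efficiency question."""
    canned = {
        "Efficiency across Site_A and Site_B": 0.95,
        "Total output hours across Site_A and Site_B": 150.0,
    }
    return canned[question]

print(f"accuracy: {accuracy(naive_agent, BASELINE):.0%}")  # accuracy: 50%
```

Running the same baseline before and after removing schema text from the system prompt gives a like-for-like comparison.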

Phase 4: Monitoring (Ongoing)

  • Track query accuracy metrics
  • Put description updates under version control
  • Establish metadata governance workflow
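
One lightweight way to review description changes is to diff two exported snapshots. The sketch below assumes descriptions are exported as a flat "table.column" to text mapping. The snapshot contents, including the changed target value, are made up for illustration:

```python
from typing import Dict, List

def diff_descriptions(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, List[str]]:
    """Classify description keys as added, removed, or changed between versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

# Hypothetical snapshots of column descriptions at two points in time.
v1 = {
    "daily_operations_fact.efficiency_rate": "Output-to-input ratio. Target: 1.40.",
    "daily_operations_fact.location": "Site identifier.",
}
v2 = {
    "daily_operations_fact.efficiency_rate": "Output-to-input ratio. Target: 1.45.",
    "date_dimension.month_name": "Calendar month name.",
}

print(diff_descriptions(v1, v2))
```

A changed target is a business decision rather than a documentation tweak, so entries under "changed" are the ones a governance workflow would most likely route for approval.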

The Strategic Shift

Traditional approach: Fight hallucinations with larger context windows and more powerful models

Semantic metadata approach: Embed business logic where AI agents query—the data layer itself

Result: Data infrastructure becomes an active intelligence layer, not passive storage. AI agents operate with the same business context as domain experts.

ROI Summary

  • 50-70% relative improvement in business query accuracy (for example, 58% → 94%)
  • 40% reduction in agent prompt complexity
  • 65% decrease in metric hallucinations
  • Zero additional compute cost (metadata stored in semantic model, not LLM context)

Stack: Microsoft Fabric (Semantic Models, Lakehouse, Delta), Azure AI Foundry (Data Agents, Orchestration), DAX Engine

Key Insight: The best defense against AI hallucination isn't a better model—it's better metadata architecture.


I specialize in Microsoft Fabric, Azure AI, and building enterprise AI agents that transform how organizations access and act on their data. Currently focused on the intersection of data engineering and generative AI — where data platforms become intelligent platforms.
