Semantic Model Descriptions: The Missing Layer Between AI Agents and Enterprise Data

The Hallucination Problem in Data Queries

AI agents querying enterprise data warehouses face a fundamental challenge: context collapse. Without business semantics, technically correct queries produce business-incorrect results.

Common Failures:

Metric Ambiguity: "Revenue" could mean gross, net, recognized, or billed across different tables
Calculation Errors: Agents average ratio columns instead of recalculating from components (e.g., averaging efficiency percentages instead of SUM(output)/SUM(input))
Schema Misinterpretation: Joining dimension tables directly to each other, rather than through a fact table, causes query timeouts and incorrect aggregations
Missing Thresholds: Without target values, agents cannot tell whether 85% performance is good or concerning
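The calculation error above is easy to reproduce with two rows. The following is a minimal sketch with made-up numbers (the rows and function names are illustrative, not from any real dataset), showing how averaging ratios diverges from recalculating from components:

```python
# Hypothetical rows: (output_hours, input_hours)
rows = [
    (140.0, 100.0),  # row ratio 1.40, large volume
    (10.0, 20.0),    # row ratio 0.50, small volume
]

def average_of_ratios(rows):
    """Incorrect for aggregation: every row gets equal weight."""
    ratios = [out / inp for out, inp in rows]
    return sum(ratios) / len(ratios)

def ratio_of_sums(rows):
    """Correct: recompute from components, SUM(output) / SUM(input)."""
    total_out = sum(out for out, _ in rows)
    total_in = sum(inp for _, inp in rows)
    return total_out / total_in

naive = average_of_ratios(rows)   # (1.40 + 0.50) / 2 = 0.95
correct = ratio_of_sums(rows)     # 150 / 120 = 1.25
```

The naive average gives the small row the same weight as the large one, so the two results differ by 0.30 on identical data.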

Impact: 40-60% of agent-generated queries return plausible but business-invalid answers.

Semantic Metadata as Ground Truth

Semantic model descriptions embed business logic directly in the data layer, transforming passive schema into active intelligence:

Column-Level Context

efficiency_rate: "Output-to-input ratio. Formula: total_output_hours / total_input_hours.
Target: 1.40 (140%). DO NOT average when aggregating—recalculate as SUM(output)/SUM(input).
Values above target indicate exceeding expectations."
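One way an agent could use such a description is to pull the target out of the text and compare a computed value against it. This is a rough sketch only: it assumes the "Target: <number>" convention shown above is applied consistently, and the helper names are hypothetical.

```python
import re

# Description text abbreviated from the example above.
DESCRIPTION = (
    "Output-to-input ratio. Formula: total_output_hours / total_input_hours. "
    "Target: 1.40 (140%). Values above target indicate exceeding expectations."
)

def extract_target(description):
    """Return the number following 'Target:', or None if no target is stated."""
    match = re.search(r"Target:\s*([0-9]+(?:\.[0-9]+)?)", description)
    return float(match.group(1)) if match else None

def interpret(value, description):
    """Classify a value relative to the target stated in the description."""
    target = extract_target(description)
    if target is None:
        return "no target defined"
    return "at or above target" if value >= target else "below target"

status = interpret(1.35, DESCRIPTION)
```

Without the target in the description, the same function can only report that no target is defined, which is the "Missing Thresholds" failure in miniature.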

Table-Level Guidance

daily_operations_fact: "Primary operations table. Grain: one record per employee per day.
Use for: operational efficiency, labor analysis, daily metrics.
Do not use for: customer revenue (use customer_transaction_fact instead)."
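A declared grain is also something that can be checked. The sketch below assumes hypothetical key columns employee_id and work_date (the description only states "one record per employee per day") and flags any key that appears more than once:

```python
from collections import Counter

def grain_violations(rows, grain_keys):
    """Return the key tuples that occur more than once for the declared grain."""
    counts = Counter(tuple(row[k] for k in grain_keys) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical sample rows; the last one repeats (E1, 2024-01-01).
rows = [
    {"employee_id": "E1", "work_date": "2024-01-01", "input_hours": 8.0},
    {"employee_id": "E1", "work_date": "2024-01-02", "input_hours": 7.5},
    {"employee_id": "E2", "work_date": "2024-01-01", "input_hours": 8.0},
    {"employee_id": "E1", "work_date": "2024-01-01", "input_hours": 8.0},
]

violations = grain_violations(rows, ("employee_id", "work_date"))
```

An empty result means the data matches the grain the description promises; anything else would double-count in SUM-based aggregations.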

Relationship Patterns

transaction_date: "Join to date_dimension on transaction_date = date_key.
For monthly aggregations, use date_dimension.month_name, not direct date grouping."
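The join pattern can be illustrated outside the warehouse as well. In this sketch the date_dimension contents and the amount column are invented for illustration; only transaction_date, date_key, and month_name come from the description above.

```python
from collections import defaultdict

# Hypothetical date_dimension rows, keyed by date_key.
date_dimension = {
    "2024-01-30": {"month_name": "January"},
    "2024-01-31": {"month_name": "January"},
    "2024-02-01": {"month_name": "February"},
}

# Hypothetical fact rows; "amount" is an assumed measure column.
fact_rows = [
    {"transaction_date": "2024-01-30", "amount": 100.0},
    {"transaction_date": "2024-01-31", "amount": 50.0},
    {"transaction_date": "2024-02-01", "amount": 25.0},
]

def monthly_totals(fact_rows, date_dimension):
    """Join on transaction_date = date_key, then group by month_name."""
    totals = defaultdict(float)
    for row in fact_rows:
        month = date_dimension[row["transaction_date"]]["month_name"]
        totals[month] += row["amount"]
    return dict(totals)

totals = monthly_totals(fact_rows, date_dimension)
```

Note that grouping on month_name alone merges the same month from different years, so a real model would likely pair it with a year attribute from the same dimension.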

Architecture: Fabric Intelligence + AI Foundry Orchestration

Data Layer (Microsoft Fabric)

Semantic Model: Power BI models with enriched column/table descriptions containing:

Business definitions and formulas
Valid value ranges and examples
Calculation rules and aggregation logic
Join patterns and table relationships

Lakehouse Foundation: Delta tables with metadata governance enabling:

Single source of truth for metrics
Version-controlled business logic
Centralized definition management

Agent Layer (Azure AI Foundry)

Discovery Phase: Agent queries semantic model metadata before data:

1. Read table descriptions → identify correct fact table
2. Read column descriptions → understand calculation rules
3. Construct DAX with embedded business logic
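The three discovery steps can be sketched roughly as follows. Everything here is an assumption for illustration: the metadata is held in plain dictionaries rather than fetched from a semantic model, the customer_transaction_fact description text is invented, the helper names are hypothetical, and the AVERAGE fallback is a placeholder rather than a recommended default.

```python
import re

# Hypothetical in-memory copy of the descriptions; a real agent would read
# these from the semantic model at query time.
TABLE_DESCRIPTIONS = {
    "daily_operations_fact": (
        "Primary operations table. Grain: one record per employee per day. "
        "Use for: operational efficiency, labor analysis, daily metrics. "
        "Do not use for: customer revenue (use customer_transaction_fact instead)."
    ),
    "customer_transaction_fact": "Use for: customer revenue.",  # invented text
}
COLUMN_DESCRIPTIONS = {
    ("daily_operations_fact", "efficiency_rate"): (
        "Output-to-input ratio. Formula: total_output_hours / total_input_hours. "
        "Target: 1.40 (140%). DO NOT average when aggregating."
    ),
}

def use_for_clause(description):
    """Text after 'Use for:' and before 'Do not use for:', lowercased."""
    _, _, rest = description.partition("Use for:")
    allowed, _, _ = rest.partition("Do not use for:")
    return allowed.lower()

def pick_table(topic):
    """Step 1: first table whose 'Use for' clause mentions the topic."""
    for table, description in TABLE_DESCRIPTIONS.items():
        if topic.lower() in use_for_clause(description):
            return table
    return None

def build_expression(table, column):
    """Steps 2 and 3: read the column rule, then emit a DAX expression string."""
    description = COLUMN_DESCRIPTIONS[(table, column)]
    formula = re.search(r"Formula:\s*(\w+)\s*/\s*(\w+)", description)
    if "DO NOT average" in description and formula:
        numerator, denominator = formula.groups()
        return f"DIVIDE(SUM({table}[{numerator}]), SUM({table}[{denominator}]))"
    return f"AVERAGE({table}[{column}])"  # placeholder fallback

table = pick_table("operational efficiency")
expression = build_expression(table, "efficiency_rate")
```

In practice an LLM would interpret the descriptions rather than match strings, but the string version makes the dependency explicit: the "Use for" and "DO NOT average" text is what steers the query.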

Execution Phase: Agent generates semantically correct queries:

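-- Recalculate the ratio from component sums instead of averaging efficiency_rate
EVALUATE
ROW(
    "efficiency",
    CALCULATE(
        DIVIDE(
            SUM(operations_fact[output_hours]),
            SUM(operations_fact[input_hours])
        ),
        operations_fact[location] = "Site_A",
        date_dimension[month] = "Current"  -- placeholder value for the current month
    )
)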

Response Phase: Grounded output with lineage:

Efficiency at Site A: 1.35, below target of 1.40.
Source: operations_fact (output_hours, input_hours)
Formula: SUM(output_hours) / SUM(input_hours)
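A response in this shape could be assembled by a small formatter. The function below is a hypothetical sketch, not part of any agent framework; it simply attaches the target comparison and lineage to the computed value.

```python
def grounded_response(label, value, target, source, formula):
    """Format a result with its target comparison and lineage lines."""
    status = "below" if value < target else "at or above"
    return "\n".join([
        f"{label}: {value:.2f}, {status} target of {target:.2f}.",
        f"Source: {source}",
        f"Formula: {formula}",
    ])

text = grounded_response(
    label="Efficiency at Site A",
    value=1.35,
    target=1.40,
    source="operations_fact",
    formula="SUM(output_hours) / SUM(input_hours)",
)
```

Keeping the source and formula as required arguments means a response cannot be produced without lineage.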

Technical Benefits

Prompt Optimization:

Removed 3,000+ characters of schema documentation from agent prompts
Semantic model metadata accessed at query runtime, not prompt time
Context window freed for additional business logic

Query Accuracy:

Before: Agent averaged pre-calculated ratios (mathematically incorrect)
After: Agent recalculates ratios from component sums (correct aggregation)

Schema Discovery:

Dynamic metadata retrieval eliminates hardcoded table definitions
Descriptions update independently of agent configuration
Self-documenting schema for human developers and AI agents

Measured Outcomes

Quantitative Gains

Query Accuracy: 58% → 94% on business-critical metrics
Hallucination Reduction: 65% decrease in incorrect metric interpretations
Join Correctness: 92% proper table relationships (vs. 45% baseline)

Qualitative Improvements

Governance: Single source of truth for metric definitions across all consumers (humans, BI tools, AI agents)

Transparency: Every AI response cites source tables and calculation formulas

Maintainability: Business logic lives in data layer, not scattered across agent prompts

Implementation Pattern

Phase 1: Core Metadata (5 days)

Document 20% of tables (fact tables, critical dimensions)
Include formulas, targets, and calculation rules
Add common mistakes and DO NOT warnings
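A lightweight completeness check can help during this phase. The sketch below assumes descriptions follow the "Formula:", "Target:", and "DO NOT" conventions used in the examples earlier; the marker list and function name are illustrative.

```python
# Assumed markers, based on the description conventions shown earlier.
REQUIRED_MARKERS = ("Formula:", "Target:", "DO NOT")

def missing_markers(description, required=REQUIRED_MARKERS):
    """Return the required markers that do not appear in the description."""
    return [marker for marker in required if marker not in description]

complete = (
    "Output-to-input ratio. Formula: total_output_hours / total_input_hours. "
    "Target: 1.40 (140%). DO NOT average when aggregating."
)
incomplete = "Output-to-input ratio."

gaps_complete = missing_markers(complete)
gaps_incomplete = missing_markers(incomplete)
```

Running this over the core fact tables gives a simple list of descriptions that still need a formula, a target, or a warning.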

Phase 2: Comprehensive Coverage (10 days)

Expand to all production tables
Add value examples and query patterns
Document join relationships and grain definitions

Phase 3: Agent Integration (3 days)

Configure AI Foundry Data Agent to query semantic model metadata
Remove schema from agent system prompts
Validate against 50+ baseline business queries
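Validation against baseline queries can be as simple as comparing agent answers to known-good values. This is a minimal sketch under stated assumptions: the baseline questions, expected values, and stub_agent are all invented, and a real harness would call the deployed agent instead of a lookup table.

```python
import math

def accuracy(cases, run_query, rel_tol=1e-6):
    """Share of baseline cases whose numeric answer matches the expected value."""
    if not cases:
        raise ValueError("no baseline cases supplied")
    passed = sum(
        1
        for question, expected in cases
        if math.isclose(run_query(question), expected, rel_tol=rel_tol)
    )
    return passed / len(cases)

# Hypothetical baseline: (question, expected value).
BASELINE = [
    ("efficiency Site_A", 1.35),
    ("efficiency Site_B", 1.42),
    ("efficiency Site_C", 1.10),
    ("efficiency Site_D", 1.25),
]

# Stub standing in for the agent; Site_C is deliberately wrong.
STUB_ANSWERS = {
    "efficiency Site_A": 1.35,
    "efficiency Site_B": 1.42,
    "efficiency Site_C": 0.95,
    "efficiency Site_D": 1.25,
}

def stub_agent(question):
    return STUB_ANSWERS[question]

score = accuracy(BASELINE, stub_agent)  # 3 of 4 cases match
```

The same function can be rerun after every description change to catch regressions.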

Phase 4: Monitoring (Ongoing)

Track query accuracy metrics
Version control description updates
Establish metadata governance workflow
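For tracking description updates, one option is to compare fingerprints of two metadata snapshots. The sketch below is illustrative only: the snapshot dictionaries and the second column name are hypothetical, and hashing is just one convenient way to record what changed.

```python
import hashlib

def fingerprint(descriptions):
    """Map each object name to a SHA-256 digest of its description text."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in descriptions.items()
    }

def diff_snapshots(old, new):
    """Report which descriptions were added, removed, or modified."""
    old_fp, new_fp = fingerprint(old), fingerprint(new)
    shared = old_fp.keys() & new_fp.keys()
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "modified": sorted(k for k in shared if old_fp[k] != new_fp[k]),
    }

# Hypothetical snapshots of column descriptions.
previous = {"efficiency_rate": "Target: 1.40 (140%)."}
current = {
    "efficiency_rate": "Target: 1.45 (145%).",
    "input_hours": "Hours worked.",
}

changes = diff_snapshots(previous, current)
```

A modified target like the one above is exactly the kind of change that should trigger a rerun of the baseline queries.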

The Strategic Shift

Traditional approach: Fight hallucinations with larger context windows and more powerful models

Semantic metadata approach: Embed business logic where AI agents query—the data layer itself

Result: Data infrastructure becomes an active intelligence layer, not passive storage. AI agents operate with the same business context as domain experts.

ROI Summary

50-70% improvement in business query accuracy
40% reduction in agent prompt complexity
65% decrease in metric hallucinations
No additional infrastructure cost (metadata is stored in the semantic model and retrieved at query time, not carried in every prompt)

Stack: Microsoft Fabric (Semantic Models, Lakehouse, Delta), Azure AI Foundry (Data Agents, Orchestration), DAX Engine

Key Insight: The best defense against AI hallucination isn't a better model—it's better metadata architecture.

I specialize in Microsoft Fabric, Azure AI, and building enterprise AI agents that transform how organizations access and act on their data. Currently focused on the intersection of data engineering and generative AI — where data platforms become intelligent platforms.

Connect with me on LinkedIn.
