How Startups Can Build Semantic and Metadata Layers for Scalable AI Systems


Maria Filippova

Head of Community at The Top Voices

May 18, 2026 | 1 min


LLMs are powerful, but they struggle to work with real-world data. Tables, schemas, and column names do not carry business meaning, making it difficult for models to generate correct queries or reliable insights. Without context, even advanced models produce inconsistent results and cannot be trusted for decision-making.

This webinar explores why this gap exists and how a semantic metadata layer helps bridge it — enabling AI to understand data, generate accurate queries, and support real self-service analytics.

Speaker

Maxim Zolotarev is Head of Data Platform and Machine Learning at Tabby, where he leads the development of the company’s data infrastructure and analytics platform.

Why LLMs Fail on Raw Data

LLMs struggle not because of model limitations, but because they lack context. Raw tables, column names, and schemas do not carry business meaning, making it difficult for models to interpret data correctly.  

Without clear definitions, lineage, and quality signals, AI systems hallucinate, produce incorrect queries, and cannot determine which data sources to trust.

The Missing Layer: Semantic Metadata

The fix is a semantic metadata layer: a structured representation of business and technical context that makes data understandable to AI.

This layer includes:

  • descriptions of tables and columns
  • metric definitions
  • data lineage
  • ownership and data quality signals

With this context, AI systems can move from guessing to reasoning.
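As a rough illustration of what such a layer might hold, the sketch below models one table's metadata as plain Python dataclasses and renders it as text an LLM could read. All names here (`TableMeta`, `ColumnMeta`, `fct_orders`, the owner and metric text) are hypothetical and not taken from the webinar.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    name: str
    description: str


@dataclass
class TableMeta:
    """Hypothetical container for the four kinds of context listed above."""
    name: str
    description: str
    owner: str
    upstream: list = field(default_factory=list)   # lineage: source tables
    columns: list = field(default_factory=list)    # ColumnMeta entries
    metrics: dict = field(default_factory=dict)    # metric name -> definition
    quality_checks_passing: bool = True            # simple quality signal

    def to_context(self) -> str:
        """Render the metadata as plain text suitable for a prompt."""
        lines = [
            f"Table {self.name}: {self.description}",
            f"Owner: {self.owner}",
            f"Built from: {', '.join(self.upstream) or 'n/a'}",
            f"Quality checks passing: {self.quality_checks_passing}",
        ]
        lines += [f"Column {c.name}: {c.description}" for c in self.columns]
        lines += [f"Metric {m}: {d}" for m, d in self.metrics.items()]
        return "\n".join(lines)


# Illustrative values only.
orders = TableMeta(
    name="fct_orders",
    description="One row per completed order.",
    owner="analytics-team",
    upstream=["stg_orders", "stg_payments"],
    columns=[ColumnMeta("amt", "Order amount in USD, tax included.")],
    metrics={"revenue": "SUM(amt) over completed orders"},
)
context = orders.to_context()
print(context)
```

The point of the structure is that a column like `amt` arrives with its meaning attached, so the model does not have to guess it from the name.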

Building the Foundation with dbt and OpenMetadata

A practical implementation combines two components. dbt acts as the transformation and definition layer, where models, metrics, and tests are defined as code. OpenMetadata provides the context layer, enriching data with ownership, lineage, glossary terms, and quality metadata.  

Together, they create a single source of truth that can be consumed not only by humans, but also by AI systems.
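One way to picture the hand-off between the two tools: dbt writes model and column descriptions into its `manifest.json` artifact, and a small script can read them and flag what is still undocumented. The sketch below works on an inline dict shaped like the `nodes` section of that artifact as I understand it; the project and model names are made up, and loading the result into OpenMetadata is deliberately left out.

```python
def describe_models(manifest: dict) -> dict:
    """Collect model and column descriptions from a manifest-shaped dict."""
    models = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        models[node["name"]] = {
            "description": node.get("description", ""),
            "columns": {
                col_name: col.get("description", "")
                for col_name, col in node.get("columns", {}).items()
            },
        }
    return models


def undocumented(models: dict) -> list:
    """List models and columns whose description is empty."""
    gaps = []
    for model, meta in models.items():
        if not meta["description"]:
            gaps.append(model)
        gaps += [f"{model}.{col}" for col, desc in meta["columns"].items() if not desc]
    return gaps


# Hypothetical sample; in practice this would come from json.load() on
# the manifest file dbt writes to its target directory.
sample_manifest = {
    "nodes": {
        "model.shop.fct_orders": {
            "resource_type": "model",
            "name": "fct_orders",
            "description": "One row per completed order.",
            "columns": {
                "order_id": {"name": "order_id", "description": "Primary key."},
                "amt": {"name": "amt", "description": ""},
            },
        },
        "test.shop.not_null_fct_orders_order_id": {
            "resource_type": "test",
            "name": "not_null_fct_orders_order_id",
        },
    }
}

models = describe_models(sample_manifest)
gaps = undocumented(models)
print(gaps)
```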

How LLMs Integrate with Data

Three main integration patterns were highlighted:

  • RAG retrieves relevant metadata and context before generating queries or insights.
  • AI agents break down complex questions into steps and interact with multiple data sources.
  • Claude Code helps analysts generate models, tests, and documentation based on domain knowledge.

Each approach supports different use cases, from simple queries to complex analytical workflows.
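To make the RAG pattern concrete, here is a minimal sketch of the retrieval step: pick the table descriptions most relevant to a question and place them in the prompt. Simple keyword overlap stands in for the embedding similarity a real system would likely use, no LLM is called, and the table names and descriptions are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word set used for the overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: dict, k: int = 2) -> list:
    """Return up to k table names whose descriptions share the most words
    with the question (a stand-in for vector similarity search)."""
    q = tokens(question)
    ranked = sorted(docs.items(), key=lambda item: len(q & tokens(item[1])), reverse=True)
    return [name for name, text in ranked[:k] if q & tokens(text)]


def build_prompt(question: str, docs: dict, k: int = 2) -> str:
    """Assemble retrieved metadata and the question into one prompt string."""
    context = "\n".join(f"- {name}: {docs[name]}" for name in retrieve(question, docs, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Write one SQL query using only the tables listed in the context."
    )


# Hypothetical metadata, e.g. exported from a catalog.
docs = {
    "fct_orders": "One row per completed order with order amount and customer id.",
    "dim_customers": "One row per customer with signup date and country.",
    "stg_web_events": "Raw page view events from the website.",
}
question = "What is the total order amount per customer country?"
selected = retrieve(question, docs)
prompt = build_prompt(question, docs)
print(prompt)
```

Only the two relevant tables reach the prompt, which is the property that matters: the model sees curated context instead of the whole warehouse.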

Analysts as AI Builders

A critical idea is that analysts are not replaced by AI — they enable it.

By defining metrics, writing descriptions, creating tests, and documenting business logic, analysts turn their expertise into structured context that AI can use.  

This allows businesses to move from manual analytics to self-service, where users can get answers instantly without waiting for dashboards.

The Metadata → AI Flywheel

The system improves over time through a feedback loop. Analysts enrich metadata, AI consumes it, business users generate queries, and failures reveal gaps in data or definitions.  

Each iteration improves both data quality and AI performance, creating a compounding effect.
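One possible way to close that loop is to log which tables or columns each failed question touched, then rank the undocumented ones by how often they appear. The sketch below assumes a hypothetical log format (a list of dicts with an `assets` key) and a flat description lookup; neither comes from the webinar.

```python
from collections import Counter


def rank_gaps(failed_queries: list, descriptions: dict) -> list:
    """Count how often failed questions touched assets with no description,
    most frequent first."""
    counts = Counter()
    for failure in failed_queries:
        for asset in failure["assets"]:
            if not descriptions.get(asset, "").strip():
                counts[asset] += 1
    return counts.most_common()


# Hypothetical data: asset -> description, and a log of failed questions.
descriptions = {
    "fct_orders.order_id": "Primary key.",
    "fct_orders.amt": "",
    "dim_customers.segment": "",
}
failed_queries = [
    {"question": "Revenue last month?", "assets": ["fct_orders.amt", "fct_orders.order_id"]},
    {"question": "Revenue by segment?", "assets": ["fct_orders.amt", "dim_customers.segment"]},
]

backlog = rank_gaps(failed_queries, descriptions)
print(backlog)
```

The ranked list gives analysts a documentation backlog ordered by how much each gap is hurting answers.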

How to Get Started

A practical roadmap begins with building a basic data foundation in dbt, documenting key tables, and adding simple quality checks.
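For a sense of what "simple quality checks" means, the sketch below reimplements the logic of two common checks, not-null and unique, over in-memory rows. In dbt these would normally be declared as the built-in `not_null` and `unique` generic tests rather than written by hand; the Python version and its sample rows are only an illustration.

```python
def not_null(rows: list, column: str) -> list:
    """Return indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def unique(rows: list, column: str) -> list:
    """Return indexes of rows whose column value was already seen."""
    seen, duplicates = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            duplicates.append(i)
        seen.add(value)
    return duplicates


# Hypothetical sample rows.
rows = [
    {"order_id": 1, "amt": 10.0},
    {"order_id": 2, "amt": None},
    {"order_id": 2, "amt": 5.0},
]

null_amt = not_null(rows, "amt")
dupe_ids = unique(rows, "order_id")
print(null_amt, dupe_ids)
```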

The next step is introducing a metadata catalog, assigning ownership, and defining critical datasets. After that, teams can build a simple RAG-based interface and gradually expand toward more advanced AI use cases.  

Importantly, all of this can be done with open-source tools and at minimal cost.

Conclusion

LLMs are only as good as the context they receive. Without structured metadata, they remain unreliable for real analytical tasks.

By building a semantic layer and treating metadata as a core part of the data platform, teams can unlock AI-powered analytics, reduce dependency on manual reporting, and enable faster, more consistent decision-making across the organization.
