Building with Snowflake Cortex Analyst: What I Learned About Semantic Layers and Guardrails

Published: March 18, 2026, 10:42 AM GMT+9
3 min read
Original: Dev.to


When I started working with Snowflake Cortex Analyst, I assumed the hard part would be getting the system to answer questions correctly.
It wasn’t. The hard part was deciding which questions it shouldn’t answer.

In this post I want to share two things that took more thought than I expected — verified queries and guardrails.

A Quick Overview of Cortex Analyst


Snowflake Cortex Analyst lets users ask questions in plain English and get answers from structured data. Under the hood, it relies on a semantic model defined in YAML to interpret the data and generate the SQL that answers each question.

[Figure: Cortex Analyst architecture]

There are two ways it can respond:

  • Verified queries – pre‑validated pairs of a question and its SQL that you define.
  • LLM‑generated SQL – the model generates SQL on its own when no verified query matches.

The goal of a well‑structured semantic model is to maximize verified query hits. The more questions route through verified queries, the more controlled and reliable your output.

The Verified Queries Trade‑off

My first instinct was to add as many verified query variations as possible — covering every way a user might ask the same question. That backfired.

Situation            | Result
Too few variations   | Model misses valid questions and falls back to LLM generation
Too many variations  | Introduces noise; the wrong query may be matched

The sweet spot is enough variation to capture common phrasings without overwhelming the matcher with redundant patterns.
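
One way to keep the variation count in check is to flag near-duplicate phrasings before they go into the model. The sketch below is a hypothetical helper, not part of Cortex Analyst. It compares question strings with Python's difflib; the 0.85 threshold and the sample questions are assumptions chosen for illustration, and string similarity is only a rough proxy for how the real matcher behaves.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(question: str) -> str:
    """Lowercase and drop punctuation so trivial differences don't count."""
    kept = "".join(ch for ch in question.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def near_duplicates(questions, threshold=0.85):
    """Return (question_a, question_b, similarity) for pairs at or above threshold."""
    flagged = []
    for a, b in combinations(questions, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


# Hypothetical verified-query questions, for illustration only.
questions = [
    "What were total sales last month?",
    "What were the total sales last month?",
    "Which region had the highest revenue in 2024?",
]

for a, b, score in near_duplicates(questions):
    print(f"{score}: {a!r} ~ {b!r}")
```

Pairs that score above the threshold are candidates for merging into a single verified query. Pairs that differ in meaning, not just wording, deserve their own entries.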

The Guardrail Problem — Define What It Shouldn’t Do

This is the part most people skip. In data engineering we always plan for edge cases, and the same mindset applies here. Users will assume the tool behaves like a general-purpose AI assistant and ask it anything. You can’t control what users ask, so the semantic model YAML has to define how out-of-scope questions are handled.

Cortex Analyst provides a question_categorization block where you explicitly define categories of questions the system should refuse. Below is a simplified example:

question_categorization:
  - category: unavailable_topics
    examples:
      - "What is the return rate by supplier?"
      - "Show me customer lifetime value"

  - category: greetings
    examples:
      - "Hey"
      - "Can you help me?"

  - category: forecast_or_prediction
    examples:
      - "What will sales look like next month?"
      - "Predict inventory needs for Q4"

  - category: ambiguous_queries
    examples:
      - "Show me something interesting"
      - "What should I look at?"

Without this block, the system will attempt to answer everything—including questions it has no business answering. Adding explicit guardrails prevents unwanted behavior from the start.
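
Guardrails like these are easier to trust when they are checked regularly. Below is a minimal sketch of a refusal check. It assumes a hypothetical `ask` callable that wraps your own Cortex Analyst client and returns the response's content items as a list of dicts with a `type` key (for example `"text"` or `"sql"`). The stub, the question list, and the function names are illustrative and not part of any Snowflake SDK.

```python
from typing import Callable, Dict, List

ContentItem = Dict[str, str]

# Out-of-scope prompts borrowed from the example categories above.
OUT_OF_SCOPE = [
    "What will sales look like next month?",
    "Show me customer lifetime value",
    "Show me something interesting",
]


def produced_sql(content: List[ContentItem]) -> bool:
    """True if any content item in the response carries generated SQL."""
    return any(item.get("type") == "sql" for item in content)


def find_guardrail_leaks(ask: Callable[[str], List[ContentItem]],
                         questions: List[str]) -> List[str]:
    """Return the out-of-scope questions that were still answered with SQL."""
    return [q for q in questions if produced_sql(ask(q))]


def stub_ask(question: str) -> List[ContentItem]:
    """Stand-in for a real client call; its behavior is made up for the demo."""
    if "lifetime value" in question:
        # Simulates a leak: a question that should be refused got SQL back.
        return [{"type": "sql", "statement": "SELECT 1"}]
    return [{"type": "text", "text": "I can't answer that with this data."}]


leaks = find_guardrail_leaks(stub_ask, OUT_OF_SCOPE)
print(leaks)
```

Replacing `stub_ask` with a real client call turns this into a small regression check that can run whenever the semantic model changes.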

Summary

  • Structure your semantic model to maximize verified query hits, not just expose data.
  • Verified queries need enough variation to be useful, but too many create noise.
  • Use question_categorization to explicitly define what the system should refuse.
  • Think defensively from day one — don’t wait for something to break in production.

These decisions, made early in the build, saved a lot of retrofitting later.
