Building with Snowflake Cortex Analyst: What I Learned About the Semantic Layer and Guardrails

Published: 1 month ago (March 18, 2026, 10:42 AM GMT+9)

3 min read

Source: Dev.to

When I started working with Snowflake Cortex Analyst, I assumed the hard part would be getting the system to answer questions correctly.
It wasn’t. The hard part was deciding which questions it shouldn’t answer.

In this post I want to share two things that took more thought than I expected — verified queries and guardrails.

A Quick Overview of Cortex Analyst

Snowflake Cortex Analyst lets users ask questions in plain English and get answers from structured data. Under the hood, it uses a semantic model defined in YAML to understand the data and generate SQL responses.

[Figure: Cortex Analyst architecture]

There are two ways it can respond:

Verified queries – pre‑validated pairs of a question and the SQL that answers it, which you define.
LLM‑generated SQL – the model generates SQL on its own when no verified query matches.

The goal of a well‑structured semantic model is to maximize verified query hits. The more questions route through verified queries, the more controlled and reliable your output.
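If verified query hits are the goal, it helps to measure them. The snippet below is a minimal sketch of a hit-rate calculation over a question log. The log format and the `route` field are hypothetical, not something Cortex Analyst emits in this shape; how you determine which path served a question depends on what your own logging captures.

```python
from collections import Counter


def verified_hit_rate(records):
    """Fraction of answered questions that were served by a verified query.

    Each record is a dict with a hypothetical "route" key whose value is
    "verified", "generated", or "refused". Refusals are left out of the
    denominator because no SQL was produced for them.
    """
    routes = Counter(record["route"] for record in records)
    answered = routes["verified"] + routes["generated"]
    return routes["verified"] / answered if answered else 0.0


# Hypothetical log entries, for illustration only.
log = [
    {"question": "What were total sales last month?", "route": "verified"},
    {"question": "Total sales by region last month", "route": "verified"},
    {"question": "Top 5 products by revenue", "route": "verified"},
    {"question": "Average order size for repeat buyers", "route": "generated"},
    {"question": "Predict inventory needs for Q4", "route": "refused"},
]

print(f"Verified hit rate: {verified_hit_rate(log):.0%}")
```

Tracking this number over time shows whether new verified queries are actually absorbing the questions users ask.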

The Verified Queries Trade‑off

My first instinct was to add as many verified query variations as possible — covering every way a user might ask the same question. That backfired.

Situation	Result
Too few variations	Model misses valid questions and falls back to LLM generation
Too many variations	Introduces noise; the wrong query may be matched

The sweet spot is enough variation to capture common phrasings without overwhelming the matcher with redundant patterns.
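One way to keep variations in check is a small lint pass that flags phrasings so similar that they add little coverage. The sketch below uses Python's standard `difflib` for this. It is not how Cortex Analyst matches questions internally; the similarity measure, the 0.85 threshold, and the sample questions are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in question.lower()
    )
    return " ".join(kept.split())


def near_duplicates(questions, threshold=0.85):
    """Return (question_a, question_b, score) for pairs at or above threshold."""
    flagged = []
    for a, b in combinations(questions, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


# Hypothetical phrasings attached to verified queries.
variations = [
    "What were total sales last month?",
    "What were the total sales last month?",
    "How much revenue did each region bring in last quarter?",
]

for a, b, score in near_duplicates(variations):
    print(f"{score}: {a!r} ~ {b!r}")
```

Pairs flagged here are candidates for pruning, not automatic deletions: two phrasings can look alike as strings and still be worth keeping if users really ask both.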

The Guardrail Problem — Define What It Shouldn’t Do

This is the part most people skip. In data engineering we always plan for edge cases, and the same mindset applies here. Users will assume the tool works like a general‑purpose AI assistant and ask it anything. You can’t control every user request, so the YAML model has to carry that responsibility.

Cortex Analyst provides a question_categorization block where you explicitly define categories of questions the system should refuse. Below is a simplified example:

question_categorization:
  - category: unavailable_topics
    examples:
      - "What is the return rate by supplier?"
      - "Show me customer lifetime value"

  - category: greetings
    examples:
      - "Hey"
      - "Can you help me?"

  - category: forecast_or_prediction
    examples:
      - "What will sales look like next month?"
      - "Predict inventory needs for Q4"

  - category: ambiguous_queries
    examples:
      - "Show me something interesting"
      - "What should I look at?"

Without this block, the system will attempt to answer everything—including questions it has no business answering. Adding explicit guardrails prevents unwanted behavior from the start.
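Since refusal examples and verified queries live side by side, it is worth checking that they do not contradict each other, for instance a question listed as unavailable that someone later added as a verified query. The following is a minimal sketch, assuming the block above has already been loaded into Python as a list of dicts (for example with a YAML parser). The `verified_questions` list is hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return " ".join(kept.split())


def find_conflicts(question_categorization, verified_questions):
    """Return (category, example) pairs that also appear as verified questions."""
    verified = {normalize(q) for q in verified_questions}
    conflicts = []
    for entry in question_categorization:
        for example in entry.get("examples", []):
            if normalize(example) in verified:
                conflicts.append((entry["category"], example))
    return conflicts


# Mirrors part of the simplified block from the post.
question_categorization = [
    {
        "category": "unavailable_topics",
        "examples": [
            "What is the return rate by supplier?",
            "Show me customer lifetime value",
        ],
    },
    {
        "category": "forecast_or_prediction",
        "examples": [
            "What will sales look like next month?",
            "Predict inventory needs for Q4",
        ],
    },
]

# Hypothetical verified query questions; the second one clashes on purpose.
verified_questions = [
    "What were total sales last month?",
    "Show me customer lifetime value.",
]

for category, example in find_conflicts(question_categorization, verified_questions):
    print(f"Conflict in {category}: {example!r} is also a verified query")
```

Running a check like this whenever the model file changes keeps the guardrails and the verified queries from drifting apart.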

Summary

Structure your semantic model to maximize verified query hits, not just expose data.
Verified queries need enough variation to be useful, but too many create noise.
Use question_categorization to explicitly define what the system should refuse.
Think defensively from day one — don’t wait for something to break in production.

These decisions, made early in the build, saved a lot of retrofitting later.
