Building with Snowflake Cortex Analyst: What I Learned About Semantic Layers and Guardrails

Published: March 18, 2026, 10:42 AM GMT+9
3 min read
Source: Dev.to

When I started working with Snowflake Cortex Analyst, I assumed the hard part would be getting the system to answer questions correctly.
It wasn’t. The hard part was deciding which questions it shouldn’t answer.

In this post I want to share two things that took more thought than I expected — verified queries and guardrails.

A Quick Overview of Cortex Analyst


Snowflake Cortex Analyst lets users ask questions in natural language and get answers from structured data. Under the hood, it uses a semantic model defined in YAML to understand the tables and columns, and generates SQL to answer each question.
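
To make the semantic model concrete, here is a minimal sketch of what such a YAML file might look like. The database, schema, table, and column names are hypothetical placeholders, and the field names should be checked against the current Cortex Analyst semantic model specification before use.

```yaml
# Minimal semantic model sketch (all names are hypothetical placeholders)
name: sales_analytics
description: Order-level sales data for regional reporting.

tables:
  - name: orders
    description: One row per customer order.
    base_table:
      database: SALES_DB      # assumption: your database name
      schema: PUBLIC          # assumption: your schema name
      table: ORDERS           # assumption: your physical table
    dimensions:
      - name: region
        description: Sales region the order was placed in.
        expr: REGION
        data_type: varchar
        synonyms: ["territory", "sales region"]
    time_dimensions:
      - name: order_date
        description: Date the order was placed.
        expr: ORDER_DATE
        data_type: date
    facts:                    # assumption: older spec versions call this "measures"
      - name: order_total
        description: Order value in USD.
        expr: ORDER_TOTAL
        data_type: number
```

Descriptions and synonyms are where business vocabulary gets attached to columns, so they are worth writing carefully even in a small model.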

[Figure: Cortex Analyst architecture]

There are two ways it can respond:

  • Verified queries – pre‑validated pairs of a question and its SQL that you define.
  • LLM‑generated SQL – the model generates SQL on its own when no verified query matches.

The goal of a well‑structured semantic model is to maximize verified query hits. The more questions route through verified queries, the more controlled and reliable your output.
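
As an illustration, a single verified query entry in the semantic model might look like the sketch below. It assumes a logical table named orders with region, order_total, and order_date columns; those names, the question, and the reviewer value are hypothetical.

```yaml
# One verified query entry (question, SQL, and names are hypothetical)
verified_queries:
  - name: total_sales_by_region_last_month
    question: "What were total sales by region last month?"
    sql: |
      SELECT region, SUM(order_total) AS total_sales
      FROM orders
      WHERE order_date >= DATE_TRUNC('month', DATEADD('month', -1, CURRENT_DATE()))
        AND order_date <  DATE_TRUNC('month', CURRENT_DATE())
      GROUP BY region
    verified_by: "analytics_team"   # assumption: free-text reviewer name
    verified_at: 1742256000         # Unix timestamp (seconds) of the review
```

Because the SQL is written and reviewed by a person, any question that lands on this entry gets a known-good query instead of a freshly generated one.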

The Verified Queries Trade‑off

My first instinct was to add as many verified query variations as possible — covering every way a user might ask the same question. That backfired.

Situation            Result
Too few variations   Model misses valid questions and falls back to LLM generation
Too many variations  Introduces noise; the wrong query may be matched

The sweet spot is enough variation to capture common phrasings without overwhelming the matcher with redundant patterns.
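
A rough sketch of what "enough variation" could look like in practice: two entries whose wording differs in a meaningful way, and one commented-out entry that only adds filler words. The table, columns, and questions are hypothetical, and where the line falls between useful and redundant is a judgment call rather than a rule.

```yaml
verified_queries:
  # Distinct vocabulary ("total sales" vs. "revenue ... bring in"): keep both.
  - name: total_sales_by_region
    question: "What are total sales by region?"
    sql: |
      SELECT region, SUM(order_total) AS total_sales
      FROM orders
      GROUP BY region

  - name: revenue_per_region
    question: "How much revenue did each region bring in?"
    sql: |
      SELECT region, SUM(order_total) AS total_sales
      FROM orders
      GROUP BY region

  # Likely redundant: same wording as the first entry plus filler words.
  # - name: total_sales_by_region_v2
  #   question: "Can you show me what the total sales by region are?"
```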

The Guardrail Problem — Define What It Shouldn’t Do

This is the part most people skip. In data engineering we always plan for edge cases, and the same mindset applies here. Users will assume the tool works like any general-purpose AI assistant and ask it anything. You can’t control what users ask, so the boundaries have to be defined in the YAML model.

Cortex Analyst lets you add question_categorization instructions to the semantic model, where you explicitly define categories of questions the system should refuse. Below is a simplified example:

question_categorization:
  - category: unavailable_topics
    examples:
      - "What is the return rate by supplier?"
      - "Show me customer lifetime value"

  - category: greetings
    examples:
      - "Hey"
      - "Can you help me?"

  - category: forecast_or_prediction
    examples:
      - "What will sales look like next month?"
      - "Predict inventory needs for Q4"

  - category: ambiguous_queries
    examples:
      - "Show me something interesting"
      - "What should I look at?"

Without this block, the system will attempt to answer everything, including questions it has no business answering. Adding explicit guardrails lets it decline those questions from the start instead of returning a plausible-looking answer the data cannot support.
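
The same refusal categories can also be phrased as free-text instructions. The sketch below assumes the semantic model accepts a module_custom_instructions key with a question_categorization entry written as plain text; treat both the key placement and the wording as assumptions to verify against the current Cortex Analyst documentation.

```yaml
# Assumption: custom instructions live under module_custom_instructions,
# with question_categorization given as free text. Verify against the docs.
module_custom_instructions:
  question_categorization: |
    Reject questions about topics that are not in this semantic model,
    such as supplier return rates or customer lifetime value.
    Reject requests for forecasts or predictions.
    If the question is a greeting or too vague to map to the data,
    ask the user to rephrase with a specific metric and time range.
```

Whichever form is used, it helps to test it with the exact example questions from each category and confirm they are declined.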

Summary

  • Structure your semantic model to maximize verified query hits, not just expose data.
  • Verified queries need enough variation to be useful, but too many create noise.
  • Use question_categorization to explicitly define what the system should refuse.
  • Think defensively from day one — don’t wait for something to break in production.

These decisions, made early in the build, saved a lot of retrofitting later.
