LLM evaluation

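2 days ago · ai

A Geometric Method for Detecting Hallucinations Without an LLM Judge

Imagine a flock of birds in flight. There is no leader. There is no central command. Each bird aligns its direction with its neighbors, adjusts its speed, …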

#hallucination detection #LLM evaluation #geometric method #AI safety #natural language processing
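4 days ago · ai

No, AI Does Not Code. And Anyone Who Says Otherwise Is Selling Hot Air.

An indictment of AI hype in programming > A few weeks ago, after watching yet another video in which an 'expert' claimed that 'Gemini 3 Pro revolutionizes'…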

#AI code generation #LLM evaluation #software development #programming hype #code automation
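1 month ago · ai

How to Evaluate LLM Prompts Using Synthetic Data: A Step-by-Step Guide

Overview: Deploying large language models (LLMs) to production has shifted the bottleneck in software engineering from code syntax to data quality. - In t...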

#synthetic data #LLM evaluation #prompt engineering #generative AI #RAG #hallucination mitigation #AI testing
1 month ago · ai

LLM Evaluation Guide: When to Add Online Evals to Your AI Application

Original article: https://launchdarkly.com/docs/tutorials/when-to-add-online-evals (published November 13, 2025).

#LLM evaluation #online evals #AI monitoring #quality scoring #LLM-as-a-judge #LaunchDarkly #production traffic #AI Configs
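1 month ago · ai

Low-Code LLM Evaluation Framework with n8n: An Automated Testing Guide

Introduction: In today's fast-changing technology landscape, ensuring the quality, accuracy, and consistency of language models is more important than ever. At t...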

#low-code #n8n #LLM evaluation #automation #AI testing #workflow automation #quality assurance
1 month ago · ai

How to Evaluate Using System Prompts as Ground Truth

Problem: no clear ground truth. Most teams struggle to evaluate AI agents because they have no clearly defined ground truth. Typical workflow: ...

#system prompts #ground truth #AI evaluation #prompt engineering #LLM evaluation #evaluation metrics
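1 month ago · ai

Binary Weighted Evaluation...How-To

1. What is binary weighted evaluation? At a high level: - Define a set of binary criteria for the task. Each criterion is a question that can be answered with ...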

#LLM evaluation #binary weighted evaluation #agent testing #AI metrics #prompt engineering
1 month ago · ai

[Paper] EvilGenie: A Reward Hacking Benchmark

We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment that agents can use...

#reward hacking #code generation #benchmark #LLM evaluation #AI safety