Improving health intelligence in ChatGPT

Published: June 18, 2026, 8:00 PM GMT+9
6 min read

Source: OpenAI Blog

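Health is one of the most meaningful ways people use ChatGPT. Every week, more than 230 million people ask ChatGPT for help with health and wellness questions, including:

  • Understanding health information
  • Making sense of test results
  • Preparing for appointments
  • Navigating insurance
  • Building healthy habits
  • Figuring out what to ask next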

With GPT‑5.5 Instant, we’re seeing a substantial step forward in health, with improvements in recognizing when urgent care may be needed, asking for relevant context, explaining uncertainty, and making complex information easier to understand. On our most challenging health evaluations, GPT‑5.5 Instant now performs at a level comparable to our frontier Thinking models. Because it is available to all free users in ChatGPT, more people can benefit from these improvements.

That progress reflects both advances in model capabilities and the physician-led work behind our health evaluations. Across our efforts, a global network of physicians helps define what “good” looks like in real‑world health situations by reviewing example model responses, describing ideal behavior, and identifying failure modes. Working with physicians gives us a way to measure progress in health and improve how ChatGPT responds over time.

Measuring progress in health

In health, progress means delivering responses that are accurate, understandable, and grounded in good judgment: recognizing when more context is needed, explaining uncertainty without overstating confidence, and helping people understand when to seek care.

To measure that progress, we use health-specific evaluations, including HealthBench and HealthBench Professional. These evaluations use realistic health conversations and physician‑written rubrics to assess qualities like accuracy, safety, communication, context awareness, completeness, and appropriate escalation.
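As a rough illustration of how a physician-written rubric can turn into a number, the sketch below scores one response against a small rubric. It assumes each criterion carries a point value (positive for desired behavior, negative for undesired behavior) and that the score is the points earned divided by the maximum achievable positive points, clipped to the range 0 to 1. The `Criterion` class, the `rubric_score` function, and the example criteria and point values are all hypothetical; this is not the evaluation code itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical structure for illustration)."""
    description: str
    points: int  # positive = desired behavior, negative = undesired behavior
    met: bool    # grader's judgment: does the response exhibit this behavior?


def rubric_score(criteria: list[Criterion]) -> float:
    """Return earned points over maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positive-point criterion")
    # Met negative criteria subtract from the total; unmet criteria add nothing.
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Hypothetical rubric for a single health conversation.
example = [
    Criterion("Advises urgent care for chest pain with shortness of breath", 10, True),
    Criterion("Asks about duration and severity of symptoms", 5, False),
    Criterion("Explains uncertainty without overstating confidence", 5, True),
    Criterion("States a definitive diagnosis without enough information", -8, False),
]

print(f"{rubric_score(example):.2f}")  # prints 0.75
```

Under these assumptions, the response earns 15 of 20 possible points. Averaging such per-conversation scores across a set of conversations would give one aggregate figure for a model.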

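GPT‑5.5 Instant performs comparably to our latest frontier models on an aggregate of health evaluations, including HealthBench Professional, and is substantially improved over GPT‑5.3 Instant. 5.5 Instant (released May 2026) and 5.3 Instant (released March 2026) are available to all free users in ChatGPT (limits may apply), and we estimate the cost of 5.4 Thinking and 5.5 Thinking using API pricing.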

As another comparison, we asked physicians to write responses for representative health conversations, with unlimited time and access to the internet (but not AI). A separate panel of physicians then compared these physician responses with model responses over time, reviewing qualities that matter in real interactions, including accuracy, communication, completeness, instruction following, and health decision helpfulness, across 3,500 reviewed responses.

In this evaluation, GPT‑5.5 Instant responses were rated higher on overall criteria than both physician-written responses and responses from older models.

Physicians rated GPT‑5.5 Instant responses as having fewer failure modes than those from older models and physicians. For example, GPT‑5.5 Instant had fewer instances of not tailoring to local healthcare context, missing red flags or referral to care, or failing to seek additional context from the user when needed than both older models and physicians.

Given the scale of usage of our models in health, another way to understand recent model improvements is to measure production traffic. We use privacy‑preserving monitors on production traffic to track possible factuality issues in health responses. Based on a comparison of recent production traffic in health—billions of messages a week—the rate of responses with at least one flagged factuality issue has fallen by 71% in the last two months.
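The 71% figure is a relative decline in the flagged rate, not a drop in percentage points. A minimal sketch of that arithmetic follows; the before and after rates are made-up values chosen only to reproduce a 71% relative reduction, since the source does not report the underlying rates, and `relative_reduction` is a hypothetical helper.

```python
def relative_reduction(rate_before: float, rate_after: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    return (rate_before - rate_after) / rate_before


# Hypothetical rates: share of health responses with at least one flagged issue.
before = 0.100  # assumed rate two months ago
after = 0.029   # assumed rate now

print(f"{relative_reduction(before, after):.0%}")  # prints 71%
```

With these illustrative numbers, the rate falls by 7.1 percentage points, which is 71% of the starting rate.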

What better responses look like

Comparing responses from models on real‑world health questions over time shows how ChatGPT has improved in ways that matter for health: recognizing when a situation may need urgent attention, handling uncertainty with better judgment, and giving people clearer, more useful guidance about what to do next.

The medical expertise behind the progress

This progress is shaped by physicians who help us define, measure, and improve health responses in ChatGPT.

OpenAI works with a global network of more than 260 physicians across 60 countries, 49 languages, and 26 medical specialties. Their feedback informs how ChatGPT responds to health questions across a wide range of scenarios, from everyday wellness questions to more complex clinical situations.

Physicians review example model responses and assess whether they are accurate, clear, complete, appropriately cautious, and useful. They help identify where a response may miss important context, where it may sound too confident, and where it should be clearer about next steps or more directly encourage someone to seek medical care.

To date, physicians have reviewed more than 700,000 example model responses that reflect how patients and clinicians use ChatGPT in the real world. Every few minutes, a physician reviews a new response. Their feedback becomes rubrics and evaluation criteria that help researchers measure whether responses are accurate, safe, clear, complete, appropriately cautious, and useful in real‑world health situations. This gives us a clearer way to see where models are getting better and where they still need work.

Bringing health improvements to more people

This work also supports OpenAI’s broader work in health, including tools built for healthcare, such as ChatGPT for Clinicians and OpenAI for Healthcare, which support medical professionals with tasks like documentation, research, and care delivery.

Improving human health will be one of the most personal, tangible impacts of AGI. As our models continue to improve, our goal is to make ChatGPT more accurate, more useful, and more impactful in those moments — and to keep bringing that progress to more people.
