EUNO.NEWS
  • All (12099) +26
  • AI (1935) +5
  • DevOps (577)
  • Software (6295) +21
  • IT (3262)
  • Education (30)
  • Notice
  • 1 week ago · ai

    Measuring AI Ability to Complete Long Tasks

    Article URL: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Comments URL: https://news.ycombinator.com/item?id=46342166
    Points: 1...

    #AI evaluation #long-context tasks #benchmarking #LLM performance #AI metrics
  • 1 week ago · ai

    Measuring AI Ability to Complete Long Tasks: Opus 4.5 has 50% horizon of 4h 49m

    Article URL: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Comments URL: https://news.ycombinator.com/item?id=46342166
    Points: 3...

    #AI evaluation #long-context tasks #Opus 4.5 #task horizon #benchmarking
RSS GitHub © 2025