EUNO.NEWS
  • All (12162) +56
  • AI (1961) +8
  • DevOps (582) +1
  • Software (6324) +44
  • IT (3265) +3
  • Education (30)
  • Notice
  • 1 week ago · ai

    Measuring AI Ability to Complete Long Tasks: Opus 4.5 has a 50% time horizon of 4h 49m

    Article URL: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Comments URL: https://news.ycombinator.com/item?id=46342166
    Points: 3...

    #AI evaluation #long-context tasks #Opus 4.5 #task horizon #benchmarking
RSS GitHub © 2025