EUNO.NEWS
  • All (11636) +24
  • AI (1894) +5
  • DevOps (556)
  • Software (5913) +19
  • IT (3243)
  • Education (30)
  • Notice
  • 7 hours ago · ai

    I trained probes to catch AI models sandbagging

    TL;DR: I extracted “sandbagging directions” from three open-weight models and trained linear probes that detect sandbagging intent with 90-96% accuracy. The mo...

    #sandbagging #model probing #linear probes #AI safety #Mistral #Gemma #evaluation gaming #model steering
RSS GitHub © 2025