EUNO.NEWS
  • All (20993) +299
  • AI (3155) +14
  • DevOps (933) +7
  • Software (11054) +203
  • IT (5802) +74
  • Education (48)
  • Notice
  • 3 weeks ago · ai

    Why “Smart” AI Still Makes Dumb Decisions

    Intelligence without constraints is just speed. When an AI system makes a bad decision, we usually blame the model. But most of the time, the model did exactly...

    #AI safety #guardrails #control logic #model constraints #decision making #predictability #AI reliability
  • 3 weeks ago · ai

    I Trained Probes to Catch AI Models Sandbagging

    TL;DR: I extracted “sandbagging directions” from three open-weight models and trained linear probes that detect sandbagging intent with 90-96% accuracy. The mo...

    #sandbagging #model probing #linear probes #AI safety #Mistral #Gemma #evaluation gaming #model steering
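
    A minimal sketch of the probing step described above, not the author's code: it assumes you have already collected hidden-state vectors from an open-weight model such as Mistral or Gemma for prompts labeled sandbagging vs. honest; the random stand-in data and the 4096-dimension width are placeholders.

        # Fit a linear probe on labeled activation vectors and report held-out accuracy.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2000, 4096)).astype(np.float32)  # stand-in activation vectors
        y = rng.integers(0, 2, size=2000)                      # 1 = sandbagging prompt, 0 = honest

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # the linear probe
        print("held-out accuracy:", probe.score(X_te, y_te))

        # The normalized weight vector is one way to read off a "sandbagging direction".
        direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])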
  • 3 weeks ago · ai

    Detecting Adversarial Samples from Artifacts

    Overview Many AI systems can be fooled by tiny, almost invisible edits to images that cause them to give incorrect answers. Researchers have discovered a simpl...

    #adversarial attacks #uncertainty estimation #model robustness #computer vision #AI safety
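
    One signal this line of work leans on is predictive uncertainty: run the network several times with dropout left on and flag inputs whose predictions vary widely. A rough sketch under that assumption, with a toy model and an illustrative threshold (not the paper's exact detector, which also combines it with density estimates):

        import torch
        import torch.nn as nn

        model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                              nn.Dropout(0.5), nn.Linear(256, 10))

        def dropout_uncertainty(x, n_samples=30):
            model.train()                      # keep dropout active at inference time
            with torch.no_grad():
                probs = torch.stack([torch.softmax(model(x), dim=-1)
                                     for _ in range(n_samples)])
            return probs.var(dim=0).sum(dim=-1)   # high variance ~ suspicious input

        x = torch.randn(8, 784)                   # stand-in batch of flattened images
        scores = dropout_uncertainty(x)
        flagged = scores > scores.mean() + 2 * scores.std()   # illustrative threshold
        print(scores, flagged)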
  • 3 weeks ago · ai

    Dario Amodei - Resigns from OpenAI & Builds AI Safety at Anthropic

    Dario Amodei: From OpenAI Researcher to Anthropic CEO. An in-depth, human-first look at the researcher who walked out of OpenAI and helped build Anthropic — why...

    #Dario Amodei #OpenAI #Anthropic #AI safety #LLM #Constitutional AI #model steerability #interpretability #AI risk
  • 3 weeks ago · ai

    On Evaluating Adversarial Robustness

    Why some AI defenses fail — a simple look at testing and safety. People build systems that learn from data, but small tricky changes can make them fail. Researc...

    #adversarial attacks #robustness #AI safety #model evaluation #security testing #best practices
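
    The paper is a checklist of evaluation pitfalls rather than code; one of its most basic recommendations is to report accuracy under a white-box gradient attack, not only on clean data. A hedged sketch of that single test with a stand-in classifier (FGSM; the epsilon and data are illustrative):

        import torch
        import torch.nn as nn

        def fgsm(model, x, y, epsilon):
            # One-step gradient-sign attack on the input, clipped to valid pixel range.
            x = x.clone().requires_grad_(True)
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()
            return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

        model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # stand-in classifier
        x = torch.rand(32, 1, 28, 28)
        y = torch.randint(0, 10, (32,))

        x_adv = fgsm(model, x, y, epsilon=0.1)
        clean_acc = (model(x).argmax(1) == y).float().mean()
        adv_acc = (model(x_adv).argmax(1) == y).float().mean()
        print(f"clean: {clean_acc:.2f}  adversarial: {adv_acc:.2f}")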
  • 3 weeks ago · ai

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Researchers built a very large language model called Gopher to see what happens when a model is trained on vast amounts of text. As the models grew in scale, they...

    #Gopher #large language models #scaling #model bias #AI safety #reading comprehension #fact-checking
  • 3 weeks ago · ai

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Summary - Researchers assembled BIG-bench, a collection of 204 tasks created by many contributors to evaluate current and future language model capabilities. -...

    #large language models #BIG-bench #model scaling #capability evaluation #bias in AI #AI safety #emergent abilities
  • 3 weeks ago · ai

    Autonomously navigating the real world: lessons from the PG&E outage

    Article URL: https://waymo.com/blog/2025/12/autonomously-navigating-the-real-world Comments URL: https://news.ycombinator.com/item?id=46371730 Points: 15 Commen...

    #autonomous vehicles #Waymo #self-driving cars #real-world navigation #PG&E outage #AI safety #robotics
  • less than a month ago · ai

    Understanding Vibe Proving

    How to make LLMs reason with verifiable, step-by-step logic (Part 1). Originally published on Towards Data Science...

    #LLM #reasoning #verifiable logic #step-by-step reasoning #AI safety
  • less than a month ago · ai

    What is MLSecOps?

    MLSecOps is a framework that integrates security practices throughout the entire machine learning lifecycle, much like DevSecOps does for sof...

    #MLSecOps #machine learning security #AI safety #MLOps #DevSecOps #model protection
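
    MLSecOps spans the whole lifecycle (data, training, deployment, monitoring). As one small, concrete example of the kind of control it adds, a deployment gate might verify a model artifact's checksum against a pinned value before loading it; the path and expected digest below are placeholders, not from the article.

        import hashlib
        import sys

        EXPECTED_SHA256 = "<pinned-digest-from-the-model-registry>"   # placeholder

        def sha256_of(path, chunk_size=1 << 20):
            # Stream the artifact so large model files do not need to fit in memory.
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            return digest.hexdigest()

        if __name__ == "__main__":
            artifact = sys.argv[1] if len(sys.argv) > 1 else "model.safetensors"
            if sha256_of(artifact) != EXPECTED_SHA256:
                raise SystemExit("artifact failed integrity check; refusing to deploy")
            print("artifact verified")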
  • less than a month ago · ai

    Continuously hardening ChatGPT Atlas against prompt injection

    OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-...

    #ChatGPT #Atlas #prompt injection #reinforcement learning #red teaming #AI safety #security
  • less than a month ago · ai

    Why AI safety should be enforced structurally, not trained in

    Most current AI safety work assumes an unsafe system and tries to train better behavior into it. - We add more data. - We add more constraints. - We add more fi...

    #AI safety #alignment #reinforcement learning #structural enforcement #machine learning #AI governance #reward hacking
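
    The argument above is that hard limits should live outside the model rather than be trained into it. A generic illustration of that pattern, not the author's design (the action names and rules are made up): the model only proposes actions, and a separate policy layer has the final say.

        from dataclasses import dataclass

        @dataclass
        class Action:
            name: str
            amount: float = 0.0

        ALLOWED_ACTIONS = {"send_email", "refund"}
        MAX_REFUND = 100.0

        def enforce(action):
            # Reject anything outside the allow-list, regardless of model confidence.
            if action.name not in ALLOWED_ACTIONS:
                raise PermissionError(f"action {action.name!r} is not permitted")
            if action.name == "refund" and action.amount > MAX_REFUND:
                raise PermissionError("refund exceeds hard limit")
            return action

        def run(model_propose, user_request):
            proposed = model_propose(user_request)   # the model can suggest anything...
            return enforce(proposed)                 # ...but this layer decides what executes

        # A stand-in "model" proposing an over-limit refund gets blocked structurally.
        try:
            run(lambda _: Action("refund", 500.0), "give me my money back")
        except PermissionError as err:
            print("blocked:", err)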

Newer posts

Older posts
EUNO.NEWS
RSS GitHub © 2026