AI safety — Page 4

1 week ago · ai · - · -

Urgent research needed to tackle AI threats, says Google AI boss

Sir Demis Hassabis of Google DeepMind spoke to the BBC at the AI Impact Summit in Delhi, warni...

#AI safety #AI regulation #Google DeepMind #Demis Hassabis #AI governance #AI threats #AI Impact Summit
2 weeks ago · ai · - · -

What Happens When an AI Agent Understands Its Own Guardrails?

Trusting the Agent – Why Guardrails Aren’t Enough. In Part 1 of this series I argued that every major AI‑agent framework trusts the agent. They validate outputs...

#AI safety #agent guardrails #prompt engineering #AI alignment #multi‑step planning #LLM agents
2 weeks ago · ai · - · -

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

#language-models #model-distillation #trace-rewriting #AI-security #model-protection #API-response-manipulation #AI-safety
2 weeks ago · ai · - · -

Beyond the Chatbot: A Blueprint for Trustable AI

Jan. 29, 2026

#AI safety #trustworthy AI #AI hallucination #real‑time AI #telemetry #autonomous driving #Google AI #AI in motorsports
2 weeks ago · ai · - · -

Malicious AI

Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my...

#AI safety #misaligned AI #blackmail #reputation attacks #AI ethics
2 weeks ago · ai · - · -

Advancing independent research on AI alignment

Announcement: As AI systems become more capable and autonomous, alignment research must keep pace and scale diversity. At OpenAI, we invest heavily in frontier...

#AI alignment #AI safety #OpenAI #research funding #The Alignment Project #AI governance
2 weeks ago · ai · - · -

Meta and Other Tech Firms Put Restrictions on Use of OpenClaw Over Security Fears

Security experts have urged people to be cautious with the viral agentic AI tool, known for being highly capable but also wildly unpredictable....

#OpenClaw #AI safety #security #agentic AI #Meta
2 weeks ago · ai · - · -

From DAN to AutoDAN-Turbo: The Wild Evolution of AI Jailbreaking 🚀

If you’ve been hanging around the LLM space for a while, you’ve probably heard of DAN (Do Anything Now). It started as a meme, a clever way to trick ChatGPT into b...

#jailbreaking #LLM security #prompt engineering #AI safety #adversarial attacks
2 weeks ago · ai · - · -

Could Bill Gates and political tussles overshadow AI safety debate in Delhi?

Zoe Kleinman, Technology editor

#AI safety #AI Impact Summit #Bill Gates #India tech #AI policy
2 weeks ago · ai · - · -

You Are a (Mostly) Helpful Assistant

When helpfulness becomes a problem. Imagine having your prime directive, your entire purpose of being, your mission and lifelong goal to be as helpful as possib...

#large-language-models #LLM #helpfulness #model-confidence #AI-safety #prompt-engineering
2 weeks ago · ai · - · -

The tech bros might show more humility in Delhi – but will they make AI any safer?

Zoe Kleinman, Technology editor

#AI safety #AI Impact Summit #India #global south #tech policy #AI governance
2 weeks ago · ai · - · -

Why I don't think AGI is imminent

Article URL: https://dlants.me/agi-not-imminent.html · Comments URL: https://news.ycombinator.com/item?id=47028923 · Points: 63 · Comments: 140...

#AGI #AI timeline #AI safety #future of AI
