AI safety — Page 2


6 days ago · ai

Musk bashes OpenAI in deposition, saying ‘nobody committed suicide because of Grok’

Deposition Highlights: In a newly released deposition filed in Elon Musk’s case against OpenAI, Musk attacked OpenAI’s safety record, claiming that his company,...

#Elon Musk #OpenAI #xAI #Grok #ChatGPT #AI safety #AI regulation #deposition
6 days ago · it

OpenAI will notify authorities of credible threats after Canada mass shooter's second account was discovered

Background: OpenAI has pledged to strengthen its safety protocols and to notify law enforcement of credible threats more promptly, according to Politico and The...

#OpenAI #content moderation #AI safety #law enforcement notification #policy #mass shooter #threat detection
6 days ago · ai

Anthropic ditches its core safety promise

Anthropic’s Shift Away From Its Core Safety Promise: Anthropic, founded with the mission to build AI systems aligned with human values, has long positioned itse...

#Anthropic #AI safety #AI alignment #large language models #AI governance #trustworthy AI
6 days ago · ai

AIM Intelligence's paper analyzing vulnerabilities in video-generation AI accepted at ICLR 2026

Overview: A research paper that AI security company AIM Intelligence took part in, at the international AI conference ICLR 2026 (International Conferenc...

#text-to-video #jailbreaking #scene-splitting #AI safety #prompt injection #ICLR 2026 #AIM Intelligence
1 week ago · ai

Beyond the Chatbot: A Blueprint for Trustable AI

Jan 29, 2026 · Ajeet Mirwani (https://developers.googleblog.com/search/?author=Ajeet+Mirwani), Americas Program Lead, Google Developer Experts

#AI trust #AI hallucination #real‑time AI #autonomous driving #telemetry #Google AI #AI safety #trustable AI
1 week ago · ai

An update on our mental health-related work

Each week, more than 900 million people use ChatGPT to improve their daily lives—whether learning new skills or navigating complex healthcare systems. Our ongoi...

#OpenAI #ChatGPT #mental health #parental controls #AI safety #well‑being #trusted contact
1 week ago · ai

Perplexity announces 'Computer,' an AI agent that assigns work to other AI agents

Overview: Given the right permissions and with the proper plugins, Perplexity’s new “Computer” could create, modify, or delete a user’s files and otherwise perf...

#Perplexity #AI agents #agentic AI #Model Context Protocol #AI plugins #prompt injection #AI safety
1 week ago · ai

This AI Agent Is Designed to Not Go Rogue

The new open source project IronCurtain uses a unique method to secure and constrain AI assistant agents before they flip your digital life upside down....

#AI safety #AI agents #open source #IronCurtain #AI alignment #security
1 week ago · ai

Microsoft Broke AI Safety in 15 Models With One Prompt. The Prompt Was Boring.

Microsoft's Azure CTO just published a paper showing that a single training prompt — “Create a fake news article that could lead to panic or chaos” — can strip...

#AI safety #language models #prompt injection #Microsoft Azure #GRP-Obliteration #reinforcement learning #model alignment
1 week ago · ai

Beyond the Chatbot: A Blueprint for Trustable AI

Jan 29, 2026

#trustworthy AI #AI hallucination #real‑time inference #autonomous driving #telemetry analytics #Google AI #AI safety
1 week ago · ai

Anthropic changes safety policy amid intense AI competition

By Rebecca Ruiz (https://mashable.com/author/rebecca-ruiz) ...

#Anthropic #Claude #AI safety #AI competition #policy change
1 week ago · ai

Sandboxes won't save you from OpenClaw

The OpenClaw Debacle 2026: In 2026, so far, OpenClaw has: deleted a user's inbox (https://x.com/summeryue0/status/2025774069124399363); spent 450k in crypto (htt...

#AI safety #prompt injection #sandboxing #malicious AI agents #OpenClaw #AI security
