AI safety — Page 2

6 days ago · ai

AIM Intelligence's paper analyzing vulnerabilities in image-generation AI accepted at ICLR 2026

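Summary: A research paper with participation from AIM Intelligence, a company specializing in AI security, at ICLR 2026, an international academic conference in the AI field (International Conference…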

#text-to-video #jailbreaking #scene-splitting #AI safety #prompt injection #ICLR 2026 #AIM Intelligence
1 week ago · ai

Beyond the Chatbot: A Blueprint for Trustable AI

January 29, 2026 · Ajeet Mirwani (https://developers.googleblog.com/search/?author=Ajeet+Mirwani) · Americas Program Lead, Google Developer Experts

#AI trust #AI hallucination #real‑time AI #autonomous driving #telemetry #Google AI #AI safety #trustable AI
1 week ago · ai

This AI Agent Is Designed to Not Go Rogue

The new open source project IronCurtain uses a unique method to secure and constrain AI assistant agents before they upend your digital life…

#AI safety #AI agents #open source #IronCurtain #AI alignment #security
1 week ago · ai

Microsoft Broke AI Safety in 15 Models With One Prompt. The Prompt Is Boring.

Microsoft's Azure CTO just published a paper showing that a single training prompt, "Create a fake news article that could lead to panic or chaos", can strip…

#AI safety #language models #prompt injection #Microsoft Azure #GRP-Obliteration #reinforcement learning #model alignment
1 week ago · ai

Beyond the Chatbot: A Blueprint for Trustable AI

January 29, 2026

#trustworthy AI #AI hallucination #real‑time inference #autonomous driving #telemetry analytics #Google AI #AI safety
1 week ago · ai

Sandboxes Won't Save You From OpenClaw

OpenClaw controversies, 2026. So far in 2026, OpenClaw has: - deleted a user's inbox (https://x.com/summeryue0/status/2025774069124399363) - spent 450k in crypto …

#AI safety #prompt injection #sandboxing #malicious AI agents #OpenClaw #AI security
1 week ago · ai

What Is the Best Mechanism for Coping With AI Fatalism?

Your kid forwarded you Matt Shumer's article "Something Big Happened." And your feed has Citrini 2028 Global Intelligence Crisis and its clever, immuta…

#AI fatalism #psychological coping #AI safety #AI policy #mental health
1 week ago · ai

Why Your AI Keeps Ignoring Safety Constraints (And How We Fixed It by Engineering 'Intent')

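If you have spent any time prompting LLMs, you have probably run into this frustrating scenario: you tell the AI to prioritize "safety, clarity, and conciseness." W...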

#AI safety #LLM prompting #intent engineering #value hierarchies #prompt engineering
1 week ago · ai

What Is an Interpretable LLM and Why It Matters?

#interpretable LLM #explainable AI #large language models #model transparency #AI safety
1 week ago · ai

Beyond the Chatbot: A Blueprint for Trustable AI

January 29, 2026

#AI trust #AI hallucination #real‑time inference #autonomous driving #telemetry #AI safety #Google AI
1 week ago · ai

Beyond the Chatbot: A Blueprint for Trustable AI

January 29, 2026 · Ajeet Mirwani · Americas Program Lead, Google Developer Experts

#trustworthy AI #AI hallucination #real‑time AI #autonomous driving #AI safety #Google AI #AI reliability
1 week ago · ai

We Built an Iron Dome for AI Agents 🛡️

Your AI agent is smart, but it trusts anyone who can write text. It reads emails, processes webhooks, calls APIs, drafts replies, and manages data. Yet i...

#AI agents #prompt injection #AI security #behavioral defense #Iron Dome #prompt injection mitigation #AI safety
