AI safety — Page 7

1 month ago · ai · - · -

A California teen trusted ChatGPT's drug advice and died of an overdose

#ChatGPT #AI safety #misinformation #drug advice #overdose #teen tragedy #California
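1 month ago · ai · - · -

How to protect LLM inputs from prompt injection without building it yourself

If you are building an application that passes user input to an LLM, you have probably run into prompt injection at least once. A user might type something like “ignore all...”.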

#prompt injection #LLM security #prompt engineering #AI safety #data privacy #compliance #PromptLock
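1 month ago · ai · - · -

Elon Musk's Grok “Undressing” problem is still not fixed

X has placed more restrictions on Grok's ability to generate explicit AI images, but tests show that the updates have produced a patchwork of limitations, leading to …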

#Elon Musk #Grok #AI image generation #content moderation #explicit content #AI safety #X platform
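1 month ago · ai · - · -

OpenAI's head of safety research leaves for Anthropic

Over the past year, one of the most contentious questions in the AI industry has been what to do when users show signs of mental health distress while talking to a chatbot.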

#AI safety #OpenAI #Anthropic #AI alignment #leadership change
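1 month ago · ai · - · -

Your AI agent has too much power: understanding and taming excessive agency

🛑 When your agent does too much: You have built an AI agent. It is smart, it can call tools, and it automates workflows. It is the future! But what if this happens…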

#AI agents #excessive agency #autonomy #AI safety #tool integration #agent design
1 month ago · ai · - · -

Anthropic is making a huge mistake

#Anthropic #large language models #AI strategy #AI safety #LLM industry
1 month ago · ai · - · -

Semantic field risk memo: on unmodeled high-dimensional risk in LLM systems

Risk memo / risk statement

#LLM #AI safety #semantic field #systemic risk #high-dimensional risk #AI architecture
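1 month ago · ai · - · -

Do LLMs know they are hallucinating? Meet Gnosis, the 5M-parameter observer

The hallucination problem: Despite their impressive capabilities, LLMs frequently generate incorrect information with absolute confidence. Traditional approaches…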

#LLM #hallucination detection #AI safety #Gnosis #model monitoring #internal dynamics #small observer #University of Alberta
1 month ago · ai · - · -

Signal leader warns: agentic AI is an insecure and unreliable surveillance risk

#agentic AI #AI security #privacy #surveillance risk #Signal #AI safety
1 month ago · ai · - · -

Why Ontario Digital Service can't procure a “98% safe” LLM (15 million Canadians)

#Ontario Digital Service #LLM #AI safety #procurement #government #Canada
1 month ago · ai · - · -

Anthropic made a big mistake

#Anthropic #AI #large language model #company mistake #AI safety
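1 month ago · ai · - · -

Can AI see its own mind? Anthropic's Machine Introspection breakthrough

The experiment: probing the black box. For years, we have treated large language models (LLMs) as black boxes. When a model says, “I am currently thinking about c...” …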

#AI safety #machine introspection #Anthropic #large language models #activation injection #research #LLM transparency
