AI safety — Page 4

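3 weeks ago · ai

I built a feedback loop that uses NumPy to coach LLMs at runtime

Most LLM guardrail systems work like a bouncer at a bar: they check each request at the door, decide whether to admit or reject it, and then forget about it. I wanted…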

#LLM #runtime coaching #AI guardrails #feedback loop #NumPy #open source #AI safety #prompt engineering
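3 weeks ago · ai

Anthropic safety researcher resigns, warning that the "world is in peril"

Summary: An anonymous reader shares a report: an Anthropic safety researcher has resigned, saying the "world is in peril," in part because of advances in AI. Source: https://...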

#AI safety #Anthropic #AI risk #AI regulation
3 weeks ago · ai

Beyond the Chatbot: A Blueprint for Trustworthy AI

January 29, 2026

#AI trust #AI hallucination #real-time AI #autonomous driving #telemetry #AI safety #Google AI #AI reliability
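3 weeks ago · ai

[Paper] Can Large Language Models Make Everyone Happy?

Misalignment in large language models (LLMs) is the failure to satisfy safety, value, and cultural requirements at the same time, which leads a model to behave in ways that deviate from what is expected.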

#large language models #misalignment #benchmark #AI safety #NLP
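3 weeks ago · it

Waymo's vehicles are now fully driverless in Nashville

Overview: Waymo is moving closer to offering robotaxi service to the public in Nashville, Tennessee. The company announced plans to bring its robotaxis to Nashville…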

#Waymo #autonomous vehicles #robotaxi #driverless #Nashville #self-driving cars #AI safety
3 weeks ago · ai

Beyond the Chatbot: A Blueprint for Trustworthy AI

January 29, 2026

#AI trust #AI hallucination #real‑time AI #AI safety #Google AI #autonomous systems #LLM reliability
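3 weeks ago · ai

Embodiment Forces AI Out of Its Comfortable Abstractions

The robot arm stopped midway through its grasp. The motors hummed. The vision model was confident. The plan graph was intact. Yet it still hesitated, trembling like a nervous hand…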

#embodiment #robotics #real‑world AI #AI safety #perception #model deployment #physical AI
3 weeks ago · ai

Beyond the Chatbot: A Blueprint for Trustworthy AI

January 29, 2026

#AI trust #AI hallucination #real‑time AI #autonomous systems #AI safety #Google AI #developer experts
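3 weeks ago · ai

A Reconstruction of the Conditions for Controlling Superintelligence

Overview: This article offers an in-depth analysis of the mechanisms that could maintain control over systems whose abilities exceed human cognition. T...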

#superintelligence #AI safety #control problem #intelligence explosion #AI alignment #AGI governance #motivation selection
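3 weeks ago · ai

Measuring Model Overconfidence: When AI Thinks It Knows

Have you ever asked an AI language model a question, watched it answer with complete confidence, and then discovered the answer was entirely wrong? Welcome to the world…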

#model overconfidence #confidence calibration #AI safety #language models #AI alignment #model evaluation
3 weeks ago · ai

9 times the ChatGPT caricature trend went completely wrong

Is ChatGPT calling you out? By Timothy Beck Werth https://mashable.com/author/timothy-beck-werth

#ChatGPT #generative AI #AI memes #AI safety #social media trends
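3 weeks ago · ai

Making Trust Irrelevant: A Gamer's Take on Agentic AI Safety

I wrote a short position paper arguing that current agentic AI safety failures are recurring instances of the confused deputy problem. We are granting agents ambient authority…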

#agentic AI #AI safety #confused deputy problem #trustless AI #hard authority #prompt engineering
