AI safety — Page 10

2 months ago · ai

Understanding Vibe Proving

How to get LLMs to perform verifiable, step-by-step logical reasoning (Part 1). The post "Understanding Vibe Proving" first appeared on Towards Data Science...

#LLM #reasoning #verifiable logic #step-by-step reasoning #AI safety

2 months ago · ai

What is MLSecOps?

MLSecOps is a framework that integrates security practices throughout the entire machine learning lifecycle, much as DevSecOps does for software development.

#MLSecOps #machine learning security #AI safety #MLOps #DevSecOps #model protection

2 months ago · ai

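Continuously hardening ChatGPT Atlas against prompt injection

OpenAI is strengthening ChatGPT Atlas's defenses against prompt injection attacks by using automated red teaming trained with reinforcement learning. This proactive discovery...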

#ChatGPT #Atlas #prompt injection #reinforcement learning #red teaming #AI safety #security

2 months ago · ai

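Why AI safety should be enforced structurally rather than through training

Most current AI safety work assumes the system is unsafe and tries to train it into better behavior. - We add more data. - We add more constraints. - We add more fi...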

#AI safety #alignment #reinforcement learning #structural enforcement #machine learning #AI governance #reward hacking

2 months ago · ai

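Sparks of Artificial General Intelligence: Early experiments with GPT-4

Overview: An early version of GPT-4 began performing tasks that previously required humans, and it quickly drew attention. It can solve math problems, write code...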

#GPT-4 #artificial general intelligence #large language models #AI safety #emergent behavior

2 months ago · ai

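Updating our Model Spec with teen protections

OpenAI is updating its Model Spec with new Under-18 Principles that define how ChatGPT should provide safe, age-appropriate guidance to teens, based on developmental...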

#OpenAI #Model Spec #teen protection #under-18 principles #AI safety #ChatGPT #developmental science #ethical AI

2 months ago · ai

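VAP: A universal framework for AI flight recorders

Aircraft have flight recorders. Why don't AI systems? On May 6, 2010, the Dow Jones index plunged 1,000 points within minutes, wiping out $1 trillion in market value. When...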

#AI provenance #flight recorder #VAP #model auditing #AI safety #transparent logging #verifiable AI

2 months ago · ai

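Addendum to GPT-5.2 System Card: GPT-5.2-Codex

This system card outlines the comprehensive safety measures implemented for GPT-5.2-Codex. It details model-level mitigations, such as specialized safety techniques...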

#GPT-5.2 #AI safety #prompt injection mitigation #sandboxing #network access control #OpenAI system card

2 months ago · ai

People are paying to get their chatbots high on 'drugs'

#ChatGPT #AI #chatbot #code modules #drug simulation #AI safety #AI misuse

2 months ago · ai

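Safety is the baseline, cost savings are a bonus: why you need independent guardrails

Introduction: What are guardrails? The various measures taken to use AI safely are collectively called "guardrails." When a car drifts off the road or into an adjacent lane...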

#AI safety #guardrails #risk management #AI governance #cost reduction

2 months ago · ai

Setting up guardrails for your LLMs

#LLM #guardrails #AI safety #prompt engineering #large language models

2 months ago · ai

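Hardening your AI systems: applying industry standards in the real world

Introduction: In the previous article, we discussed how integrating AI into business-critical systems exposes enterprises to a new range of risks, involving AI security and...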

#AI security #AI safety #industry standards #risk management #cybersecurity #Red Hat #AI governance #threat modeling
