AI safety — Page 11

2 months ago · ai · - · -

Will AI Ever Be Good Enough to Not Need Spending Limits?

“Won’t AI just get better at this?” Short answer: no. Understanding why reveals something fundamental about how we should think about AI safety.

#AI safety #large language models #LLM alignment #RLHF #financial AI #spending limits #LangChain #tool use #probabilistic models
2 months ago · ai · - · -

All AI Videos Are Harmful (2025)

Article URL: https://idiallo.com/blog/all-ai-videos-are-harmful Comments URL: https://news.ycombinator.com/item?id=46498651 Points: 19 Comments: 6...

#generative AI #deepfakes #AI ethics #misinformation #AI safety
2 months ago · ai · - · -

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

Overview: Meet Llama Guard, a simple tool built to make chats with AI safer and clearer for everyone. It looks at what people ask and what the AI answers, and s...

#Llama Guard #AI safety #LLM moderation #content filtering #open-source AI #prompt-response analysis
2 months ago · ai · - · -

AI sycophancy panic

Article URL: https://github.com/firasd/vibesbench/blob/main/docs/ai-sycophancy-panic.md Comments URL: https://news.ycombinator.com/item?id=46488396 Points: 38 C...

#AI alignment #LLM behavior #sycophancy #AI safety #benchmark
2 months ago · ai · - · -

AI Sycophancy Panic

Article URL: https://github.com/firasd/vibesbench/blob/main/docs/ai-sycophancy-panic.md Comments URL: https://news.ycombinator.com/item?id=46488396 Points: 10 C...

#AI safety #language model behavior #sycophancy #benchmark #research
2 months ago · ai · - · -

Nightshade: Make images unsuitable for model training

Article URL: https://nightshade.cs.uchicago.edu/whatis.html Comments URL: https://news.ycombinator.com/item?id=46487342 Points: 16 Comments: 2...

#image data poisoning #model training protection #AI safety #privacy #nightshade #data security
2 months ago · ai · - · -

In the next 30 days, I’m talking about the democratisation of AI with one mission: AI should feel practical, affordable, and safe, especially for small businesses and founders.

#AI democratization #practical AI #affordable AI #AI safety #small business AI #founder tools
2 months ago · ai · - · -

Adversarial Attacks and Defences: A Survey

Overview: Today many apps use deep learning to perform complex tasks quickly, from image analysis to voice recognition. However, tiny, almost invisible changes...

#adversarial attacks #machine learning security #deep learning robustness #AI safety #neural networks
2 months ago · ai · - · -

Instructions Are Not Control

#prompt engineering #LLM #jailbreak #AI safety #language models
2 months ago · ai · - · -

The Loop Changes Everything: Why Embodied AI Breaks Current Alignment Approaches

Stateless vs. Stateful AI. ChatGPT and similar chat models are stateless: each API call is independent, and the model has no persistent memory – it forgets ev...

#embodied AI #AI alignment #stateless models #large language models #robotics #AI safety
2 months ago · ai · - · -

Stop Begging Your AI to Be Safe: The Case for Constraint Engineering

I am tired of “Prompt Engineering” as a safety strategy. If you are building autonomous agents—AI that can actually do things like query databases, move files,...

#AI safety #constraint engineering #prompt engineering #autonomous agents #LLM security #prompt injection #AI reliability
2 months ago · ai · - · -

Why “Smart” AI Still Makes Dumb Decisions

Intelligence without constraints is just speed. When an AI system makes a bad decision, we usually blame the model. But most of the time, the model did exactly...

#AI safety #guardrails #control logic #model constraints #decision making #predictability #AI reliability
