multimodal AI

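4 days ago · ai

New Apple Model Combines Visual Understanding and Image Generation, with Impressive Results

Apple researchers have published a study on Manzano, a multimodal model that combines visual understanding and text-to-image generation while significantly…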

#Apple #multimodal AI #vision-language model #text-to-image generation #Manzano #computer vision #generative AI #AI research
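4 days ago · ai

Gemini's New Beta Feature Offers Proactive Responses Based on Your Photos, Emails, and More

Personal Intelligence is off by default, because users choose whether and when to connect their Google apps to Gemini…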

#Gemini #Google AI #personal intelligence #multimodal AI #beta feature #privacy controls #email integration #photo analysis
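1 week ago · ai

From Pixels to Calories: Building a Multimodal Meal Analysis Engine with GPT-4o

🍝 From Pixels to Calories: Multimodal AI and Automatic Calorie Tracking. We've all been there: staring at a delicious plate of pasta, trying to figure out whether it…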

#multimodal AI #GPT-4o #computer vision #nutrition analysis #Streamlit
1 week ago · ai

Why Image Hallucination Is More Dangerous Than Text Hallucination


#image hallucination #vision-language models #AI safety #multimodal AI #generative AI
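1 week ago · ai

NVIDIA Launches New Open Models, Data, and Tools to Advance AI Across Industries

NVIDIA expands its open-model ecosystem. NVIDIA today announced a new set of open-source models, data, and tools aimed at accelerating AI adoption across industries.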

#NVIDIA #open foundation models #multimodal AI #AI data resources #AI acceleration
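2 weeks ago · ai

New Year AI Surprise: Fal Releases Its Own Version of the Flux 2 Image Generator, 10x Cheaper and 6x More Efficient

Following its new $140 million Series D round, fal.ai, the multimodal enterprise AI media creation platform known as "fal" or "Fal" for short…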

#generative AI #image generation #Flux 2 #diffusion models #Fal.ai #cost efficiency #open source #multimodal AI
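3 weeks ago · ai

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next-Generation Agentic Capabilities

Overview: Gemini 2.5 is a smarter AI that can see, think, and remember better. Meet Gemini 2.5 Pro, a new AI that can read images, video, and text at the same time and can solve…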

#Gemini 2.5 #multimodal AI #long‑context reasoning #video understanding #agentic capabilities #AI assistants #Flash model
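3 weeks ago · ai

LLM Deep Dive 2025: Why Claude 4 and GPT-5.1 Change Everything

The LLM landscape in late 2025: the entire ecosystem has moved far beyond the early days of generative AI. We are seeing a continued push toward greater autonomy, deeper…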

#LLM #Claude 4 #GPT-5.1 #multimodal AI #context management #agentic workflows #generative AI 2025 #AI tool integration
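3 weeks ago · ai

Why Do Your ChatGPT Images Fail?

Overview: ChatGPT reached 900 million weekly active users in December 2025, three times the figure for December 2024. Yet only about 7% of queries involve multimodal…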

#ChatGPT #AI image generation #prompt engineering #multimodal AI #image generation troubleshooting
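3 weeks ago · ai

LAION-400M: An Open Dataset of 400 Million CLIP-Filtered Image-Text Pairs

LAION-400M is a large public resource meant to spark new ideas. It contains about 400 million images, each paired with a short caption, and the pairs have been cleaned and CLIP-filtered.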

#LAION-400M #image-text dataset #CLIP-filtered #multimodal AI #open data #machine learning #computer vision
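3 weeks ago · ai

Mastering the Gemini 3 API: Building Next-Generation Multimodal AI Applications

Large language models go truly multimodal. Gemini 3: a technical deep dive. The landscape of large language models (LLMs) has shifted from text-centric…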

#Gemini 3 #multimodal AI #large language models #LLM API #Omni-Modal Transformer #AI agents #Google AI #AI application architecture
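3 weeks ago · ai

Singapore Vibes Gather at the New Google DeepMind Office

Event overview: We recently hosted a vibe coding session that brought together a hundred developers at the new Google DeepMind office in Singapore, showcasing Google AI Studio and G...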

#Google DeepMind #Gemini API #AI Studio #hackathon #Singapore #multimodal AI #job interview app #recipe generator #builder community
