model inference

3 days ago · ai

Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels

Why your final LLM layer is running out of memory (OOM) and how to fix it with a custom Triton kernel. The post "Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels" appeared fi...

#LLM #memory optimization #fused kernels #Triton #GPU performance #deep learning #model inference
1 week ago · ai

Thinking Time: How It Changes the Face of Models

Read more about Thinking Time: How It Changes the Face of Models

#thinking time #model inference #LLM performance #prompt engineering #response latency
2 weeks ago · ai

A Beginner's Guide to the Memo Model by Zsxkib on Replicate

Cover image for "A Beginner's Guide to the Memo Model by Zsxkib on Replicate" https://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,for...

#Memo model #Replicate #AI guide #machine learning #model inference
1 month ago · ai

Optimizing PyTorch Model Inference on AWS Graviton

Tips for accelerating AI/ML on CPU, Part 2. The post "Optimizing PyTorch Model Inference on AWS Graviton" first appeared on Towards Data Science....

#pytorch #aws-graviton #model-inference #cpu-optimization #deep-learning
1 month ago · ai

Optimizing PyTorch Model Inference on CPU

Flying like a lion on Intel Xeon. The post "Optimizing PyTorch Model Inference on CPU" first appeared on Towards Data Science....

#PyTorch #CPU optimization #model inference #deep learning #Intel Xeon
1 month ago · ai

December 5, 2025 | Tongyi Weekly: A Weekly Selection of Cutting-Edge AI from Tongyi Lab

Hello, builders and visionaries. This week, local AI got a major upgrade, and your workflows got sharper, faster, and more expressive. Let's dive in. Ecosystem…

#Qwen3-Next #llama.cpp #local AI #model inference #Alibaba Tongyi Lab