model inference

3 days ago · ai

Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels

Why your final LLM layer is OOMing and how to fix it with a custom Triton kernel. The post Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels appeared fi...

#LLM #memory optimization #fused kernels #Triton #GPU performance #deep learning #model inference
1 week ago · ai

Thinking Time: How It Changes the Face of Models

#thinking time #model inference #LLM performance #prompt engineering #response latency
2 weeks ago · ai

A beginner's guide to the Memo model by Zsxkib on Replicate

#Memo model #Replicate #AI guide #machine learning #model inference
1 month ago · ai

Optimizing PyTorch Model Inference on AWS Graviton

Tips for accelerating AI/ML on CPU: Part 2. The post Optimizing PyTorch Model Inference on AWS Graviton appeared first on Towards Data Science.

#pytorch #aws-graviton #model-inference #cpu-optimization #deep-learning
1 month ago · ai

Optimizing PyTorch Model Inference on CPU

Flyin’ Like a Lion on Intel Xeon. The post Optimizing PyTorch Model Inference on CPU appeared first on Towards Data Science.

#PyTorch #CPU optimization #model inference #deep learning #Intel Xeon
1 month ago · ai

Dec 5, 2025 | The Tongyi Weekly: Your weekly dose of cutting-edge AI from Tongyi Lab

Hello, builders and visionaries! This week, local AI got a major upgrade, and your workflows just got sharper, faster, and more expressive. Let’s dive in. Ecos...

#Qwen3-Next #llama.cpp #local AI #model inference #Alibaba Tongyi Lab