Delty (YC X25) Is Hiring an ML Engineer
Article URL: https://www.ycombinator.com/companies/delty/jobs/MDeC49o-machine-learning-engineer Comments URL: https://news.ycombinator.com/item?id=46318676 Poin...
Conventional evaluation methods for multimodal LLMs (MLLMs) lack interpretability and are often insufficient to fully disclose significant capability gaps acros...
Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there is still a lack of a driving-targeted dense geomet...
While image editing has advanced rapidly, video editing remains less explored, facing challenges in consistency, control, and generalization. We study the desig...
Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculation...
This paper examines the exploration-exploitation trade-off in reinforcement learning with verifiable rewards (RLVR), a framework for improving the reasoning of ...
Standard practice across domains from robotics to language is to first pretrain a policy on a large-scale demonstration dataset, and then finetune this policy, ...
Recent advances in multimodal models highlight the pivotal role of image tokenization in high-resolution image generation. By compressing images into compact la...
Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning...
We investigate the mechanisms that arise when transformers are trained to solve arithmetic on sequences where tokens are variables whose meaning is determined o...
AI technologies have rapidly moved into business and research applications that involve large text corpora, including computational journalism research and news...
Video Large Language Models (VLLMs) unlock world-knowledge-aware video understanding through pretraining on internet-scale data and have already shown promise o...