[Paper] RadarGen: Automotive Radar Point Cloud Generation from Cameras
Source: arXiv - 2512.17897v1
Overview
RadarGen introduces a diffusion‑based generative model that can turn multi‑camera images into realistic automotive radar point clouds. By bridging the visual and radar domains, the work opens a path to cheap, scalable radar data generation for training and testing autonomous‑driving systems.
Key Contributions
- Cross‑modal diffusion model: Adapts image‑latent diffusion to synthesize radar BEV maps directly from camera streams.
- Rich radar representation: Generates bird’s‑eye‑view (BEV) tensors that encode spatial layout, radar cross‑section (RCS), and Doppler velocity (a toy tensor layout is sketched after this list).
- Foundation‑model conditioning: Leverages pretrained depth, semantic, and motion estimators to guide the diffusion process toward physically plausible radar returns.
- Lightweight point‑cloud recovery: A fast post‑processing step converts the generated BEV maps into 3‑D radar point clouds.
- Scalable data pipeline: Works with any multi‑camera dataset, enabling large‑scale multimodal simulation without needing real radar hardware.
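A minimal sketch of what the BEV radar representation mentioned above could look like, assuming a three‑channel grid (occupancy, RCS, Doppler) at a fixed resolution; the channel layout, grid size, and coordinate convention are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical BEV radar tensor: 3 channels over an H x W grid.
# Channel order, grid extent, and cell size are assumptions for illustration.
H, W = 256, 256            # grid cells
CELL_M = 0.5               # metres per cell (assumed)
OCC, RCS, DOPPLER = 0, 1, 2

bev = np.zeros((3, H, W), dtype=np.float32)

# Example: a single return 20 m ahead and 4 m left of the ego vehicle,
# with an RCS of 10 dBsm and a radial velocity of -5 m/s (approaching).
x_m, y_m = 20.0, -4.0
row = int(H / 2 - x_m / CELL_M)   # forward = up in the grid (assumed convention)
col = int(W / 2 + y_m / CELL_M)
bev[OCC, row, col] = 1.0
bev[RCS, row, col] = 10.0
bev[DOPPLER, row, col] = -5.0
```

Representing radar returns as a dense, image‑like tensor is what allows an image‑latent diffusion model to be reused with little architectural change.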
Methodology
- Input preprocessing – Multi‑view camera images are passed through off‑the‑shelf models that predict per‑pixel depth, semantic class, and optical flow (motion). These cues are lifted into a common BEV grid.
- Diffusion generation – A latent diffusion model, originally designed for images, is repurposed to operate on the radar BEV tensor. The model iteratively denoises a random latent, conditioned on the BEV‑aligned visual cues, to produce a map containing:
  - Occupancy (where radar returns appear)
  - RCS values (reflectivity strength)
  - Doppler (relative velocity)
- Point‑cloud reconstruction – The final BEV tensor is sampled to extract individual radar points (x, y, z, RCS, velocity) and projected back into the vehicle‑centric 3‑D space. The reconstruction step is deliberately lightweight to keep the pipeline fast enough for data‑augmentation loops.
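A minimal sketch of the kind of lightweight recovery step described in the last bullet, assuming the three‑channel BEV layout sketched earlier; the occupancy threshold, grid convention, and fixed point height are assumptions and may differ from the paper's actual procedure.

```python
import numpy as np

def bev_to_points(bev: np.ndarray, cell_m: float = 0.5,
                  occ_threshold: float = 0.5, z_m: float = 0.0) -> np.ndarray:
    """Convert a (3, H, W) BEV tensor [occupancy, RCS, Doppler] into an (N, 5)
    array of radar points [x, y, z, rcs, doppler]. Conventions are illustrative."""
    occ, rcs, doppler = bev[0], bev[1], bev[2]
    rows, cols = np.nonzero(occ > occ_threshold)   # cells holding a radar return
    h, w = occ.shape
    x = (h / 2 - rows) * cell_m                    # forward distance (assumed convention)
    y = (cols - w / 2) * cell_m                    # lateral offset (assumed convention)
    z = np.full(x.shape, z_m)                      # flat height assumed for a 2-D radar grid
    return np.stack([x, y, z, rcs[rows, cols], doppler[rows, cols]], axis=1)
```

Because the step is essentially a threshold plus indexing, it adds negligible overhead compared to the diffusion sampling itself, which is what keeps the pipeline usable inside data‑augmentation loops.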
Results & Findings
- Distribution fidelity – Statistical tests show the synthetic radar point clouds match real‑world radar distributions (e.g., range‑intensity curves, velocity histograms) across diverse driving scenarios; a toy version of such a check is sketched after this list.
- Perception gap reduction – Object detectors trained on RadarGen‑augmented data achieve up to 12% higher average precision on real radar test sets than models trained on camera‑only data.
- Qualitative realism – Visualizations reveal that generated radar returns respect occlusions, appear on reflective surfaces (e.g., metal cars, signs), and exhibit realistic Doppler patterns for moving objects.
- Scalability – The system can generate thousands of radar frames per hour on a single GPU, making it practical for large‑scale simulation pipelines.
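As a rough illustration of the distribution‑level checks in the first bullet above, one could compare Doppler‑velocity distributions and range‑vs‑RCS curves between real and generated point clouds; the specific metric (earth mover's distance) and binning below are assumptions, not the paper's exact evaluation protocol.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def velocity_gap(real_pts: np.ndarray, synth_pts: np.ndarray) -> float:
    """Earth mover's distance between the Doppler-velocity distributions of two
    (N, 5) point sets laid out as [x, y, z, rcs, doppler]. Illustrative only."""
    return wasserstein_distance(real_pts[:, 4], synth_pts[:, 4])

def range_rcs_curve(pts: np.ndarray, num_bins: int = 20, max_range_m: float = 100.0):
    """Mean RCS per range bin, as a simple stand-in for a range-intensity curve."""
    ranges = np.hypot(pts[:, 0], pts[:, 1])
    edges = np.linspace(0.0, max_range_m, num_bins + 1)
    idx = np.clip(np.digitize(ranges, edges) - 1, 0, num_bins - 1)
    mean_rcs = np.array([pts[idx == b, 3].mean() if np.any(idx == b) else np.nan
                         for b in range(num_bins)])
    return edges, mean_rcs
```

Small gaps on such statistics between real and synthetic sets are, at a high level, what the "distribution fidelity" finding refers to.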
Practical Implications
- Data augmentation for ADAS – Developers can enrich existing camera‑only datasets with synthetic radar, improving robustness of multimodal perception stacks without costly radar sensor deployments.
- Simulation‑first development – Autonomous‑vehicle simulators can now produce coherent radar streams alongside LiDAR and camera feeds, enabling end‑to‑end testing of sensor‑fusion algorithms.
- Domain adaptation – Synthetic radar can be used to pre‑train models that later fine‑tune on a small set of real radar recordings, accelerating the rollout of radar‑enabled features; a toy training recipe is sketched after this list.
- Cost‑effective benchmarking – Companies can benchmark radar‑dependent algorithms on a variety of weather, lighting, and traffic conditions generated from existing video datasets, reducing the need for extensive field data collection.
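A hedged sketch of the pre‑train‑then‑fine‑tune recipe from the domain‑adaptation bullet, written as a generic PyTorch‑style loop; the detector model, loss interface, and data loaders are hypothetical placeholders, not components described in the paper.

```python
import torch

def train_epochs(model: torch.nn.Module, loader, epochs: int, lr: float) -> torch.nn.Module:
    """Generic supervised loop; the loader is assumed to yield (radar, targets)
    batches and the model is assumed to return its training loss."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for radar, targets in loader:
            loss = model(radar, targets)   # assumption: forward pass returns a scalar loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Usage with hypothetical objects:
#   detector = train_epochs(detector, synthetic_radar_loader, epochs=20, lr=1e-3)  # pre-train on RadarGen data
#   detector = train_epochs(detector, real_radar_loader, epochs=5, lr=1e-4)        # fine-tune on scarce real radar
```

The shorter schedule and lower learning rate in the second stage adapt the model to the real sensor's statistics without discarding what was learned from the large synthetic corpus.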
Limitations & Future Work
- Physical fidelity trade‑off – While statistical properties match real data, the model does not simulate fine‑grained wave phenomena (e.g., multipath, speckle) that can affect high‑precision radar algorithms.
- Dependence on pretrained cues – The quality of depth/semantic/motion inputs directly impacts radar realism; errors in these upstream models can propagate.
- Scenario coverage – Rare edge cases (e.g., extreme weather, exotic vehicle types) are under‑represented because the diffusion model learns from the distribution present in the training camera data.
- Future directions – Extending the diffusion framework to jointly generate radar and LiDAR, incorporating physics‑based radar simulators for hybrid training, and exploring self‑supervised conditioning signals to reduce reliance on external foundation models.
Authors
- Tomer Borreda
- Fangqiang Ding
- Sanja Fidler
- Shengyu Huang
- Or Litany
Paper Information
- arXiv ID: 2512.17897v1
- Categories: cs.CV, cs.AI, cs.LG, cs.RO
- Published: December 19, 2025