Why do tree-based models still outperform deep learning on tabular data?

Published: February 7, 2026 at 04:10 PM EST
2 min read
Source: Dev.to

Introduction

Deep neural networks have revolutionized image and text processing, but when it comes to spreadsheet‑style tabular data, classic tree‑based methods often still come out on top.

Empirical Findings

A large benchmark covering many datasets showed that tree‑based models such as XGBoost and Random Forests consistently outperform deep learning models on medium‑sized tables (≈10 k rows), even after extensive hyper‑parameter tuning of the neural networks. The pattern persisted across a wide range of settings and checks.
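The kind of comparison described above can be sketched on a small synthetic tabular task. This is only an illustrative toy, not the benchmark's setup: the dataset, models, and hyper-parameters below are placeholder choices using scikit-learn.

```python
# Toy comparison of a tree ensemble vs. a feed-forward network on
# synthetic tabular data. Settings are illustrative, not from the benchmark.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A synthetic "medium-sized table": 2,000 rows, 20 columns.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)

print(f"random forest accuracy: {rf.score(X_te, y_te):.3f}")
print(f"MLP accuracy:           {mlp.score(X_te, y_te):.3f}")
```

On real benchmarks the neural network would also get a serious hyper-parameter search; the paper's point is that even then, the trees tend to win on this kind of data.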

Why Trees Perform Better

  • Robustness to irrelevant features – trees can ignore useless columns without harming performance.
  • Preservation of data shape – tree algorithms work directly with the original tabular structure, avoiding the need for extensive preprocessing.
  • Ability to capture irregular patterns – decision trees can model heterogeneous interactions and non‑linearities that are harder for standard feed‑forward networks to learn on tabular data.
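The first point, robustness to irrelevant features, is easy to see in a small experiment. The sketch below (a hypothetical setup, not drawn from the paper) appends pure-noise columns to a synthetic dataset and checks that a Random Forest's cross-validated accuracy barely moves:

```python
# Illustrative check: tree ensembles degrade gracefully when
# uninformative columns are added to a tabular dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=8, random_state=0)

# Append 50 columns of pure Gaussian noise.
rng = np.random.default_rng(0)
X_noisy = np.hstack([X, rng.standard_normal((1000, 50))])

base = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
noisy = cross_val_score(RandomForestClassifier(random_state=0), X_noisy, y, cv=5).mean()
print(f"accuracy without noise columns: {base:.3f}")
print(f"accuracy with 50 noise columns: {noisy:.3f}")
```

Because each split greedily picks the most informative column, useless features are simply never chosen, whereas a dense network must learn to down-weight them.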

Implications

These results highlight that deep learning is not a universal solution; specialized approaches are still needed for tabular problems. The authors released the full suite of experiments, raw results, and configuration details to enable reproducibility and further research.

Takeaway

When your dataset is organized in rows and columns, don’t automatically assume a deep neural network will be optimal—tree‑based models may still be the smarter choice.
