Scientific Experiment: Can Market Data Identify Wine Type?

Published: March 12, 2026 at 07:58 PM EDT
Source: Dev.to

Introduction

To address the wine classification challenge, we shift our objective from predicting a continuous score (rating) to identifying the categorical identity of a wine—Red, Rosé, or White—based on its market and temporal characteristics.

Traditional wine classification relies on chemical analysis or label reading. In this experiment we test the hypothesis that three market proxies (price, rating, and vintage year) contain enough latent information to classify a wine into its category accurately.

Hypothesis

  • (H_1): Different wine categories exhibit unique clusters within the Price‑Rating‑Year 3‑D space.
    • Red wines are expected to be the most distinct due to higher average price points and aging potential compared with Rosé.

Data Preparation

  • Consolidated three distinct datasets (Red, Rosé, White) into a master frame of 12,827 observations.
  • Preserved a WineType label as the ground truth for supervised learning.
  • Standardized the Year column to remove “N.V.” (Non‑Vintage) entries, ensuring the temporal feature is strictly numeric for the classifier.
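The preparation steps above could look roughly like the following pandas sketch. The toy in-memory frames, the column names Price, Rating, and Year, and the helper name build_master_frame are assumptions for illustration; only the WineType label and the "N.V." handling come from the text. Dropping the "N.V." rows (rather than imputing a year) is also an assumption.

```python
import pandas as pd


def build_master_frame(frames_by_type):
    """Stack per-type frames, add a WineType label, keep numeric vintages only."""
    labelled = [
        frame.assign(WineType=wine_type)  # ground-truth label for supervised learning
        for wine_type, frame in frames_by_type.items()
    ]
    master = pd.concat(labelled, ignore_index=True)
    # "N.V." (and any other non-numeric value) becomes NaN, then those rows are dropped.
    master["Year"] = pd.to_numeric(master["Year"], errors="coerce")
    master = master.dropna(subset=["Year"]).reset_index(drop=True)
    master["Year"] = master["Year"].astype(int)
    return master


# Hypothetical toy rows standing in for the three source datasets.
red = pd.DataFrame({"Price": [45.0, 20.0], "Rating": [4.2, 3.9], "Year": ["2015", "N.V."]})
rose = pd.DataFrame({"Price": [12.0], "Rating": [3.7], "Year": ["2020"]})
white = pd.DataFrame({"Price": [18.0], "Rating": [3.8], "Year": ["2019"]})

master = build_master_frame({"Red": red, "Rosé": rose, "White": white})
print(master)
```

In this toy run the non-vintage Red row is removed, leaving one row per wine type with an integer Year.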

Exploratory Analysis

Overlap Between Categories

Box-plot analysis showed that Red and White wines have overlapping rating distributions, but the spread of their prices differs markedly.
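The quantities a box plot draws (quartiles and the interquartile range) can also be computed directly, which makes a difference in price spread easier to compare across types. This is a minimal sketch on made-up prices; the column names and values are assumptions and do not reproduce the article's data.

```python
import pandas as pd

# Made-up prices, chosen only to show the calculation.
toy = pd.DataFrame({
    "WineType": ["Red"] * 4 + ["White"] * 4,
    "Price": [10.0, 20.0, 40.0, 80.0, 10.0, 12.0, 14.0, 16.0],
})

# One row per wine type, one column per quartile.
quartiles = toy.groupby("WineType")["Price"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["q1", "median", "q3"]

# Interquartile range: the height of the box in a box plot.
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
print(quartiles)
```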

Correlation

The correlation matrix showed a correlation of -0.33 between Year and Rating, meaning that older vintages tend to be rated somewhat higher. The relationship is moderate rather than strong, but it suggests that age plays a role in how these wines are perceived in the market.

Model

  • Algorithm: Random Forest Classifier with 100 decision trees.
  • Rationale: Handles non‑linear boundaries in market data (e.g., a $50 White wine may have very different rating characteristics than a $50 Red wine).
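The model setup described above might be sketched with scikit-learn as follows. Only the 100-tree Random Forest comes from the text. The synthetic data, the stratified split, and the random_state values are assumptions. The 20 % test fraction is also an assumption, although it is consistent with the 2,566 test rows out of 12,827 observations reported in the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 300 wines described by Price, Rating, Year.
n = 300
X = np.column_stack([
    rng.uniform(5, 100, n),       # Price
    rng.uniform(3.0, 5.0, n),     # Rating
    rng.integers(2000, 2023, n),  # Year
])
# Imbalanced labels, loosely mimicking a Red-heavy catalogue.
y = np.array(["Red"] * 195 + ["Rosé"] * 15 + ["White"] * 90)

# Hold out 20 % for testing; stratify keeps class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))
```

Because the synthetic features are random and unrelated to the labels, the printed scores only demonstrate the workflow and say nothing about real wines.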

Results

Classification Report

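| WineType  | Precision | Recall | F1-Score | Support |
|-----------|-----------|--------|----------|---------|
| Red       | 0.77      | 0.80   | 0.79     | 1,734   |
| Rosé      | 0.14      | 0.11   | 0.12     | 79      |
| White     | 0.47      | 0.44   | 0.45     | 753     |
| Accuracy  |           |        | 0.67     | 2,566   |
| Macro avg | 0.46      | 0.45   | 0.45     | 2,566   |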
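The summary rows of the report can be recomputed from the per-class rows, which is a quick consistency check. The sketch below uses only the numbers in the report; the variable names are illustrative. The macro average is a plain mean over the three classes, and the support-weighted mean of recall equals overall accuracy (up to rounding of the inputs).

```python
# Per-class rows copied from the classification report:
# (precision, recall, f1, support)
report = {
    "Red":   (0.77, 0.80, 0.79, 1734),
    "Rosé":  (0.14, 0.11, 0.12, 79),
    "White": (0.47, 0.44, 0.45, 753),
}

n_classes = len(report)
total_support = sum(row[3] for row in report.values())

# Macro average: every class counts equally, so Rosé weighs as much as Red.
macro_precision = sum(row[0] for row in report.values()) / n_classes
macro_recall = sum(row[1] for row in report.values()) / n_classes
macro_f1 = sum(row[2] for row in report.values()) / n_classes

# Support-weighted recall: each class counts in proportion to its test rows.
weighted_recall = sum(row[1] * row[3] for row in report.values()) / total_support

print(round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
print(total_support, round(weighted_recall, 2))
```

The gap between the macro F1 (0.45) and the accuracy (0.67) reflects how strongly the Red class dominates the test set.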

Key Metrics

  • Overall Accuracy: 67 %. Because Red makes up 1,734 of the 2,566 test wines (about 68 %), this is close to what a classifier that always predicts Red would score.
  • Precision: Highest for Red wines (0.77), plausibly reflecting their higher average price points.
  • Recall: Lowest for Rosé (0.11). Most Rosé wines were misclassified as Red or White, consistent with a “middle-ground” market profile.
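A useful reference point for the accuracy figure is a majority-class baseline, that is, a classifier that always predicts the most common type. The sketch below derives it from the support counts in the classification report; the variable names are illustrative.

```python
# Test-set class counts, taken from the Support column of the report.
support = {"Red": 1734, "Rosé": 79, "White": 753}
reported_accuracy = 0.67

total = sum(support.values())
majority_class = max(support, key=support.get)

# Accuracy obtained by predicting the majority class for every wine.
baseline_accuracy = support[majority_class] / total

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"model minus baseline: {reported_accuracy - baseline_accuracy:+.3f}")
```

Comparing against this baseline shows how much of the headline accuracy comes from class imbalance rather than from the features.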

Discussion

The model was most reliable for Red wines (F1 0.79) and only moderately so for White (F1 0.45). Rosé proved the most difficult (F1 0.12) because of its small sample size (397 observations) and overlapping price-rating characteristics.

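These findings suggest that market signals alone (price, vintage, and consumer rating) carry some information about a wine’s type without chemical analysis, most clearly for Red. With these three features, however, they do not separate all three categories reliably.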

Implications

This experiment paves the way for a Wine Suggestion Engine that does not merely search for “similar wines,” but understands which category a user is likely seeking based on budget and quality expectations.
