Scientific Experiment: Can Market Data Identify Wine Type?

Published: 1 month ago (March 12, 2026 at 07:58 PM EDT)

Source: Dev.to

Introduction

To address the wine classification challenge, we shift our objective from predicting a continuous score (rating) to identifying the categorical identity of a wine—Red, Rosé, or White—based on its market and temporal characteristics.

Traditional wine classification relies on chemical analysis or label reading. In this experiment we test the hypothesis that three market proxies (price, rating, and vintage year) contain enough latent information to classify a wine into its category.

Hypothesis

H1: Different wine categories form distinct clusters in the three‑dimensional Price‑Rating‑Year space.
- Red wines are expected to be the most distinct due to higher average price points and aging potential compared with Rosé.

Data Preparation

Consolidated three distinct datasets (Red, Rosé, White) into a master frame of 12,827 observations.
Preserved a WineType label as the ground truth for supervised learning.
Standardized the Year column to remove “N.V.” (Non‑Vintage) entries, ensuring the temporal feature is strictly numeric for the classifier.
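
The preparation steps above might look roughly like the following pandas sketch. The column names (Price, Rating, Year), the toy rows, and the helper build_master_frame are assumptions for illustration, not the author's actual schema or code. The sketch also assumes that "removing" N.V. entries means dropping those rows; the post does not say whether they were dropped or replaced.

```python
import pandas as pd

# Toy stand-ins for the three source datasets (hypothetical values).
red = pd.DataFrame({"Price": [45.0, 18.5], "Rating": [4.2, 3.8], "Year": ["2015", "N.V."]})
rose = pd.DataFrame({"Price": [14.0, 22.0], "Rating": [3.7, 3.9], "Year": ["2020", "2019"]})
white = pd.DataFrame({"Price": [12.5, 30.0], "Rating": [3.6, 4.1], "Year": ["2018", "2016"]})


def build_master_frame(frames):
    """Stack per-type frames, label each row, and keep only numeric vintages."""
    # Attach the ground-truth label before concatenating.
    labeled = [df.assign(WineType=wine_type) for wine_type, df in frames.items()]
    master = pd.concat(labeled, ignore_index=True)
    # Non-numeric entries such as "N.V." become NaN and are then dropped.
    master["Year"] = pd.to_numeric(master["Year"], errors="coerce")
    master = master.dropna(subset=["Year"]).reset_index(drop=True)
    master["Year"] = master["Year"].astype(int)
    return master


master = build_master_frame({"Red": red, "Rosé": rose, "White": white})
print(master)
```

Dropping rows is the simplest choice, but it shrinks the data; it would be worth checking how many wines of each type carry an N.V. label before doing so.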

Exploratory Analysis

Overlap Between Categories

Box‑plot analysis showed that while Red and White wines have overlapping rating distributions, the spread of their prices differs markedly.

Correlation

The correlation matrix showed that Year has a correlation of -0.33 with Rating, a moderate negative relationship: more recent vintages tend to be rated lower. This suggests that age is one differentiator in how these wines are perceived and priced in the market.
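
For readers who want to see what a single cell of a correlation matrix measures, here is a minimal pure-Python sketch of the Pearson coefficient. The years and ratings are made-up values, not rows from the dataset, and the function name pearson is only illustrative. As a rough guide to scale, a coefficient of -0.33 corresponds to r² of about 0.11, so a straight-line fit on Year would account for roughly 11% of the variance in Rating.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only.
years = [2010, 2012, 2015, 2018, 2020]
ratings = [4.4, 4.1, 4.2, 3.9, 3.8]

r = pearson(years, ratings)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

With a pandas DataFrame, calling corr() on the numeric columns computes the same quantity for every pair of columns at once.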

Model

Algorithm: Random Forest Classifier with 100 decision trees.
Rationale: Handles non‑linear boundaries in market data (e.g., a $50 White wine may have very different rating characteristics than a $50 Red wine).
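
A minimal scikit-learn sketch of this setup is shown below. Only the algorithm and the 100 trees come from the post. The synthetic data, the feature order, the stratified split, and the random seeds are all assumptions. The 20% test fraction is inferred from the reported test support (2,566 of 12,827 observations), not stated by the author.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical); real features would be Price, Rating, Year.
rng = np.random.default_rng(0)
n = 600
wine_type = rng.choice(["Red", "Rosé", "White"], size=n, p=[0.65, 0.05, 0.30])
# Give Reds a somewhat higher price distribution so there is some signal to learn.
price = np.where(wine_type == "Red", rng.gamma(3.0, 15.0, n), rng.gamma(3.0, 8.0, n))
rating = rng.normal(3.9, 0.3, n)
year = rng.integers(2000, 2022, n)
X = np.column_stack([price, rating, year])

# Assumed 80/20 split; stratify keeps the rare Rosé class in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, wine_type, test_size=0.2, stratify=wine_type, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))
```

With classes this imbalanced, passing class_weight="balanced" to RandomForestClassifier is one option worth trying, though the post gives no indication that the author did so.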

Results

Classification Report

WineType	Precision	Recall	F1‑Score	Support
Red	0.77	0.80	0.79	1,734
Rosé	0.14	0.11	0.12	79
White	0.47	0.44	0.45	753
Accuracy	—	—	0.67	2,566
Macro avg	0.46	0.45	0.45	2,566
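
The summary rows can be cross-checked from the per-class rows with a few lines of arithmetic. The sketch below uses only the numbers reported in the table; the variable and function names are illustrative. It also computes the share of the test set held by the largest class, which is a useful yardstick for reading the accuracy figure: a classifier that always answered Red would be right on 1,734 of 2,566 test wines, or about 67.6%.

```python
# Per-class figures copied from the classification report above.
report = {
    "Red": {"precision": 0.77, "recall": 0.80, "f1": 0.79, "support": 1734},
    "Rosé": {"precision": 0.14, "recall": 0.11, "f1": 0.12, "support": 79},
    "White": {"precision": 0.47, "recall": 0.44, "f1": 0.45, "support": 753},
}


def macro_avg(metric):
    """Unweighted mean of a metric across classes."""
    return sum(row[metric] for row in report.values()) / len(report)


total = sum(row["support"] for row in report.values())

# Accuracy equals the support-weighted mean of per-class recall.
accuracy_from_recall = sum(row["recall"] * row["support"] for row in report.values()) / total

# Share of the test set belonging to the largest class (always-predict-Red baseline).
majority_share = max(row["support"] for row in report.values()) / total

print(f"total support: {total}")
print(f"macro precision/recall/F1: {macro_avg('precision'):.2f} / "
      f"{macro_avg('recall'):.2f} / {macro_avg('f1'):.2f}")
print(f"accuracy from recall: {accuracy_from_recall:.3f}")
print(f"majority-class share: {majority_share:.3f}")
```

Because the per-class values are rounded to two decimals, the recomputed accuracy matches the reported 0.67 only approximately.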

Key Metrics

Overall Accuracy: 67 % of the 2,566 test wines were classified correctly, driven mainly by the dominant Red class (recall 0.80); recall for White was 0.44.
Precision: Highest for Red wines (0.77), likely reflecting their higher price tier.
Recall: Lowest for Rosé (0.11). Most Rosé wines were misclassified as Red or White, which is consistent with their “middle‑ground” market profile.

Discussion

The model identified Red wines reliably (F1 0.79) but was considerably weaker on White (F1 0.45). Rosé proved the most difficult (F1 0.12) due to its small sample size (397 observations) and overlapping price‑rating characteristics.

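These findings suggest that market signals alone (price, vintage, and consumer rating) carry some information about a wine’s type without chemical analysis, most clearly for Red, although they are not sufficient to separate all three categories reliably.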

Implications

This experiment paves the way for a Wine Suggestion Engine that does not merely search for “similar wines,” but understands which category a user is likely seeking based on budget and quality expectations.