[Paper] BanglaASTE: A Novel Framework for Aspect-Sentiment-Opinion Extraction in Bangla E-commerce Reviews Using Ensemble Deep Learning
Source: arXiv - 2511.21381v1
Overview
The paper presents BanglaASTE, the first end‑to‑end framework that automatically extracts aspect terms, opinion expressions, and their sentiment polarity from Bangla‑language e‑commerce reviews. By releasing a new annotated dataset and an ensemble deep‑learning model, the authors advance aspect‑based sentiment analysis (ABSA) for a low‑resource language that the research community has largely overlooked.
Key Contributions
- BanglaASTE dataset – 3,345 manually annotated product reviews from Daraz, Facebook, and Rokomari, each labeled with aspect‑opinion‑sentiment triplets.
- Hybrid matching pipeline – a graph‑based algorithm that links aspect and opinion spans using semantic similarity, handling informal spelling and code‑mixing typical of Bangla social text.
- Ensemble model – combines BanglaBERT contextual embeddings with an XGBoost classifier, delivering a strong boost over vanilla transformer or classic baselines.
- Comprehensive evaluation – reports 89.9 % accuracy and 89.1 % F1, outperforming prior multilingual ABSA approaches on the same data.
- Open‑source release – code, trained models, and the dataset are made publicly available for reproducibility and downstream applications.
Methodology
- Data collection & annotation – Reviews were scraped from three major Bangla e‑commerce platforms. Trained annotators marked three elements per sentence:
  - Aspect (e.g., “battery life”)
  - Opinion (e.g., “lasting long”)
  - Sentiment (positive/negative/neutral).
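The triplet annotation above can be sketched as a simple record type. This is a hypothetical format for illustration; the field names and example are not taken from the paper's released dataset.

```python
# Hypothetical representation of one annotated triplet; the paper's
# actual dataset schema may differ.
from dataclasses import dataclass

@dataclass
class Triplet:
    aspect: str      # e.g. "battery life"
    opinion: str     # e.g. "lasting long"
    sentiment: str   # one of "positive", "negative", "neutral"

# One review sentence can yield one or more triplets.
triplets = [Triplet(aspect="battery life", opinion="lasting long",
                    sentiment="positive")]
```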
- Pre‑processing – Normalization steps address common Bangla quirks: inconsistent spelling, mixed English numerals, and emoticons.
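A minimal normalization sketch, assuming three of the quirks named above: Unicode NFC normalization, mapping Bangla digits to ASCII, and stripping a few common emoticons. The paper's actual rule set is more extensive.

```python
# Illustrative normalization pass; the emoticon list and digit mapping
# are assumptions, not the paper's exact rules.
import unicodedata

BANGLA_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")
EMOTICONS = {":)", ":(", ":D", ";)"}

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFC", text)   # canonical Unicode form
    text = text.translate(BANGLA_DIGITS)        # Bangla -> ASCII digits
    tokens = [t for t in text.split() if t not in EMOTICONS]
    return " ".join(tokens)
```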
- Graph‑based matching – Each sentence is turned into a bipartite graph where nodes are candidate aspect spans and opinion spans. Edge weights are computed via cosine similarity of their BanglaBERT embeddings, and a maximum‑weight matching algorithm selects the most plausible aspect‑opinion pairs.
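The pairing step can be sketched as maximum‑weight matching on a tiny bipartite graph. The similarity weights below are hard‑coded stand‑ins for the cosine similarities of BanglaBERT embeddings, and brute force over permutations is used only because per‑sentence graphs are small; the authors presumably use a proper matching algorithm.

```python
# Sketch: pick the aspect-opinion pairing that maximizes total edge
# weight. Assumes at most as many aspects as opinions.
from itertools import permutations

def max_weight_matching(weights):
    """weights[i][j] = similarity between aspect i and opinion j."""
    n_aspects, n_opinions = len(weights), len(weights[0])
    best_score, best_pairs = float("-inf"), []
    for perm in permutations(range(n_opinions), n_aspects):
        score = sum(weights[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_pairs

# Example: two candidate aspects, two candidate opinions.
sims = [[0.9, 0.2],   # aspect 0 strongly matches opinion 0
        [0.1, 0.8]]   # aspect 1 strongly matches opinion 1
pairs = max_weight_matching(sims)
```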
- Ensemble classification
  - BanglaBERT generates contextual vectors for each candidate span.
  - XGBoost consumes these vectors (plus handcrafted features like POS tags and distance metrics) to predict the sentiment polarity of the pair.
  - The final triplets combine the graph‑matched aspect‑opinion pairs with the XGBoost‑predicted sentiment for each pair.
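The feature construction for the classifier can be sketched as concatenating the (stand‑in) span embeddings with handcrafted features. The POS tag set, the specific features, and the vectors here are assumptions; the classifier call is left out to keep the sketch dependency‑free.

```python
# Illustrative feature vector for one aspect-opinion pair: stand-in
# embedding vectors, a one-hot POS indicator, and a token distance.
def pair_features(aspect_vec, opinion_vec, aspect_pos, token_distance):
    pos_onehot = [1.0 if aspect_pos == tag else 0.0
                  for tag in ("NOUN", "ADJ", "VERB")]  # assumed tag set
    return aspect_vec + opinion_vec + pos_onehot + [float(token_distance)]

feats = pair_features([0.1, 0.2], [0.3, 0.4], "NOUN", 3)
# feats would then be fed to a gradient-boosted classifier
# (XGBoost in the paper) to predict the pair's sentiment polarity.
```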
- Training & evaluation – 80 % of the dataset is used for training, 10 % for validation, and 10 % for testing. Standard metrics (accuracy, precision, recall, F1) are reported per component and for the full triplet extraction task.
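The 80/10/10 split described above can be reproduced with a plain stdlib shuffle. The seed is illustrative; the authors likely used a library utility, and their exact partition is not specified here.

```python
# Deterministic 80/10/10 train/validation/test split (seed is an
# assumption for reproducibility of this sketch only).
import random

def split_dataset(items, seed=42):
    rng = random.Random(seed)
    items = items[:]            # avoid mutating the caller's list
    rng.shuffle(items)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(list(range(100)))
```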
Results & Findings
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Baseline CRF + Word2Vec | 71.4 % | 68.9 % | 66.2 % | 67.5 % |
| Multilingual BERT (mBERT) | 82.1 % | 80.5 % | 78.9 % | 79.7 % |
| BanglaASTE (Ensemble) | 89.9 % | 88.6 % | 89.6 % | 89.1 % |
- The graph‑matching step alone raises aspect‑opinion pairing F1 by ~9 % over a naïve sequential tagging baseline.
- Adding XGBoost for sentiment classification yields the final 2‑point F1 gain, confirming that shallow‑tree ensembles still complement deep embeddings in low‑resource settings.
- Error analysis shows most remaining mistakes stem from highly ambiguous opinions (“meh”) and extreme spelling variations not covered by the normalization rules.
Practical Implications
- E‑commerce analytics – Companies can automatically surface product‑level pain points (e.g., “slow charger”) and strengths (“crisp display”) from Bangla reviews, enabling faster product‑roadmap decisions.
- Customer support automation – Chatbots can be equipped with the triplet extractor to flag negative aspects in real time and route tickets to the right support team.
- Localized sentiment dashboards – Marketing teams can monitor sentiment trends across regions where Bangla is dominant, without needing manual tagging.
- Transferable pipeline – The graph‑matching + XGBoost pattern can be adapted to other low‑resource languages that suffer from spelling noise and code‑mixing, reducing the need for massive labeled corpora.
Limitations & Future Work
- Dataset size – At 3,345 reviews the corpus is a solid start but still modest; larger, domain‑diverse corpora could improve generalization.
- Domain specificity – The current data is limited to product reviews; extending to social media or news comments may require additional preprocessing tweaks.
- Aspect granularity – The model treats each aspect as a flat span; hierarchical aspect taxonomies (e.g., “camera → resolution”) are not yet supported.
- Future directions suggested by the authors include:
  - Semi‑supervised data augmentation to mitigate sparsity.
  - Incorporating a multilingual pre‑training step to better handle code‑mixed Bangla‑English text.
  - Exploring graph neural networks for end‑to‑end aspect‑opinion pairing.
Authors
- Ariful Islam
- Md Rifat Hossen
- Abir Ahmed
- B M Taslimul Haque
Paper Information
- arXiv ID: 2511.21381v1
- Categories: cs.LG, cs.CL
- Published: November 26, 2025