[Paper] BanglaASTE: A Novel Framework for Aspect-Sentiment-Opinion Extraction in Bangla E-commerce Reviews Using Ensemble Deep Learning
Source: arXiv - 2511.21381v1
Overview
The paper presents BanglaASTE, the first end‑to‑end framework that automatically extracts aspect terms, opinion expressions, and their sentiment polarity from Bangla‑language e‑commerce reviews. By releasing a new annotated dataset and an ensemble deep‑learning model, the authors advance aspect‑based sentiment analysis (ABSA) for a low‑resource language that the research community has largely overlooked.
Key Contributions
- BanglaASTE dataset – 3,345 manually annotated product reviews from Daraz, Facebook, and Rokomari, each labeled with aspect‑opinion‑sentiment triplets.
- Hybrid matching pipeline – a graph‑based algorithm that links aspect and opinion spans using semantic similarity, handling informal spelling and code‑mixing typical of Bangla social text.
- Ensemble model – combines BanglaBERT contextual embeddings with an XGBoost classifier, delivering a strong boost over vanilla transformer or classic baselines.
- Comprehensive evaluation – reports 89.9 % accuracy and 89.1 % F1, outperforming prior multilingual ABSA approaches on the same data.
- Open‑source release – code, trained models, and the dataset are made publicly available for reproducibility and downstream applications.
Methodology
- Data collection & annotation – Reviews were scraped from three major Bangla e‑commerce platforms. Trained annotators marked three elements per sentence:
  - Aspect (e.g., “battery life”)
  - Opinion (e.g., “lasting long”)
  - Sentiment (positive/negative/neutral).
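The triplet annotation above can be sketched as a simple record type. This is a hypothetical format for illustration; the field names and example are not taken from the paper's released dataset.

```python
# Hypothetical representation of one annotated triplet; the paper's
# actual dataset schema may differ.
from dataclasses import dataclass

@dataclass
class Triplet:
    aspect: str      # e.g. "battery life"
    opinion: str     # e.g. "lasting long"
    sentiment: str   # one of "positive", "negative", "neutral"

# One review sentence can yield one or more triplets.
triplets = [Triplet(aspect="battery life", opinion="lasting long",
                    sentiment="positive")]
```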
- Pre‑processing – Normalization steps address common Bangla quirks: inconsistent spelling, mixed English numerals, and emoticons.
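A minimal normalization sketch, assuming three of the quirks named above: Unicode NFC normalization, mapping Bangla digits to ASCII, and stripping a few common emoticons. The paper's actual rule set is more extensive.

```python
# Illustrative normalization pass; the emoticon list and digit mapping
# are assumptions, not the paper's exact rules.
import unicodedata

BANGLA_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")
EMOTICONS = {":)", ":(", ":D", ";)"}

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFC", text)   # canonical Unicode form
    text = text.translate(BANGLA_DIGITS)        # Bangla -> ASCII digits
    tokens = [t for t in text.split() if t not in EMOTICONS]
    return " ".join(tokens)
```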
- Graph‑based matching – Each sentence is turned into a bipartite graph where nodes are candidate aspect spans and opinion spans. Edge weights are computed via cosine similarity of their BanglaBERT embeddings, and a maximum‑weight matching algorithm selects the most plausible aspect‑opinion pairs.
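The pairing step can be sketched as maximum‑weight matching on a tiny bipartite graph. The similarity weights below are hard‑coded stand‑ins for the cosine similarities of BanglaBERT embeddings, and brute force over permutations is used only because per‑sentence graphs are small; the authors presumably use a proper matching algorithm.

```python
# Sketch: pick the aspect-opinion pairing that maximizes total edge
# weight. Assumes at most as many aspects as opinions.
from itertools import permutations

def max_weight_matching(weights):
    """weights[i][j] = similarity between aspect i and opinion j."""
    n_aspects, n_opinions = len(weights), len(weights[0])
    best_score, best_pairs = float("-inf"), []
    for perm in permutations(range(n_opinions), n_aspects):
        score = sum(weights[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_pairs

# Example: two candidate aspects, two candidate opinions.
sims = [[0.9, 0.2],   # aspect 0 strongly matches opinion 0
        [0.1, 0.8]]   # aspect 1 strongly matches opinion 1
pairs = max_weight_matching(sims)
```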
- Ensemble classification
  - BanglaBERT generates contextual vectors for each candidate span.
  - XGBoost consumes these vectors (plus handcrafted features like POS tags and distance metrics) to predict the sentiment polarity of the pair.
  - The final triplets combine the graph‑matched aspect‑opinion pairs with the XGBoost‑predicted sentiment for each pair.
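The feature construction for the classifier can be sketched as concatenating the (stand‑in) span embeddings with handcrafted features. The POS tag set, the specific features, and the vectors here are assumptions; the classifier call is left out to keep the sketch dependency‑free.

```python
# Illustrative feature vector for one aspect-opinion pair: stand-in
# embedding vectors, a one-hot POS indicator, and a token distance.
def pair_features(aspect_vec, opinion_vec, aspect_pos, token_distance):
    pos_onehot = [1.0 if aspect_pos == tag else 0.0
                  for tag in ("NOUN", "ADJ", "VERB")]  # assumed tag set
    return aspect_vec + opinion_vec + pos_onehot + [float(token_distance)]

feats = pair_features([0.1, 0.2], [0.3, 0.4], "NOUN", 3)
# feats would then be fed to a gradient-boosted classifier
# (XGBoost in the paper) to predict the pair's sentiment polarity.
```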
- Training & evaluation – 80 % of the dataset is used for training, 10 % for validation, and 10 % for testing. Standard metrics (accuracy, precision, recall, F1) are reported per component and for the full triplet extraction task.
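The 80/10/10 split described above can be reproduced with a plain stdlib shuffle. The seed is illustrative; the authors likely used a library utility, and their exact partition is not specified here.

```python
# Deterministic 80/10/10 train/validation/test split (seed is an
# assumption for reproducibility of this sketch only).
import random

def split_dataset(items, seed=42):
    rng = random.Random(seed)
    items = items[:]            # avoid mutating the caller's list
    rng.shuffle(items)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(list(range(100)))
```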
Results & Findings
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Baseline CRF + Word2Vec | 71.4 % | 68.9 % | 66.2 % | 67.5 % |
| Multilingual BERT (mBERT) | 82.1 % | 80.5 % | 78.9 % | 79.7 % |
| BanglaASTE (Ensemble) | 89.9 % | 88.6 % | 89.6 % | 89.1 % |
- The graph‑matching step alone raises aspect‑opinion pairing F1 by ~9 % over a naïve sequential tagging baseline.
- Adding XGBoost for sentiment classification yields the final 2‑point F1 gain, confirming that shallow‑tree ensembles still complement deep embeddings in low‑resource settings.
- Error analysis shows most remaining mistakes stem from highly ambiguous opinions (“meh”) and extreme spelling variations not covered by the normalization rules.
Practical Implications
- E‑commerce analytics – Companies can automatically surface product‑level pain points (e.g., “slow charger”) and strengths (“crisp display”) from Bangla reviews, enabling faster product‑roadmap decisions.
- Customer support automation – Chatbots can be equipped with the triplet extractor to flag negative aspects in real time and route tickets to the right support team.
- Localized sentiment dashboards – Marketing teams can monitor sentiment trends across regions where Bangla is dominant, without needing manual tagging.
- Transferable pipeline – The graph‑matching + XGBoost pattern can be adapted to other low‑resource languages that suffer from spelling noise and code‑mixing, reducing the need for massive labeled corpora.
Limitations & Future Work
- Dataset size – At 3,345 reviews the corpus is a solid start but still modest; larger, domain‑diverse corpora could improve generalization.
- Domain specificity – The current data is limited to product reviews; extending to social media or news comments may require additional preprocessing tweaks.
- Aspect granularity – The model treats each aspect as a flat span; hierarchical aspect taxonomies (e.g., “camera → resolution”) are not yet supported.
- Future directions suggested by the authors include:
  - Semi‑supervised data augmentation to mitigate sparsity.
  - Incorporating a multilingual pre‑training step to better handle code‑mixed Bangla‑English text.
  - Exploring graph neural networks for end‑to‑end aspect‑opinion pairing.
Authors
- Ariful Islam
- Md Rifat Hossen
- Abir Ahmed
- B M Taslimul Haque
Paper Information
- arXiv ID: 2511.21381v1
- Categories: cs.LG, cs.CL
- Published: November 26, 2025