Guarding the Sinhala Web: Building Suba Bhas with Google Gemini
Source: Dev.to
Overview
Suba Bhas is a browser extension backed by an AI-powered filter, built to make the internet a safer space for Sinhala speakers. It acts as a real-time “shield”: it detects offensive language and hate speech on web pages and automatically blurs it.
The Problem
Online toxicity in local languages like Sinhala often goes undetected by global moderation tools.
Technology Stack
- Backend: Flask API serving a Bi-LSTM model trained on SOLD (the Sinhala Offensive Language Dataset).
- Frontend: Chrome extension written in JavaScript that observes the DOM in real time, extracts text, and applies blurring.
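To make the frontend/backend split concrete, here is a minimal sketch of what the content-script side could look like. It is not the actual Suba Bhas code: the endpoint URL, the `{ text }` request body, the `{ offensive }` response field, and the function names `classifyViaApi`, `shieldElement`, and `startObserver` are all assumptions made for illustration.

```javascript
// Hypothetical endpoint; the article does not give the real route or payload.
const API_URL = "http://localhost:5000/predict";

// Ask the backend whether a piece of text is offensive.
// Assumed response shape: { "offensive": true | false }.
async function classifyViaApi(text) {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Classifier returned HTTP ${res.status}`);
  const data = await res.json();
  return Boolean(data.offensive);
}

// Blur one element if the classifier flags its text.
// `classify` is injected so the logic can be exercised without a server.
async function shieldElement(el, classify = classifyViaApi) {
  const text = (el.textContent || "").trim();
  if (!text) return false; // nothing to classify
  const offensive = await classify(text);
  if (offensive) el.style.filter = "blur(6px)";
  return offensive;
}

// Watch the page for newly added elements (browser only).
function startObserver(classify = classifyViaApi) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        if (node.nodeType === Node.ELEMENT_NODE) {
          shieldElement(node, classify).catch(console.error);
        }
      }
    }
  });
  observer.observe(document.body, { childList: true, subtree: true });
  return observer;
}

// Only start observing when a DOM is actually available.
if (typeof document !== "undefined" && typeof MutationObserver !== "undefined") {
  startObserver();
}
```

This sketch blurs every flagged element as a whole, which is coarse; a real extension would more likely target individual text blocks so that a single flagged phrase does not hide an entire page section.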
Gemini’s Role
- Assisted in architecting the overall system.
- Debugged complex JavaScript for the DOM observer.
- Refined logic for dynamic keyword extraction.
- Helped translate technical concepts between the Python backend and the Chrome extension frontend.
Repository
Learning Outcomes
Natural Language Processing (NLP)
Learned to handle the nuances of the Sinhala script and of offensive language that depends on context.
Full‑Stack Integration
Connected a JavaScript extension to a Python Flask API seamlessly.
Soft Skills
Learned the importance of “ethical AI”: designing tools that protect users without over-censoring helpful content.
Unexpected Lesson
Scanning page text and applying blurs in real time can significantly degrade browser performance if not optimized.
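The article does not say which optimizations were used. Two common ones for this kind of problem are caching verdicts, so the same text is never classified twice, and batching DOM changes, so a burst of mutations triggers one pass instead of many. The sketch below shows both; `createCachedClassifier`, `createBatcher`, and the 200 ms default delay are hypothetical and not taken from the project.

```javascript
// Wrap a classifier so each distinct text is only classified once.
// The cached value is a promise, so concurrent requests for the same
// text share a single in-flight call. Failed calls are evicted so
// they can be retried later.
function createCachedClassifier(classify) {
  const cache = new Map();
  return function cached(text) {
    if (!cache.has(text)) {
      const verdict = Promise.resolve(classify(text)).catch((err) => {
        cache.delete(text);
        throw err;
      });
      cache.set(text, verdict);
    }
    return cache.get(text);
  };
}

// Collect items and hand them to `flush` together once no new item
// has arrived for `delayMs` milliseconds. Duplicates are dropped.
function createBatcher(flush, delayMs = 200) {
  let pending = new Set();
  let timer = null;
  return function add(item) {
    pending.add(item);
    clearTimeout(timer);
    timer = setTimeout(() => {
      const batch = [...pending];
      pending = new Set();
      flush(batch);
    }, delayMs);
  };
}
```

In a content script, the observer callback would call `add(node)` for each new element, and `flush` would run the cached classifier over the batch. Feeds and comment threads tend to repeat short strings, which is where the cache is most likely to help.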
Pros
- Gemini quickly suggested fixes for Flask routes and helped resolve CORS issues between the extension and the server.
- Acted like a senior developer, providing immediate, context‑aware guidance.
Cons
- When asked about specific Sinhala NLP libraries, Gemini sometimes suggested generic English-centric models.
- Prompts had to be highly specific before Gemini kept its answers focused on a low-resource language like Sinhala.