I Built a Real-Time HackerNews Trend Radar With AI (And It Runs Itself)
Source: Dev.to
Every day, HackerNews quietly decides what the dev world will care about next.
But unless you’re doom‑scrolling it all day, you’re missing the real signal: which topics are actually taking off right now, across threads and deep comment chains.
So instead of manually refreshing HN, I built a real‑time “trend radar” on top of it:
- Continuously ingests fresh HN stories and comments
- Uses an LLM to extract structured topics (companies, tools, models, tech terms)
- Streams everything into Postgres for instant querying like:
  - “What’s trending on HN right now?”
  - “Which threads are driving the most hype for Claude / LangChain / Rust today?”
All of this runs as a declarative CocoIndex flow with incremental syncs, LLM‑powered extraction, and simple query handlers.
In this post, you’ll see how it works end‑to‑end and how you can fork it to track any community (Reddit, X, Discord, internal Slack, etc.).
Why HN Is a Goldmine (If You Can Structure It)
HackerNews is one of the strongest early signals for:
- New tools and frameworks devs actually try
- Which AI models/products are gaining mindshare
- Real sentiment and feedback in the comments
- Emerging startups and obscure libraries that might be big in 6‑12 months
But raw HN has three problems:
- Threads are noisy; comments are nested and messy
- There’s no notion of “topics” beyond free text
- There’s no built‑in way to ask: “What’s trending across the whole firehose?”
The HackerNews Trending Topics example in CocoIndex is essentially: “turn HN into a structured, continuously updating topics index that AI agents and dashboards can query in milliseconds.”
Architecture: From HN Firehose to Queryable Topics
At a high level, the pipeline looks like this:
HackerNews API
↓
HackerNewsConnector (Custom Source)
├─ list() → thread IDs + updated_at
├─ get_value() → full threads + comments
└─ provides_ordinal() → enables incremental sync
↓
CocoIndex Flow
├─ LLM topic extraction on threads + comments
├─ message_index collector (content)
└─ topic_index collector (topics)
↓
Postgres
├─ hn_messages
└─ hn_topics
↓
Query Handlers
├─ search_by_topic("Claude")
├─ get_trending_topics(limit=20)
└─ get_threads_for_topic("Rust")
Key idea: separate discovery from fetching.
- list() hits the HN Algolia search API to get lightweight metadata: thread IDs + updated_at timestamps.
- get_value() only runs for threads whose updated_at changed, fetching full content + comments from the items API.
- Ordinals (timestamps) let CocoIndex skip everything that hasn’t changed, cutting API calls by more than 90% on subsequent syncs.
This is what enables “live mode” with a 30‑second polling interval without melting APIs or your wallet.
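To make the discovery/fetch split concrete, here is a minimal sketch of ordinal-based change detection in plain Python. It is not CocoIndex's internal implementation: the plan_sync helper, the microsecond ordinal encoding, and the in-memory state dictionary are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def plan_sync(listed, seen_ordinals):
    """Return the thread IDs whose ordinal advanced since the last sync.

    listed: iterable of (thread_id, updated_at) pairs from the cheap listing call.
    seen_ordinals: dict of thread_id -> ordinal recorded by earlier syncs
                   (hypothetical stand-in for the engine's own tracking).
    """
    to_fetch = []
    for thread_id, updated_at in listed:
        # Assumed encoding: the ordinal is the timestamp in microseconds.
        ordinal = int(updated_at.timestamp() * 1_000_000)
        if seen_ordinals.get(thread_id, -1) < ordinal:
            to_fetch.append(thread_id)
            seen_ordinals[thread_id] = ordinal
    return to_fetch


t0 = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2025, 1, 1, 12, 5, tzinfo=timezone.utc)

state = {}
first = plan_sync([("100", t0), ("101", t0)], state)
second = plan_sync([("100", t0), ("101", t1), ("102", t1)], state)

print(first)   # every thread is new on the first pass
print(second)  # only the updated thread and the new one
```

On the second pass, thread 100 is skipped because its timestamp did not move, so the expensive full fetch never runs for it.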
Step 1: Turning HackerNews Into a First‑Class Incremental Source
First, define the data model for threads and comments.
import dataclasses
from datetime import datetime
from typing import NamedTuple


class _HackerNewsThreadKey(NamedTuple):
    # Row key: one entry per HN thread
    thread_id: str


@dataclasses.dataclass
class _HackerNewsComment:
    id: str
    author: str | None
    text: str | None
    created_at: datetime | None


@dataclasses.dataclass
class _HackerNewsThread:
    author: str | None
    text: str
    url: str | None
    created_at: datetime | None
    comments: list[_HackerNewsComment]
Then declare a SourceSpec that configures how to query HN:
class HackerNewsSource(SourceSpec):
    """Source spec for HackerNews API."""

    tag: str | None = None  # e.g. "story"
    max_results: int = 100  # hits per poll
The custom source connector wires this spec into actual HTTP calls:
- list() → calls https://hn.algolia.com/api/v1/search_by_date with hitsPerPage=max_results, yields PartialSourceRow objects keyed by thread ID, with ordinals based on updated_at.
- get_value() → calls https://hn.algolia.com/api/v1/items/{thread_id} and parses the full thread + nested comments into _HackerNewsThread and _HackerNewsComment.
- provides_ordinal() → returns True so CocoIndex can do incremental sync.
CocoIndex handles the hard part: tracking ordinals and only re‑pulling changed rows on each sync.
Step 2: Using an LLM to Extract Topics From Every Thread and Comment
Once the source is in the flow, the fun part starts: semantic enrichment.
Define a minimal Topic type that the LLM will fill:
@dataclasses.dataclass
class Topic:
"""
A single topic extracted from text:
- products, tools, frameworks
- people, companies
- domains (e.g. "vector search", "fintech")
"""
topic: str
Inside the flow, every thread gets its topics extracted with a single declarative transform:
with data_scope["threads"].row() as thread:
    thread["topics"] = thread["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI,
                model="gpt-4o-mini",
            ),
            output_type=list[Topic],
        )
    )
Do the same for comments:
with thread["comments"].row() as comment:
    comment["topics"] = comment["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI,
                model="gpt-4o-mini",
            ),
            output_type=list[Topic],
        )
    )
Under the hood, CocoIndex:
- Calls the LLM with a structured prompt and enforces output_type=list[Topic]
- Normalizes messy free text into consistent topic strings
- Makes the result just another column in your flow, ready to be persisted to Postgres and queried, instead of a separate glue script
This is what turns HN from “some text” into something an AI agent or SQL query can reason about.
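To show what "structured output" buys you, here is a small sketch of validating a JSON reply into Topic objects and collapsing near-duplicates. This is not how CocoIndex implements ExtractByLlm; the parse_topics helper, the sample reply, and the whitespace and case-folding rules are assumptions for illustration.

```python
import dataclasses
import json


@dataclasses.dataclass(frozen=True)
class Topic:
    topic: str


def parse_topics(raw_json: str) -> list[Topic]:
    """Validate a JSON array of {"topic": ...} objects and drop duplicates."""
    items = json.loads(raw_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of topics")
    seen: set[str] = set()
    topics: list[Topic] = []
    for item in items:
        name = item.get("topic") if isinstance(item, dict) else None
        if not isinstance(name, str):
            continue  # skip malformed entries instead of failing the whole row
        name = " ".join(name.split())  # collapse stray whitespace
        key = name.casefold()          # compare case-insensitively
        if name and key not in seen:
            seen.add(key)
            topics.append(Topic(topic=name))
    return topics


# Made-up model reply containing a duplicate and a malformed entry.
raw = '[{"topic": "Rust"}, {"topic": " rust "}, {"topic": "vector  search"}, {"oops": 1}]'
topics = parse_topics(raw)
print(topics)
```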
Step 3: Indexing Into Postgres for Fast Topic Queries
All structured data is collected into two logical indexes:
| Index | Contents |
|---|---|
| message_index | Threads + comments with their raw text and metadata |
| topic_index | Individual topics linked back to messages |
Collectors are declared once and then exported to Postgres:
message_index = data_scope.add_collector()
topic_index = data_scope.add_collector()

message_index.export(
    "hn_messages",
    cocoindex.targets.Postgres(),
    primary_key_fields=["id"],
)
topic_index.export(
    "hn_topics",
    cocoindex.targets.Postgres(),
    primary_key_fields=["topic", "message_id"],
)
Now you have two tables you can poke with SQL or via CocoIndex query handlers:
- hn_messages: full‑text search, content analytics, author stats
- hn_topics: topic‑level analytics, trend tracking, per‑topic thread ranking
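One way to picture what lands in each table is to flatten a single enriched thread by hand. The sketch below is plain Python rather than the CocoIndex collector API. The collect_rows helper and the sample thread are hypothetical, and the column names are assumptions chosen to line up with the primary keys above and the SQL in the query handler in Step 4.

```python
def collect_rows(thread: dict) -> tuple[list[dict], list[dict]]:
    """Flatten one enriched thread into message rows and topic rows."""
    messages: list[dict] = []
    topics: list[dict] = []

    def add(message_id: str, content_type: str, author, text, topic_names) -> None:
        messages.append({
            "id": message_id,
            "thread_id": thread["id"],
            "content_type": content_type,
            "author": author,
            "text": text,
        })
        # One topic row per (topic, message) pair, matching the composite key.
        for name in topic_names:
            topics.append({"topic": name, "message_id": message_id})

    add(thread["id"], "thread", thread["author"], thread["text"], thread["topics"])
    for comment in thread["comments"]:
        add(comment["id"], "comment", comment["author"], comment["text"], comment["topics"])
    return messages, topics


# Made-up thread that already carries extracted topics.
thread = {
    "id": "1", "author": "alice", "text": "Rewriting our indexer in Rust", "topics": ["Rust"],
    "comments": [
        {"id": "2", "author": "bob", "text": "We used Rust with Postgres",
         "topics": ["Rust", "Postgres"]},
    ],
}

messages, topics = collect_rows(thread)
print(len(messages), len(topics))
```

Keeping topics in their own table means a topic mentioned in ten comments produces ten rows, which is what makes counting and ranking cheap later.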
Step 4: Query Handlers – From “Cool Pipeline” to Real Product
Here’s where the project stops being just an ETL demo and becomes something you can actually ship.
4.1 search_by_topic(topic): “Show Me All Claude Mentions”
This query handler lets you search HN content by topic across threads and comments:
@hackernews_trending_topics_flow.query_handler()
def search_by_topic(topic: str) -> cocoindex.QueryOutput:
    topic_table = cocoindex.utils.get_target_default_name(
        hackernews_trending_topics_flow, "hn_topics"
    )
    message_table = cocoindex.utils.get_target_default_name(
        hackernews_trending_topics_flow, "hn_messages"
    )
    with connection_pool().connection() as conn:
        with conn.cursor() as cur:
            cur.execute(
                f"""
                SELECT m.id, m.thread_id, m.author, m.content_type,
                       m.text, m.created_at, t.topic
                FROM {topic_table} t
                JOIN {message_table} m ON t.message_id = m.id
                WHERE LOWER(t.topic) LIKE LOWER(%s)
                ORDER BY m.created_at DESC
                """,
                (f"%{topic}%",),
            )
            results = [
                {
                    "id": row[0],
                    "url": f"https://news.ycombinator.com/item?id={row[1]}",
                    "author": row[2],
                    "type": row[3],
                    "text": row[4],
                    "created_at": row[5].isoformat(),
                    "topic": row[6],
                }
                for row in cur.fetchall()
            ]
    return cocoindex.QueryOutput(results=results)
Run it from the CLI:
cocoindex query main.py search_by_topic --topic "Claude"
You’ll receive a clean JSON response with URLs, authors, timestamps, and the piece of content where the topic appeared.
4.2 get_threads_for_topic(topic): Rank Threads by Topic Score
Not all mentions are equal:
- If “Rust” appears in the thread title, that’s a primary discussion.
- If it’s buried in a comment, that’s a side mention.
get_threads_for_topic uses a weighted scoring model to prioritize threads where the topic is central.
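The post does not spell out the scoring formula, so the following is only a sketch of one plausible weighting: a mention in the thread itself counts more than a mention in a comment. The weights (5 and 1) and the rank_threads helper are assumptions, not values from the example.

```python
from collections import defaultdict

# Assumed weights: a mention in the thread itself outweighs a comment mention.
WEIGHTS = {"thread": 5, "comment": 1}


def rank_threads(mentions):
    """Rank threads by weighted mention count.

    mentions: iterable of (thread_id, content_type) pairs, one per row
              where the topic was found.
    """
    scores: dict[str, int] = defaultdict(int)
    for thread_id, content_type in mentions:
        scores[thread_id] += WEIGHTS.get(content_type, 0)
    # Highest score first; ties broken by thread ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


mentions = [
    ("a", "thread"), ("a", "comment"),
    ("b", "comment"), ("b", "comment"), ("b", "comment"),
    ("c", "comment"),
]
ranked = rank_threads(mentions)
print(ranked)  # [('a', 6), ('b', 3), ('c', 1)]
```

With these weights, thread "a" outranks "b" even though "b" has more raw mentions, because "a" mentions the topic in the thread itself.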
4.3 get_trending_topics(limit=20): The Actual Trend Radar
This endpoint powers dashboards and agents. It surfaces a list such as:
[
"Claude 3.7 Sonnet",
"OpenAI o4-mini",
"LangChain",
"Modal",
…
]
Each topic includes its score, latest mention time, and the top threads where it’s being discussed.
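A trending query over the two tables can be a single aggregate. The sketch below uses an in-memory SQLite database from the standard library as a stand-in for Postgres, so it runs without any setup. The trending_topics helper, the column names, the sample rows, and the 5-versus-1 weighting are illustrative assumptions, not the example's actual handler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hn_messages (id TEXT PRIMARY KEY, thread_id TEXT,
                          content_type TEXT, created_at TEXT);
CREATE TABLE hn_topics (topic TEXT, message_id TEXT,
                        PRIMARY KEY (topic, message_id));
""")
conn.executemany("INSERT INTO hn_messages VALUES (?, ?, ?, ?)", [
    ("1", "1", "thread", "2025-01-01T10:00:00"),
    ("2", "1", "comment", "2025-01-01T10:05:00"),
    ("3", "3", "thread", "2025-01-01T11:00:00"),
    ("4", "3", "comment", "2025-01-01T11:30:00"),
])
conn.executemany("INSERT INTO hn_topics VALUES (?, ?)", [
    ("Rust", "1"), ("Rust", "2"), ("Rust", "4"),
    ("Claude", "3"), ("Claude", "4"),
    ("SQLite", "2"),
])


def trending_topics(conn, limit=20):
    """Score each topic by weighted mentions and return the top `limit`."""
    rows = conn.execute(
        """
        SELECT t.topic,
               SUM(CASE WHEN m.content_type = 'thread' THEN 5 ELSE 1 END) AS score,
               MAX(m.created_at) AS latest_mention
        FROM hn_topics t
        JOIN hn_messages m ON t.message_id = m.id
        GROUP BY t.topic
        ORDER BY score DESC, latest_mention DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [{"topic": r[0], "score": r[1], "latest_mention": r[2]} for r in rows]


trending = trending_topics(conn, limit=2)
print(trending)
```

To get "top topics in the last N hours", the same query would add a WHERE clause on created_at. Grouping here is case-sensitive, so it relies on topic strings already being consistent.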
You can wire this into:
- A live dashboard showing “top 20 topics in the last N hours”
- A Slack bot posting a daily “what’s trending on HN” summary
- An internal research agent that watches for signals relevant to your stack
Running It in Real Time
Once the flow is defined, keeping it live is a one‑liner:
# On‑demand refresh
cocoindex update main
# Live mode: keeps polling HN and updating indexes
cocoindex update -L main
CocoIndex handles:
- Polling HN every 30 seconds (configurable)
- Incrementally syncing only changed threads
- Re‑running LLM extraction only where needed
- Exporting into Postgres and exposing query handlers
For debugging, CocoInsight lets you explore the flow, see lineage, and play with queries from a UI:
cocoindex server -ci main
# Then open: https://cocoindex.io/cocoinsight
What You Can Build on Top of This (Beyond “Just HN”)
Once you have this pattern, you’re not limited to Hacker News.
| Extension | Idea |
|---|---|
| Cross‑community trend tracking | Add Reddit subs, X lists, Discord channels, internal Slack, etc. as additional sources; normalize topics across them to see where ideas propagate. |
| Sentiment‑aware trend analysis | Plug an LLM‑based sentiment extraction step alongside topics; track not just what is trending, but whether devs love or hate it. |
| Influencer & key‑contributor maps | Use the author field to see who starts important discussions and whose comments move the conversation. |
| Continuous knowledge graphs | Treat topics as nodes, threads as edges, and build a graph of tools, companies, and people linked by real discussions. |
| Real‑time AI research agents | Point an agent at the Postgres‑backed index and let it answer questions like “What are the top new vector DBs people are experimenting with this week?” and “Which AI eval frameworks are getting traction?” |
If you live in data, infra, or AI‑land, this is basically a self‑updating signal layer over HN that your tools and agents can query.
Want to Try It Yourself?
You can find the fully working example (including the flow definition, Docker setup, and a quick‑start script) in the repository linked below.
GitHub – hackernews‑trending‑topics‑cocoindex
Happy hacking!
Overview
Explore the flow definition, custom source, query handlers, and Postgres export in the official HackerNews Trending Topics example on the CocoIndex docs and its GitHub repository.
If you end up:
- Pointing this at a different community
- Layering in embeddings, RAG, or sentiment analysis
- Wiring it into a real product or agent
…definitely share it back!
The coolest part of this pattern is how little code you need to go from “raw community noise” to a live, queryable trend radar.