Build in Public: Week 6. Trying to Add More Social Platforms
Source: Dev.to
Introduction
Last week was about observability. We added metrics and dashboards so we could see what the system was actually doing instead of relying on intuition.
So this week wasn’t about inventing something new from scratch. It was about answering a very practical question: can we extend the same idea to other social platforms without the whole system falling apart?
Short answer: partially.
TikTok: Similar Problem, Different Shape
Quick reminder: Wykra is built to answer a very human question: “can you find me creators like this?” You describe the influencer you need and we go look for them. We already do this for Instagram. This week we tried to reuse the same pattern for other platforms, starting with TikTok.
At a high level TikTok search follows the same story arc as Instagram:
- You send a free‑text request, e.g. "Find up to 15 public TikTok creators from Portugal who post about baking or sourdough bread."
- The API immediately returns a task ID while the real work happens in the background.
- A worker picks up that task, turns your sentence into structured search parameters, runs a Bright Data dataset scraper, scores the discovered profiles with an LLM, filters out the useless ones, and finally stores everything so you can fetch the results from /tasks/:id later.
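The steps above boil down to a classic "accept now, work later" pattern. Here's a minimal sketch of that contract; the names (`create_task`, `get_task`, the in-memory `TASKS` dict) are illustrative stand-ins, not Wykra's actual code, which presumably persists tasks in a real store:

```python
import uuid

# Illustrative in-memory task store; the real system would persist this.
TASKS: dict[str, dict] = {}

def create_task(query: str) -> str:
    """Register a search task and return its ID immediately.

    A background worker later does the crawling, scoring, and storing.
    """
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"status": "pending", "query": query, "profiles": []}
    return task_id

def get_task(task_id: str) -> dict:
    """Roughly what GET /tasks/:id would return: status plus any results."""
    return TASKS[task_id]
```

The point of the pattern is that the HTTP request never blocks on the crawl; the caller polls the task ID instead.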
Step 1 – “vibe in, JSON out”
We send the original query to an LLM and ask it to extract a small context object:
- niche / topic
- location (with a normalized country code)
- optional target number of creators
- a few short phrases that could go straight into the TikTok search box
If the model cannot even agree on a category, we stop there instead of pretending we know what to search for. Once the context is ready, we build up to three search terms, pick a country (either from the context or defaulting to US), and move on.
Step 2 – Search diverges from Instagram
For Instagram we have to use Perplexity to discover profiles first and only then enrich them. TikTok, thanks to a proper keyword search in the dataset, lets us skip that extra step.
For each search term we:
- Generate a TikTok search URL.
- Trigger the Bright Data TikTok dataset with that URL and country.
- Poll until the snapshot is ready, download the JSON, and then merge & deduplicate all profiles by their profile URL.
The whole thing can take a while, so it lives as a long‑running async job inside the same generic Task system we already use elsewhere.
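Two of those steps are simple enough to sketch: building the search URL and merging the downloaded batches. The URL shape below matches TikTok's public search page, but treat it and the `url` field name as assumptions about what the dataset expects:

```python
from urllib.parse import quote_plus

def tiktok_search_url(term: str) -> str:
    """Build the TikTok search URL handed to the Bright Data dataset."""
    return f"https://www.tiktok.com/search?q={quote_plus(term)}"

def merge_profiles(batches: list[list[dict]]) -> list[dict]:
    """Merge results from all search terms, deduplicating by profile URL.

    The first occurrence of a profile wins; later duplicates are dropped.
    """
    seen: dict[str, dict] = {}
    for batch in batches:
        for profile in batch:
            seen.setdefault(profile["url"], profile)
    return list(seen.values())
```

Deduplicating by profile URL matters because the same creator often shows up under several search terms.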
Step 3 – LLM scoring
Once we have the raw profiles, the LLM comes back in. For each profile we extract the basics (handle, profile URL, follower count, privacy flag, bio), send them together with the original query to the model, and ask for:
- a short summary
- a quality score (1–5)
- a relevance percentage
Anything below 70% relevance is dropped; everything above is saved with its summary and score and linked to the task. The platform is different, but the pattern stays the same: structured context → Bright Data → LLM scoring.
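The filtering step is a one-liner worth pinning down. A sketch, assuming profiles carry a `relevance` percentage from the scoring step and that exactly 70 counts as passing (the post only says "below 70% is dropped"):

```python
def filter_profiles(scored: list[dict], threshold: int = 70) -> list[dict]:
    """Keep only profiles whose LLM relevance meets the threshold."""
    return [p for p in scored if p["relevance"] >= threshold]
```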
Example of a real request and an answer
(omitted for brevity – the original content contained a concrete request/response pair)
Tasks, Metrics, and a Bit of Discipline
All of this runs as a long‑running background job attached to a single task ID. The task goes through a simple lifecycle:
pending → running → completed | failed
We store the task record and all TikTok profiles linked to it. When you fetch /tasks/:id, you see both the raw task status and the list of analyzed profiles. This turned out to be surprisingly helpful for debugging: if TikTok is empty but the task is completed, the problem is probably on the crawling or analysis side, not the queue.
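Keeping that lifecycle honest is easiest if illegal transitions simply can't happen. A minimal sketch of guarding the state machine described above (the `advance` helper is hypothetical, not Wykra's code):

```python
# Legal transitions for the task lifecycle: pending → running → completed | failed.
ALLOWED = {
    "pending": {"running"},
    "running": {"completed", "failed"},
}

def advance(status: str, new_status: str) -> str:
    """Move a task to a new status, rejecting illegal jumps."""
    if new_status not in ALLOWED.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status
```

A guard like this is what makes the debugging heuristic above reliable: a `completed` task with no profiles really did finish, so the problem must be crawling or analysis.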
Because we added observability last week, almost every step is also wrapped in metrics:
- number of TikTok search tasks created, completed, or failed
- queue latency (how long tasks sit in the queue)
- duration of Bright Data calls and their error rate
- number of LLM calls and their cost
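Wrapping a step in metrics can be as small as a context manager around each external call. A sketch, assuming simple in-process counters rather than whatever metrics backend Wykra actually uses:

```python
import time
from collections import Counter
from contextlib import contextmanager

COUNTERS: Counter = Counter()
DURATIONS: dict[str, list[float]] = {}

@contextmanager
def track(name: str):
    """Record success/error counts and duration for a named step,
    e.g. a Bright Data call or an LLM call."""
    start = time.monotonic()
    try:
        yield
        COUNTERS[f"{name}.ok"] += 1
    except Exception:
        COUNTERS[f"{name}.error"] += 1
        raise
    finally:
        DURATIONS.setdefault(name, []).append(time.monotonic() - start)
```

Used as `with track("brightdata.tiktok"): ...`, this gives you the call counts, error rates, and durations listed above with almost no ceremony at each call site.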
YouTube: The Half‑Hour Spinner of Doom
TikTok was the success story this week. YouTube reminded us that not everything is ready to be wired into Wykra, no matter how clean the architecture looks on paper.
We tried plugging in the YouTube dataset with a very gentle test:
{
"url": "https://www.youtube.com/results?search_query=sourdough+bread+new+york+",
"country": "US",
"transcription_language": ""
}
In theory, this should behave a lot like TikTok: trigger crawl, wait, download JSON, move on with life. In practice, after ~30 minutes of spinning, the only thing we got back was:
{
"error": "Crawler error: Unexpected token '(', \"(function \"... is not valid JSON",
"error_code": "crawl_error"
}
So for now YouTube isn’t really plugged into Wykra: the dataset just spins, throws a crawler JSON error, and gives us nothing useful to store or analyze. We’ve opened a ticket with Bright Data and postponed YouTube until that’s sorted.
Threads: Parameter Present, Logic Absent
Threads got its own attempt too. The plan was simple: run a basic keyword‑based discovery, something like:
{ "keyword": "technology" }
Instead of profiles, we got back:
{
"error": "Parse error: Cannot read properties of null (reading 'require')",
"error_code": "parse_error"
}
So the keyword parameter exists, the dataset exists, but the bit in the middle that’s supposed to connect them clearly doesn’t. For now we’re treating Threads the same way as YouTube: noted the issue and moved “proper Threads support” into the later bucket.
LinkedIn: Same Old Story
LinkedIn has a similar limitation to Instagram: there is no nice keyword search for "find me people who talk about X from country Y". You can enrich a profile once you already know who it is, but topic-based discovery has to come from somewhere else.
Week 6 Summary
The new platforms — TikTok, YouTube, Threads, and LinkedIn — behave differently from what we’ve seen on Instagram.
- TikTok: mostly follows the existing pattern.
- YouTube & Threads: the pattern breaks.
- LinkedIn: the dataset alone can't do keyword‑driven discovery.
“If we want proper keyword‑driven discovery, we’ll probably have to plug in a Perplexity/LLM‑style search layer on top of LinkedIn as well, not just rely on the dataset.”
That’s a problem for another week, but now it’s a clearly defined one—not a vague feeling that “LinkedIn is weird”.
Conclusion
For now, it’s better to have a couple of solid flows than five half‑broken ones.
If you’d like to support the project, ⭐️ star the repo and follow me on X—it really helps.
- Repo:
- Twitter/X: