Stop Writing Regex for Data You Should Be Describing in English
Source: Dev.to
Remote‑Friendly, Senior‑Level, Salary‑Disclosed Job Screening
You have a spreadsheet of job postings and need to filter it down to roles that are remote‑friendly, senior‑level, and have a disclosed salary. The data looks like this:
| company | post |
|---|---|
| Airtable | Async‑first team, 8+ yrs exp, $185‑220K base |
| Vercel | Lead our NYC team. Competitive comp, DOE |
| Notion | In‑office SF. Staff eng, $200K + equity |
| Linear | Bootcamp grads welcome! $85K, remote‑friendly |
| Descript | Work from anywhere. Principal architect, $250K |
Deterministic rules (in plain English)
- Remote‑friendly – contains “remote”, “work from anywhere”, “async‑first”, or implied by the absence of an office location.
- Senior‑level – contains “8+ yrs”, “Staff”, “Principal”, or “Lead” (note: “Lead” can sometimes be junior).
- Salary disclosed – contains an actual number (e.g., “$85K”, “$185‑220K”), not “Competitive comp” or “DOE”.
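For contrast, here is roughly what a regex version of these rules looks like. This is a sketch of my own, with deliberately incomplete pattern lists: every new phrasing ("fully distributed", "10 yrs+") means another pattern, "Lead" still matches junior postings, and the "absence of an office location implies remote" rule has no regex at all.

```python
import re

# Toy regex version of the three rules. Pattern lists are illustrative
# and incomplete -- which is exactly the problem with this approach.
REMOTE = re.compile(r"remote|work from anywhere|async-first", re.I)
SENIOR = re.compile(r"\d+\+\s*yrs|staff|principal|lead", re.I)
SALARY = re.compile(r"\$\d")  # misses "120k USD", can't tell a range from a perk

posts = {
    "Airtable": "Async-first team, 8+ yrs exp, $185-220K base",
    "Vercel": "Lead our NYC team. Competitive comp, DOE",
    "Notion": "In-office SF. Staff eng, $200K + equity",
    "Linear": "Bootcamp grads welcome! $85K, remote-friendly",
    "Descript": "Work from anywhere. Principal architect, $250K",
}

results = {
    company: bool(REMOTE.search(p) and SENIOR.search(p) and SALARY.search(p))
    for company, p in posts.items()
}
```

It happens to get these five rows right, but only because the data cooperates; the sixth posting you scrape will phrase things differently.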
Using everyrow to express the logic in natural language
everyrow lets you define fuzzy, qualitative logic in plain English and apply it to every row of a dataframe. The SDK handles LLM orchestration, structured outputs, and scaling.
```python
import asyncio

import pandas as pd
from pydantic import BaseModel, Field

from everyrow.ops import screen

jobs = pd.DataFrame([
    {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"},
    {"company": "Vercel", "post": "Lead our NYC team. Competitive comp, DOE"},
    {"company": "Notion", "post": "In-office SF. Staff eng, $200K + equity"},
    {"company": "Linear", "post": "Bootcamp grads welcome! $85K, remote-friendly"},
    {"company": "Descript", "post": "Work from anywhere. Principal architect, $250K"},
])

class JobScreenResult(BaseModel):
    qualifies: bool = Field(description="True if meets ALL criteria")

async def main():
    result = await screen(
        task="""
        Qualifies if ALL THREE are met:
        1. Remote-friendly
        2. Senior-level (5+ yrs exp OR Senior/Staff/Principal in title)
        3. Salary disclosed (specific numbers, not "competitive" or "DOE")
        """,
        input=jobs,
        response_model=JobScreenResult,
    )
    print(result.data)

asyncio.run(main())
```
Result
| company | qualifies |
|---|---|
| Airtable | True |
| Vercel | False |
| Notion | False |
| Linear | False |
| Descript | True |
- Airtable qualifies: “async‑first” (remote‑friendly), “8+ yrs exp” (senior), “$185‑220K” (salary disclosed).
- Descript qualifies: “work from anywhere” (remote), “principal architect” (senior), “$250K” (salary disclosed).
The other rows fail at least one criterion (no real salary, in‑office location, or not senior enough).
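A single boolean is all this example needs, but since `response_model` is plain Pydantic, you can ask for a per‑criterion breakdown instead. The field names below are my own, not part of the everyrow API:

```python
from pydantic import BaseModel, Field

class JobScreenDetail(BaseModel):
    remote_friendly: bool = Field(description="Remote, work-from-anywhere, or async-first")
    senior_level: bool = Field(description="5+ yrs exp or Senior/Staff/Principal title")
    salary_disclosed: bool = Field(description="Specific numbers, not 'competitive' or 'DOE'")
    qualifies: bool = Field(description="True only if all three are True")

# The SDK would populate one instance per row; constructed by hand here
# to show the shape a Vercel-like row would come back with:
vercel = JobScreenDetail(
    remote_friendly=False,
    senior_level=True,
    salary_disclosed=False,
    qualifies=False,
)
```

Per‑criterion fields make failures debuggable: you can see *which* rule a rejected row tripped on.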
Sessions: Track Everything in a Dashboard
Every operation runs inside a session: a grouping of related operations that appears in the everyrow.io web UI. Sessions are created automatically, but for multi‑step pipelines you’ll want to create one explicitly:
```python
from everyrow import create_session
from everyrow.ops import screen, rank

async with create_session(name="Lead Qualification") as session:
    print(f"View at: {session.get_url()}")

    screened = await screen(
        session=session,
        task="Has a company email domain (not gmail, yahoo, etc.)",
        input=leads,
        response_model=ScreenResult,
    )
    ranked = await rank(
        session=session,
        task="Score by likelihood to convert",
        input=screened.data,
        field_name="conversion_score",
    )
```
The session URL gives you a live dashboard where you can monitor progress and inspect results while your script runs.
Background Jobs for Large Datasets
The operations above are already async, but they block until the result is ready. The `_async` variants are fire‑and‑forget: they submit work to the server and return immediately so your script can continue.
```python
from everyrow.ops import screen_async

async with create_session(name="Background Screening") as session:
    task = await screen_async(
        session=session,
        task="Remote-friendly, senior-level, salary disclosed",
        input=large_dataframe,
    )
    print(f"Task ID: {task.task_id}")
    # do other work...
    result = await task.await_result()
```
If your script crashes, recover the result later using the task ID:
```python
from everyrow import fetch_task_data

df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")
```
Beyond Screening: Other Operations
| Operation | What it does |
|---|---|
| Screen | Filter rows by criteria that require judgment |
| Rank | Score rows by qualitative factors |
| Dedupe | Deduplicate when fuzzy string matching isn’t enough |
| Merge | Join tables when keys don’t match exactly |
| Research | Run web agents to research each row |
Each operation takes a natural‑language task description and a dataframe, and returns structured results. Same pattern, different capability.
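To see why an LLM‑backed `Merge` exists at all, note what an exact‑key pandas join does with cosmetically different names. The toy data here is my own:

```python
import pandas as pd

left = pd.DataFrame({"company": ["Vercel Inc.", "Notion Labs"], "deal": [1, 2]})
right = pd.DataFrame({"company": ["Vercel", "Notion"], "hq": ["NYC", "SF"]})

# An exact join finds nothing: "Vercel Inc." != "Vercel", so every
# row is silently dropped even though the entities clearly match.
exact = left.merge(right, on="company", how="inner")
```

Fuzzy string matching recovers some of these, but breaks down on cases like "Meta" vs "Facebook", where the match requires world knowledge rather than edit distance.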
When to Use (and When Not To)
everyrow shines for cases where the logic is easy to describe but hard to code: screening, ranking, deduplication, and enrichment tasks where the criteria require judgment, world knowledge, or fuzzy matching.
It is not a replacement for deterministic transformations. If you can write a reliable pandas filter like `df[df["salary"] > 100_000]`, you should. Use everyrow for columns that contain natural‑language, inconsistent, or otherwise ambiguous values.
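The distinction in practice (the structured `offers` frame is my own toy data; the text rows come from the job‑screening example above):

```python
import pandas as pd

# Structured column: a plain comparison is the right tool --
# deterministic, free, and instant.
offers = pd.DataFrame({"company": ["A", "B"], "salary": [95_000, 140_000]})
high = offers[offers["salary"] > 100_000]

# Unstructured column: there is no "salary" column to compare. A regex
# can flag that *some* dollar figure appears, but reliably interpreting
# "$185-220K" vs "DOE" is where judgment-based screening comes in.
posts = pd.DataFrame({"post": ["$185-220K base", "Competitive comp, DOE"]})
has_salary = posts["post"].str.contains(r"\$\d", regex=True)
```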
Trade‑offs: LLM‑based operations introduce latency and cost. Use them judiciously for the parts of your pipeline that truly need human‑like reasoning.
Scaling note – In the job‑screening example above, processing 5 rows takes a few seconds and costs a fraction of a cent. For 10,000 rows you’ll want the async variants and should expect minutes rather than milliseconds. The Getting Started docs cover scaling patterns for larger datasets.
Get Started
```bash
pip install everyrow
export EVERYROW_API_KEY=your_key_here
```
Get a free API key at everyrow.io/api-key – it comes with $20 free credit.
Full docs and more examples: everyrow.io/docs/getting-started