Stop Writing Regex for Data You Should Be Describing in English

Published: (February 10, 2026 at 06:08 AM EST)
5 min read
Source: Dev.to

Remote‑Friendly, Senior‑Level, Salary‑Disclosed Job Screening

You have a spreadsheet of job postings and need to filter it down to roles that are remote‑friendly, senior‑level, and have a disclosed salary. The data looks like this:

| company | post |
| --- | --- |
| Airtable | Async‑first team, 8+ yrs exp, $185‑220K base |
| Vercel | Lead our NYC team. Competitive comp, DOE |
| Notion | In‑office SF. Staff eng, $200K + equity |
| Linear | Bootcamp grads welcome! $85K, remote‑friendly |
| Descript | Work from anywhere. Principal architect, $250K |

Deterministic rules (in plain English)

  • Remote‑friendly – contains “remote”, “work from anywhere”, or “async‑first”, or is implied by the absence of an office location.
  • Senior‑level – contains “8+ yrs”, “Staff”, “Principal”, or “Lead” (note: “Lead” can sometimes be junior).
  • Salary disclosed – contains an actual number (e.g., “$85K”, “$185‑220K”), not “Competitive comp” or “DOE”.
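
For comparison, the rules above can be approximated with keyword patterns. The following is a minimal sketch using only the standard library, not part of everyrow; the pattern names and the two extra posts at the end are made up for illustration. It handles the five sample rows, but only because the patterns were tuned to them.

```python
import re

# The five sample posts from the table above.
posts = {
    "Airtable": "Async-first team, 8+ yrs exp, $185-220K base",
    "Vercel": "Lead our NYC team. Competitive comp, DOE",
    "Notion": "In-office SF. Staff eng, $200K + equity",
    "Linear": "Bootcamp grads welcome! $85K, remote-friendly",
    "Descript": "Work from anywhere. Principal architect, $250K",
}

# Hypothetical keyword patterns, hand-tuned to the sample data.
REMOTE = re.compile(r"remote|work from anywhere|async-first", re.IGNORECASE)
SENIOR = re.compile(r"\d+\+\s*yrs|\b(?:staff|principal|lead)\b", re.IGNORECASE)
SALARY = re.compile(r"\$\d+(?:-\d+)?K")


def regex_screen(post: str) -> bool:
    """Return True when all three keyword patterns match the post."""
    return bool(REMOTE.search(post) and SENIOR.search(post) and SALARY.search(post))


results = {company: regex_screen(post) for company, post in posts.items()}
print(results)

# Two made-up posts that a human would reject but the patterns accept:
print(regex_screen("No remote work. Staff eng, $200K"))      # "remote" matches despite the negation
print(regex_screen("Lead generation intern, $40K, remote"))  # "Lead" matches a junior role
```

Each new phrasing means another pattern or another exception, which is the maintenance burden the rest of this post tries to avoid.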

Using everyrow to express the logic in natural language

everyrow lets you define fuzzy, qualitative logic in plain English and apply it to every row of a dataframe. The SDK handles LLM orchestration, structured outputs, and scaling.

import asyncio
import pandas as pd
from pydantic import BaseModel, Field
from everyrow.ops import screen

jobs = pd.DataFrame([
    {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"},
    {"company": "Vercel",   "post": "Lead our NYC team. Competitive comp, DOE"},
    {"company": "Notion",   "post": "In-office SF. Staff eng, $200K + equity"},
    {"company": "Linear",   "post": "Bootcamp grads welcome! $85K, remote-friendly"},
    {"company": "Descript", "post": "Work from anywhere. Principal architect, $250K"},
])

class JobScreenResult(BaseModel):
    qualifies: bool = Field(description="True if meets ALL criteria")

async def main():
    result = await screen(
        task="""
        Qualifies if ALL THREE are met:
        1. Remote-friendly
        2. Senior-level (5+ yrs exp OR Senior/Staff/Principal in title)
        3. Salary disclosed (specific numbers, not "competitive" or "DOE")
        """,
        input=jobs,
        response_model=JobScreenResult,
    )
    print(result.data)

asyncio.run(main())

Result

| company | qualifies |
| --- | --- |
| Airtable | True |
| Vercel | False |
| Notion | False |
| Linear | False |
| Descript | True |

  • Airtable qualifies: “Async‑first” (remote‑friendly), “8+ yrs exp” (senior), “$185‑220K” (salary disclosed).
  • Descript qualifies: “Work from anywhere” (remote‑friendly), “Principal architect” (senior), “$250K” (salary disclosed).

The other rows each fail at least one criterion: Vercel names an NYC team and discloses no salary (“Competitive comp, DOE”), Notion is in‑office, and Linear targets bootcamp grads rather than senior engineers.


Sessions: Track Everything in a Dashboard

Every operation runs within a session: a grouping of related operations that appears in the everyrow.io web UI. Sessions are created automatically, but for multi‑step pipelines you’ll want to create one explicitly:

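import asyncio
import pandas as pd
from pydantic import BaseModel, Field
from everyrow import create_session
from everyrow.ops import screen, rank

# Placeholder input and response model; substitute your own.
leads = pd.DataFrame([{"email": "ana@example.com"}, {"email": "bob@gmail.com"}])

class ScreenResult(BaseModel):
    qualifies: bool = Field(description="True if the lead meets the criterion")

async def main():
    async with create_session(name="Lead Qualification") as session:
        print(f"View at: {session.get_url()}")

        # Step 1: keep only leads with a company email domain.
        screened = await screen(
            session=session,
            task="Has a company email domain (not gmail, yahoo, etc.)",
            input=leads,
            response_model=ScreenResult,
        )

        # Step 2: score the leads that passed screening.
        ranked = await rank(
            session=session,
            task="Score by likelihood to convert",
            input=screened.data,
            field_name="conversion_score",
        )

asyncio.run(main())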

The session URL gives you a live dashboard where you can monitor progress and inspect results while your script runs.


Background Jobs for Large Datasets

The operations above are coroutines: you await each one until its result is ready. The _async variants instead submit the work to the server and return a task handle immediately, so your script can continue and collect the result later.

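import asyncio
from everyrow import create_session
from everyrow.ops import screen_async

async def main(large_dataframe):
    async with create_session(name="Background Screening") as session:
        # Submit the job; this returns a task handle right away.
        task = await screen_async(
            session=session,
            task="Remote-friendly, senior-level, salary disclosed",
            input=large_dataframe,
        )
        print(f"Task ID: {task.task_id}")
        # do other work...
        result = await task.await_result()  # wait until the job finishes
        return result

# Run with: asyncio.run(main(large_dataframe))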
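
The submit-now, collect-later pattern is not specific to everyrow. As a rough analogy only, here is how the same flow looks with plain asyncio; `slow_screen` is a hypothetical stand-in for a long-running remote job and makes no everyrow calls.

```python
import asyncio


async def slow_screen(rows):
    """Stand-in for a long-running remote job (hypothetical, not an everyrow call)."""
    await asyncio.sleep(0.05)  # simulate server-side processing time
    return [row for row in rows if "remote" in row.lower()]


async def main():
    rows = ["Remote, $120K", "In-office SF", "Fully remote team"]
    # Submit the work; create_task returns a handle immediately.
    task = asyncio.create_task(slow_screen(rows))
    # Do unrelated work while the job is pending.
    other_work = sum(range(1000))
    # Collect the result only when it is needed.
    result = await task
    return other_work, result


other_work, result = asyncio.run(main())
print(other_work, result)
```

The difference with a server-side job is that the work survives your process, which is why the task ID matters.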

If your script crashes, recover the result later using the task ID:

from everyrow import fetch_task_data

# Run inside an async function (or a notebook cell that supports top-level await).
df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")
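
Recovering by task ID only works if the ID outlives the crashed process. One way to arrange that, sketched here with the standard library, is to write the ID to a small file right after submitting the job. The helper names and the file name are hypothetical and not part of the everyrow SDK.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_task_id(task_id: str, path: Path) -> None:
    """Write the task ID to disk as soon as the job is submitted."""
    path.write_text(json.dumps({"task_id": task_id}))


def load_task_id(path: Path) -> Optional[str]:
    """Return the saved task ID, or None if no job was recorded."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["task_id"]


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "everyrow_task.json"
    save_task_id("12345678-1234-1234-1234-123456789abc", state_file)
    recovered = load_task_id(state_file)

print(recovered)
```

In a real pipeline you would call `save_task_id(task.task_id, ...)` right after the submit step and pass the loaded value to `fetch_task_data` on restart.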

Beyond Screening: Other Operations

| Operation | What it does |
| --- | --- |
| Screen | Filter rows by criteria that require judgment |
| Rank | Score rows by qualitative factors |
| Dedupe | Deduplicate when fuzzy string matching isn’t enough |
| Merge | Join tables when keys don’t match exactly |
| Research | Run web agents to research each row |

Each operation takes a natural‑language task description and a dataframe, and returns structured results. Same pattern, different capability.
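
To see why fuzzy string matching can fall short for deduplication, consider a small sketch with the standard library's `difflib`. The company names and the 0.8 threshold are illustrative assumptions; no everyrow API is involved.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


THRESHOLD = 0.8  # illustrative cut-off for "probably the same entity"

# A trailing period barely changes the string, so the score stays high.
typo_pair = similarity("Acme Inc", "Acme Inc.")

# An acronym and its expansion share almost no characters in sequence.
alias_pair = similarity("IBM", "International Business Machines")

print(f"typo pair:  {typo_pair:.2f} -> match={typo_pair >= THRESHOLD}")
print(f"alias pair: {alias_pair:.2f} -> match={alias_pair >= THRESHOLD}")
```

Character similarity catches typos and punctuation differences, but recognising that an acronym and a full name refer to the same company takes world knowledge.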


When to Use (and When Not To)

everyrow shines for cases where the logic is easy to describe but hard to code: screening, ranking, deduplication, and enrichment tasks where the criteria require judgment, world knowledge, or fuzzy matching.

It is not a replacement for deterministic transformations. If you can write a reliable pandas filter like df[df["salary"] > 100_000], you should. Use everyrow for columns that contain natural‑language, inconsistent, or otherwise ambiguous values.

Trade‑offs: LLM‑based operations introduce latency and cost. Use them judiciously for the parts of your pipeline that truly need human‑like reasoning.
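
One way to act on this trade-off is to run a cheap deterministic prefilter first and send only the surviving rows to the LLM step. The sketch below uses plain Python lists to stay dependency-free; the same idea works as a boolean mask on a DataFrame. It assumes, for illustration, that a disclosed salary is always written with a dollar sign followed by a digit, which will not hold for every dataset.

```python
import re

jobs = [
    {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"},
    {"company": "Vercel", "post": "Lead our NYC team. Competitive comp, DOE"},
    {"company": "Notion", "post": "In-office SF. Staff eng, $200K + equity"},
    {"company": "Linear", "post": "Bootcamp grads welcome! $85K, remote-friendly"},
    {"company": "Descript", "post": "Work from anywhere. Principal architect, $250K"},
]

# Assumption: a disclosed salary always contains "$" followed by a digit.
HAS_DOLLAR_FIGURE = re.compile(r"\$\d")

# Rows without any dollar figure cannot pass the salary criterion,
# so they never need to reach the LLM step.
candidates = [job for job in jobs if HAS_DOLLAR_FIGURE.search(job["post"])]
skipped = [job for job in jobs if not HAS_DOLLAR_FIGURE.search(job["post"])]

print("send to LLM:", [job["company"] for job in candidates])
print("rejected up front:", [job["company"] for job in skipped])
```

The prefilter should only reject rows that are certain to fail; anything ambiguous stays in the candidate set for the judgment-based step.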

Scaling note – In the job‑screening example above, processing 5 rows takes a few seconds and costs a fraction of a cent. For 10,000 rows you’ll want the async variants and should expect minutes rather than milliseconds. The Getting Started docs cover scaling patterns for larger datasets.


Get Started

pip install everyrow
export EVERYROW_API_KEY=your_key_here

Get a free API key at everyrow.io/api-key – it comes with $20 free credit.

Full docs and more examples: everyrow.io/docs/getting-started
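
If you want a script to fail early with a clear message when the key is missing, a small check like the following can run before any operation. This is a sketch; `require_api_key` is a hypothetical helper, and only the `EVERYROW_API_KEY` variable name comes from the setup step above.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = env.get("EVERYROW_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "EVERYROW_API_KEY is not set; export it before running the pipeline"
        )
    return key


# Typical use at the top of a script:
#     require_api_key()
```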

