Stop Writing Regex for Data You Have to Describe in English

Published: February 10, 2026, 8:08 PM (GMT+9)

7 min read

Source: Dev.to

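Screening Job Postings for Remote-Friendly, Senior-Level Roles with Disclosed Salaries

You have a spreadsheet of job postings and need to filter it down to roles that are remote-friendly, senior-level, and have a disclosed salary. The data looks like this: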

company	post
Airtable	Async-first team, 8+ yrs exp, $185-220K base
Vercel	Lead our NYC team. Competitive comp, DOE
Notion	In-office SF. Staff eng, $200K + equity
Linear	Bootcamp grads welcome! $85K, remote-friendly
Descript	Work from anywhere. Principal architect, $250K

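The Decision Rules (in Plain English)

Remote-friendly: the post contains phrases such as "remote", "work from anywhere", or "async-first", or it names no office location, which implies that remote work is possible.
Senior-level: the post contains phrases such as "8+ yrs", "Staff", "Principal", or "Lead" (though "Lead" can be a junior title in some cases).
Salary disclosed: the post contains an actual figure (e.g. "$85K", "$185-220K"); phrases like "Competitive comp" or "DOE" do not count.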
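
For contrast, here is a minimal sketch of what a first-pass regex version of these three rules might look like. The patterns and the `regex_screen` helper are hypothetical, written only for this illustration; they do not come from everyrow or from any real screening pipeline.

```python
import re

# The five sample posts from the table above.
posts = {
    "Airtable": "Async-first team, 8+ yrs exp, $185-220K base",
    "Vercel": "Lead our NYC team. Competitive comp, DOE",
    "Notion": "In-office SF. Staff eng, $200K + equity",
    "Linear": "Bootcamp grads welcome! $85K, remote-friendly",
    "Descript": "Work from anywhere. Principal architect, $250K",
}

# Hypothetical first-pass patterns, one per rule.
REMOTE = re.compile(r"\bremote\b", re.IGNORECASE)
SENIOR = re.compile(r"\b(senior|staff|principal|lead)\b", re.IGNORECASE)
SALARY = re.compile(r"\$\d+")


def regex_screen(post: str) -> bool:
    """Return True only if all three patterns match the post."""
    return bool(REMOTE.search(post) and SENIOR.search(post) and SALARY.search(post))


regex_results = {company: regex_screen(post) for company, post in posts.items()}
print(regex_results)
```

On the five sample posts this version rejects every row. "Async-first" and "Work from anywhere" never contain the word "remote", so both qualifying posts are missed, while the `lead` alternative matches the verb in "Lead our NYC team". Each miss can be patched with another pattern, but every patch invites the next exception.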

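Expressing the Logic in Natural Language with everyrow

everyrow lets you define fuzzy, qualitative logic in everyday English and apply it to each row of a DataFrame. The SDK handles LLM orchestration, structured output, and scaling.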

import asyncio
import pandas as pd
from pydantic import BaseModel, Field
from everyrow.ops import screen

jobs = pd.DataFrame([
    {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"},
    {"company": "Vercel",   "post": "Lead our NYC team. Competitive comp, DOE"},
    {"company": "Notion",   "post": "In-office SF. Staff eng, $200K + equity"},
    {"company": "Linear",   "post": "Bootcamp grads welcome! $85K, remote-friendly"},
    {"company": "Descript", "post": "Work from anywhere. Principal architect, $250K"},
])

class JobScreenResult(BaseModel):
    # Structured output: one boolean per row.
    qualifies: bool = Field(description="True if meets ALL criteria")

async def main():
    # Apply the plain-English criteria to every row of the DataFrame.
    result = await screen(
        task="""
        Qualifies if ALL THREE are met:
        1. Remote-friendly
        2. Senior-level (5+ yrs exp OR Senior/Staff/Principal in title)
        3. Salary disclosed (specific numbers, not "competitive" or "DOE")
        """,
        input=jobs,
        response_model=JobScreenResult,
    )
    print(result.data)

asyncio.run(main())

Results

company	qualifies
Airtable	True
Vercel	False
Notion	False
Linear	False
Descript	True

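Airtable qualifies because of "Async-first" (remote-friendly), "8+ yrs" (senior), and "$185-220K" (salary disclosed).
Descript qualifies because of "Work from anywhere" (remote), "Principal architect" (senior), and "$250K" (salary disclosed).

Each of the other rows fails at least one criterion: Vercel lists no actual salary figure, Notion is in-office, and Linear (bootcamp grads) is not senior-level.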
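
Once the screen has run, the boolean column works like any other pandas mask. The sketch below uses a hand-built stand-in for `result.data` with the values from the results table above. The assumption that `result.data` is a DataFrame carrying a `company` and a `qualifies` column is inferred from that table, not from the SDK documentation.

```python
import pandas as pd

# Stand-in for result.data, copied from the results table (assumed shape).
screened = pd.DataFrame({
    "company": ["Airtable", "Vercel", "Notion", "Linear", "Descript"],
    "qualifies": [True, False, False, False, True],
})

# Keep only the rows the screen marked as qualifying.
shortlist = screened[screened["qualifies"]]
shortlisted_companies = shortlist["company"].tolist()
print(shortlisted_companies)
```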

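Sessions: Track Everything in the Dashboard

Every operation runs inside a session, a group of related operations that appears in the everyrow.io web UI. Sessions are created automatically, but for multi-step pipelines it is better to create one explicitly: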

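from everyrow import create_session
from everyrow.ops import screen, rank

# Assumes `leads` (a DataFrame) and `ScreenResult` (a pydantic model) are
# already defined, and that this runs inside an async function.
async with create_session(name="Lead Qualification") as session:
    print(f"View at: {session.get_url()}")

    # Step 1: keep only leads with a company email domain.
    screened = await screen(
        session=session,
        task="Has a company email domain (not gmail, yahoo, etc.)",
        input=leads,
        response_model=ScreenResult,
    )

    # Step 2: score the screened rows in the same session.
    ranked = await rank(
        session=session,
        task="Score by likelihood to convert",
        input=screened.data,
        field_name="conversion_score",
    )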

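The session URL gives you a live dashboard where you can monitor progress and inspect results while the script runs.

Background Jobs for Large Datasets

All of the operations above are already async/await. The _async variants are fire-and-forget: they submit the task to the server and return immediately, so your script can keep running.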

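from everyrow import create_session
from everyrow.ops import screen_async

# Assumes `large_dataframe` is already defined and that this runs inside an async function.
async with create_session(name="Background Screening") as session:
    # Submit the task; the call returns without waiting for the result.
    task = await screen_async(
        session=session,
        task="Remote-friendly, senior-level, salary disclosed",
        input=large_dataframe,
    )
    print(f"Task ID: {task.task_id}")
    # do other work...
    # Wait only when the result is actually needed.
    result = await task.await_result()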

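Even if the script is interrupted, you can recover the results later using the task ID:

from everyrow import fetch_task_data

# Run inside an async function; the ID below is a placeholder.
df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")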
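
Recovery only works if the task ID outlives the script, so it helps to write the ID to disk as soon as the task is submitted. Below is a minimal sketch using only the standard library. `save_task_id`, `load_task_id`, and the `everyrow_task.json` file name are hypothetical helpers for this illustration, not part of the SDK.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_task_id(task_id: str, path: Path) -> None:
    """Write the task ID to a small JSON file right after submission."""
    path.write_text(json.dumps({"task_id": task_id}))


def load_task_id(path: Path) -> Optional[str]:
    """Return the saved task ID, or None if no file exists at the path."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["task_id"]


# Demo in a temporary directory; a real script would use a stable path.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "everyrow_task.json"  # hypothetical file name
    save_task_id("12345678-1234-1234-1234-123456789abc", state_file)  # placeholder ID
    recovered_id = load_task_id(state_file)
    missing_id = load_task_id(Path(tmp) / "missing.json")

print(recovered_id)
```

After a restart, the value returned by `load_task_id` is what you would pass to `fetch_task_data`.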

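Beyond Screening: Other Operations

Operation	What it does
Screen	Filters rows by criteria that require judgment
Rank	Scores rows by qualitative factors
Dedupe	Removes duplicates when fuzzy string matching is not enough
Merge	Joins tables when keys do not match exactly
Research	Runs a web agent to research each row

Each operation takes a natural-language task description and a DataFrame as input and returns structured results. Same pattern, different capabilities.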

When to Use (and When Not To)

everyrow shines for cases where the logic is easy to describe but hard to code: screening, ranking, deduplication, and enrichment tasks where the criteria require judgment, world knowledge, or fuzzy matching.

It is not a replacement for deterministic transformations. If you can write a reliable pandas filter like df[df["salary"] > 100_000], you should. Use everyrow for columns that contain natural‑language, inconsistent, or otherwise ambiguous values.
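
One way to combine the two, sketched with plain pandas: apply the deterministic filter first and hand only the surviving rows to the LLM step. The `jobs` frame, its `salary` and `description` columns, and the sample values are made up for this illustration.

```python
import pandas as pd

# Hypothetical data: salary is already numeric, description is free text.
jobs = pd.DataFrame({
    "company": ["A", "B", "C"],
    "salary": [185_000, 85_000, 250_000],
    "description": ["Async-first team", "Remote-friendly", "Work from anywhere"],
})

# Deterministic rule: cheap, exact, and no LLM involved.
well_paid = jobs[jobs["salary"] > 100_000]

# Only these rows would still need a judgment-based screen on the free text.
needs_llm = well_paid[["company", "description"]]
print(needs_llm)
```

Filtering first shrinks the input to the slower, paid step without changing its result for the rows that remain.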

Trade‑offs: LLM‑based operations introduce latency and cost. Use them judiciously for the parts of your pipeline that truly need human‑like reasoning.

Scaling note: In the job-screening example above, processing 5 rows takes a few seconds and costs a fraction of a cent. For 10,000 rows you'll want the async variants and should expect minutes rather than milliseconds. The Getting Started docs cover scaling patterns for larger datasets.

Getting Started

pip install everyrow
export EVERYROW_API_KEY=your_key_here

Get a free API key at everyrow.io/api-key; it comes with $20 in free credit.

Full documentation and more examples: everyrow.io/docs/getting-started
