How I Built a Recipe Extractor from YouTube Using AWS Transcribe

Published: March 4, 2026 at 04:18 AM EST
3 min read
Source: Dev.to


“Cooking videos are great, but following along in the kitchen is a pain. You’re elbow‑deep in dough and suddenly need to rewind for that one ingredient you missed.”

I built a small pipeline that takes any YouTube cooking video, pulls the audio, sends it to Amazon Transcribe, and gives me a clean text file of the entire recipe. No paid tools, no complex setup—just AWS services and a few Python scripts.

What the Pipeline Does

YouTube Video → Download Audio (yt-dlp) → Upload to S3 → Amazon Transcribe → recipe.txt

Four steps. That’s it.

Step 1 — Download the Audio

I used yt-dlp to pull just the audio from the video. No need to download the full video.

yt-dlp \
    --extract-audio \
    --audio-quality 0 \
    --output "output/audio.%(ext)s" \
    "https://youtu.be/YOUR_VIDEO_ID"

One thing I ran into: ffmpeg was not installed on my machine, so yt-dlp's MP3 conversion failed. Amazon Transcribe supports the webm format natively, so I skipped the conversion entirely and uploaded the raw .webm file, saving time.
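Before uploading, it's worth checking that the downloaded file is in a format Transcribe accepts at all. A minimal sketch, using the format list from the `MediaFormat` parameter of the Transcribe API (`media_format_for` is a hypothetical helper, not part of the pipeline above):

```python
from pathlib import Path

# Formats Amazon Transcribe accepts natively (valid MediaFormat values).
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}

def media_format_for(path: str) -> str:
    """Map a downloaded file's extension to a Transcribe MediaFormat value.

    Raises ValueError for formats that would need an ffmpeg conversion first.
    """
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"{ext!r} is not supported; convert with ffmpeg first")
    return ext
```

If the extension isn't supported, that's the point where you'd fall back to installing ffmpeg and converting.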

Step 2 — Create an S3 Bucket and Upload

BUCKET_NAME="recipe-transcribe-$(date +%s)"

aws s3 mb s3://$BUCKET_NAME --region us-east-1
aws s3 cp output/audio.webm s3://$BUCKET_NAME/audio.webm

Using date +%s as a suffix keeps the bucket name unique without any extra thinking — S3 bucket names share one global namespace, so a plain name like recipe-transcribe is likely already taken.
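If you're scripting the whole pipeline in Python anyway, the same uniqueness trick is one line (a sketch; the `unique_bucket_name` helper is my addition, not code from the article):

```python
import time

def unique_bucket_name(prefix: str = "recipe-transcribe") -> str:
    """Build an S3 bucket name with a Unix-timestamp suffix, mirroring the
    shell's $(date +%s). Bucket names must be lowercase and globally unique,
    so a timestamp suffix is usually enough for a one-off script."""
    return f"{prefix}-{int(time.time())}"
```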

Step 3 — Start the Transcribe Job

import boto3

BUCKET_NAME = "your-bucket-name"
JOB_NAME    = "recipe-job-01"
REGION      = "us-east-1"
MEDIA_URI   = f"s3://{BUCKET_NAME}/audio.webm"

client = boto3.client("transcribe", region_name=REGION)

client.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": MEDIA_URI},
    MediaFormat="webm",
    LanguageCode="en-US",
    OutputBucketName=BUCKET_NAME,
    OutputKey="transcript.json",
)

Amazon Transcribe picks up the file from S3 and writes transcript.json back to the same bucket once done.

Step 4 — Poll the Job and Save the Recipe

import time, json, boto3

# Same values as in Step 3.
BUCKET_NAME = "your-bucket-name"
JOB_NAME    = "recipe-job-01"
REGION      = "us-east-1"

transcribe = boto3.client("transcribe", region_name=REGION)
s3 = boto3.client("s3", region_name=REGION)

while True:
    response = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
    status = response["TranscriptionJob"]["TranscriptionJobStatus"]
    print(f"Status: {status}")

    if status == "COMPLETED":
        break
    if status == "FAILED":
        raise RuntimeError("Job failed")

    time.sleep(15)

# Download and extract plain text
s3.download_file(BUCKET_NAME, "transcript.json", "output/transcript.json")

with open("output/transcript.json") as f:
    data = json.load(f)

text = data["results"]["transcripts"][0]["transcript"]

with open("output/recipe.txt", "w") as f:
    f.write(text)

The script checks every 15 seconds. For a 10‑minute video, the job finished in about a minute.

The Output

Here’s what came out for a Guntur Chicken Masala video:

[Screenshot: the recipe transcript output]

Readable. Accurate. Ready to use in the kitchen.
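Transcribe returns the whole transcript as one long string, which is fine for storage but awkward to read on a phone propped against a flour jar. A small formatting pass helps — this is my own post-processing sketch, not part of the original pipeline, and the sentence splitting is deliberately naive:

```python
import textwrap

def format_recipe(text: str, width: int = 72) -> str:
    """Break the single-line transcript at sentence boundaries and wrap
    each sentence, so the recipe is easier to scan in the kitchen."""
    sentences = [s.strip() for s in text.replace(". ", ".\n").splitlines() if s.strip()]
    return "\n".join(textwrap.fill(s, width=width) for s in sentences)
```

Feeding `recipe.txt` through this before saving gives one sentence per wrapped block instead of a wall of text.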

IAM Permissions You Need

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:PutObject",
        "s3:GetObject",
        "transcribe:StartTranscriptionJob",
        "transcribe:GetTranscriptionJob"
      ],
      "Resource": "*"
    }
  ]
}

What I’d Build Next

  • Trigger the whole pipeline on S3 upload via Lambda
  • Process a full YouTube playlist at once
  • Add speaker labels for videos with multiple hosts

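The Lambda idea is mostly a matter of translating an S3 ObjectCreated event into the same `start_transcription_job` call from Step 3. A sketch of that translation, kept free of AWS calls so it's easy to unit-test (the `build_job_params` helper and `transcripts/` output prefix are my assumptions, not shipped code):

```python
def build_job_params(event: dict) -> dict:
    """Turn an S3 ObjectCreated event into start_transcription_job kwargs.

    A real Lambda handler would pass the result straight to
    boto3.client("transcribe").start_transcription_job(**params).
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Derive a job name from the object key; Transcribe job names must be unique.
    job_name = f"recipe-{key.replace('/', '-').rsplit('.', 1)[0]}"
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": key.rsplit(".", 1)[-1],
        "LanguageCode": "en-US",
        "OutputBucketName": bucket,
        "OutputKey": f"transcripts/{job_name}.json",
    }
```

With that in place, the handler itself is a few lines, and the polling loop from Step 4 disappears — you'd instead trigger a second Lambda on the `transcripts/` prefix to extract the text.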
The full code is on GitHub:
