How I Built a Recipe Extractor from YouTube Using AWS Transcribe

Published: March 4, 2026 at 04:18 AM EST
3 min read
Source: Dev.to


“Cooking videos are great, but following along in the kitchen is a pain. You’re elbow‑deep in dough and suddenly need to rewind for that one ingredient you missed.”

I built a small pipeline that takes any YouTube cooking video, pulls the audio, sends it to Amazon Transcribe, and gives me a clean text file of the entire recipe. No paid tools, no complex setup—just AWS services and a few Python scripts.

What the Pipeline Does

YouTube Video → Download Audio (yt-dlp) → Upload to S3 → Amazon Transcribe → recipe.txt

Four steps. That’s it.

Step 1 — Download the Audio

I used yt-dlp to pull just the audio from the video. No need to download the full video.

yt-dlp \
    --extract-audio \
    --audio-quality 0 \
    --output "output/audio.%(ext)s" \
    "https://youtu.be/YOUR_VIDEO_ID"

One thing I ran into: ffmpeg was not installed on my machine, so yt-dlp's MP3 conversion failed. Amazon Transcribe supports the webm format natively, so I skipped the conversion entirely and uploaded the raw .webm file, saving time.
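Before uploading, it's worth checking that the downloaded file is in a format Transcribe accepts at all. A minimal sketch, using the format list from the `MediaFormat` parameter of the Transcribe API (`media_format_for` is a hypothetical helper, not part of the pipeline above):

```python
from pathlib import Path

# Formats Amazon Transcribe accepts natively (valid MediaFormat values).
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}

def media_format_for(path: str) -> str:
    """Map a downloaded file's extension to a Transcribe MediaFormat value.

    Raises ValueError for formats that would need an ffmpeg conversion first.
    """
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"{ext!r} is not supported; convert with ffmpeg first")
    return ext
```

If the extension isn't supported, that's the point where you'd fall back to installing ffmpeg and converting.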

Step 2 — Create an S3 Bucket and Upload

BUCKET_NAME="recipe-transcribe-$(date +%s)"

aws s3 mb s3://$BUCKET_NAME --region us-east-1
aws s3 cp output/audio.webm s3://$BUCKET_NAME/audio.webm

Using date +%s as a suffix keeps the bucket name unique without any extra thinking — S3 bucket names share one global namespace, so a plain name like recipe-transcribe is likely already taken.
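If you're scripting the whole pipeline in Python anyway, the same uniqueness trick is one line (a sketch; the `unique_bucket_name` helper is my addition, not code from the article):

```python
import time

def unique_bucket_name(prefix: str = "recipe-transcribe") -> str:
    """Build an S3 bucket name with a Unix-timestamp suffix, mirroring the
    shell's $(date +%s). Bucket names must be lowercase and globally unique,
    so a timestamp suffix is usually enough for a one-off script."""
    return f"{prefix}-{int(time.time())}"
```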

Step 3 — Start the Transcribe Job

import boto3

BUCKET_NAME = "your-bucket-name"
JOB_NAME    = "recipe-job-01"
REGION      = "us-east-1"
MEDIA_URI   = f"s3://{BUCKET_NAME}/audio.webm"

client = boto3.client("transcribe", region_name=REGION)

client.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": MEDIA_URI},
    MediaFormat="webm",
    LanguageCode="en-US",
    OutputBucketName=BUCKET_NAME,
    OutputKey="transcript.json",
)

Amazon Transcribe picks up the file from S3 and writes transcript.json back to the same bucket once done.

Step 4 — Poll the Job and Save the Recipe

import time, json, boto3

# Same values as in Step 3.
BUCKET_NAME = "your-bucket-name"
JOB_NAME    = "recipe-job-01"
REGION      = "us-east-1"

transcribe = boto3.client("transcribe", region_name=REGION)
s3 = boto3.client("s3", region_name=REGION)

while True:
    response = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
    status = response["TranscriptionJob"]["TranscriptionJobStatus"]
    print(f"Status: {status}")

    if status == "COMPLETED":
        break
    if status == "FAILED":
        raise RuntimeError("Job failed")

    time.sleep(15)

# Download and extract plain text
s3.download_file(BUCKET_NAME, "transcript.json", "output/transcript.json")

with open("output/transcript.json") as f:
    data = json.load(f)

text = data["results"]["transcripts"][0]["transcript"]

with open("output/recipe.txt", "w") as f:
    f.write(text)

The script checks every 15 seconds. For a 10‑minute video, the job finished in about a minute.

The Output

Here’s what came out for a Guntur Chicken Masala video:

[Screenshot: the recipe transcript output]

Readable. Accurate. Ready to use in the kitchen.
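Transcribe returns the whole transcript as one long string, which is fine for storage but awkward to read on a phone propped against a flour jar. A small formatting pass helps — this is my own post-processing sketch, not part of the original pipeline, and the sentence splitting is deliberately naive:

```python
import textwrap

def format_recipe(text: str, width: int = 72) -> str:
    """Break the single-line transcript at sentence boundaries and wrap
    each sentence, so the recipe is easier to scan in the kitchen."""
    sentences = [s.strip() for s in text.replace(". ", ".\n").splitlines() if s.strip()]
    return "\n".join(textwrap.fill(s, width=width) for s in sentences)
```

Feeding `recipe.txt` through this before saving gives one sentence per wrapped block instead of a wall of text.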

IAM Permissions You Need

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:PutObject",
        "s3:GetObject",
        "transcribe:StartTranscriptionJob",
        "transcribe:GetTranscriptionJob"
      ],
      "Resource": "*"
    }
  ]
}

What I’d Build Next

  • Trigger the whole pipeline on S3 upload via Lambda
  • Process a full YouTube playlist at once
  • Add speaker labels for videos with multiple hosts

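The Lambda idea is mostly a matter of translating an S3 ObjectCreated event into the same `start_transcription_job` call from Step 3. A sketch of that translation, kept free of AWS calls so it's easy to unit-test (the `build_job_params` helper and `transcripts/` output prefix are my assumptions, not shipped code):

```python
def build_job_params(event: dict) -> dict:
    """Turn an S3 ObjectCreated event into start_transcription_job kwargs.

    A real Lambda handler would pass the result straight to
    boto3.client("transcribe").start_transcription_job(**params).
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Derive a job name from the object key; Transcribe job names must be unique.
    job_name = f"recipe-{key.replace('/', '-').rsplit('.', 1)[0]}"
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": key.rsplit(".", 1)[-1],
        "LanguageCode": "en-US",
        "OutputBucketName": bucket,
        "OutputKey": f"transcripts/{job_name}.json",
    }
```

With that in place, the handler itself is a few lines, and the polling loop from Step 4 disappears — you'd instead trigger a second Lambda on the `transcripts/` prefix to extract the text.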
The full code is on GitHub:
