All of human cooking compressed into 2 megabytes

Published: May 27, 2026 at 04:14 AM EDT

Source: Hacker News

Abstract

We present Epicure, a family of three sibling skip‑gram ingredient embeddings retrained from scratch on a multilingual recipe corpus. We aggregate 4.14 M recipes from 11 sources spanning nine languages and language varieties (English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian‑English) and normalise the raw ingredient strings to 1,790 canonical entries via an LLM‑augmented pipeline. Two graphs seed the models: a 203,508‑edge ingredient‑ingredient graph weighted by normalised pointwise mutual information (NPMI), and an 80,019‑edge typed FlavorDB ingredient‑compound graph with 2,247 typed compound nodes across 15 categories. From these we train three Metapath2Vec variants that share architecture and hyperparameters and differ only in the random‑walk schema: Cooc walks the co‑occurrence graph only, Chem walks the typed compound metapaths only, and Core blends both by injecting ingredient‑ingredient walks at a controlled mixing ratio. Each model therefore sits at a distinct point on the chemistry‑vs‑recipe‑context spectrum.
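
For readers unfamiliar with NPMI, the sketch below shows one common way to weight ingredient‑ingredient edges from recipe co‑occurrence counts. It is an illustration only, not the authors' pipeline: the function name `npmi_edges`, the `min_count` threshold, and the toy recipes are all assumptions made for this example, and the abstract does not say how Epicure filters or thresholds its edges.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_count=1):
    """Return {(a, b): npmi} for ingredient pairs seen together in >= min_count recipes.

    Hypothetical helper for illustration; probabilities are estimated as
    (number of recipes containing the item or pair) / (number of recipes).
    """
    n = len(recipes)
    single = Counter()  # recipes containing each ingredient
    pair = Counter()    # recipes containing each unordered pair
    for recipe in recipes:
        items = sorted(set(recipe))  # dedupe and fix pair orientation (a < b)
        single.update(items)
        pair.update(combinations(items, 2))

    edges = {}
    for (a, b), c_ab in pair.items():
        if c_ab < min_count:
            continue
        p_ab = c_ab / n
        if p_ab == 1.0:
            # Pair occurs in every recipe: -log p(a, b) is 0, so use the limit value 1.
            edges[(a, b)] = 1.0
            continue
        pmi = math.log(p_ab / ((single[a] / n) * (single[b] / n)))
        edges[(a, b)] = pmi / -math.log(p_ab)  # normalise into [-1, 1]
    return edges


# Toy corpus (made up for this example).
recipes = [
    ["tomato", "basil"],
    ["tomato", "basil"],
    ["flour", "butter"],
    ["flour", "butter", "tomato"],
]
edges = npmi_edges(recipes)
for (a, b), score in sorted(edges.items()):
    print(f"{a} - {b}: {score:+.3f}")
```

NPMI is 1 when two ingredients only ever appear together, close to 0 when they co‑occur about as often as chance predicts, and negative when they co‑occur less often than chance. A real graph would typically keep only pairs above some count and score threshold; those cut‑offs are not given in the abstract.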

Subjects

  • Artificial Intelligence (cs.AI)
  • Computation and Language (cs.CL)
  • Computers and Society (cs.CY)

Citation

Cite as: arXiv:2605.22391 (cs.AI), or arXiv:2605.22391v1 (cs.AI) for this version.

DOI

https://doi.org/10.48550/arXiv.2605.22391
arXiv‑issued DOI via DataCite (pending registration)

Submission history

From: Josef Liyanjun Chen
Version: v1
Date: Thu, 21 May 2026 12:23:38 UTC (6,566 KB)
