
arXiv:2605.22391v1 (announce type: cross)
Abstract: We present Epicure, a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus. We aggregate 4.14M recipes from 11 sources spanning seven languages (English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian-English), and normalise the raw ingredient strings to 1,790 canonical entries via an LLM-augmented pipeline. A 203,508-edge ingredient-ingredient NPMI graph and an 80,019-edge typed FlavorDB ingredient-compound graph, 2,247 typed compound nodes across 15 categ
Large language models and large-scale data aggregation now make it practical to analyze complex, multilingual datasets such as recipe corpora.
The work reflects a growing use of AI to extract structured knowledge from unstructured culinary data, with potential applications in food science, personalized nutrition, and the food industry.
The resulting multilingual ingredient embeddings give a geometric representation of food components that could support analysis of flavor pairings, culinary traditions, and supply chains.
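The abstract mentions an ingredient-ingredient NPMI (normalized pointwise mutual information) graph. As a rough illustration of what such an edge weight measures, here is a minimal sketch that scores ingredient pairs by recipe-level co-occurrence. The function name `npmi_edges`, the `min_count` parameter, and the toy recipes are assumptions made for this example; the paper's actual counting scheme, thresholds, and smoothing are not given in the excerpt above.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_count=1):
    """Score ingredient pairs by NPMI over recipe-level co-occurrence.

    Hypothetical helper for illustration only, not the paper's pipeline.
    Returns a dict mapping (a, b), with a < b alphabetically, to a value
    in [-1, 1]; pairs that never co-occur are simply absent.
    """
    n = len(recipes)
    single = Counter()
    pair = Counter()
    for recipe in recipes:
        items = sorted(set(recipe))  # count each ingredient once per recipe
        single.update(items)
        pair.update(combinations(items, 2))

    edges = {}
    for (a, b), count in pair.items():
        if count < min_count:
            continue
        p_ab = count / n
        if p_ab == 1.0:
            # The pair appears in every recipe, so -log p_ab is zero and
            # NPMI is undefined; this sketch skips the pair.
            continue
        pmi = math.log(p_ab / ((single[a] / n) * (single[b] / n)))
        edges[(a, b)] = pmi / -math.log(p_ab)
    return edges


# Toy data, invented for the example.
recipes = [
    ["tomato", "basil", "garlic"],
    ["tomato", "basil"],
    ["garlic", "ginger"],
    ["ginger", "soy sauce"],
]
edges = npmi_edges(recipes)
for pair_key, score in sorted(edges.items()):
    print(pair_key, round(score, 3))
```

In this toy corpus, tomato and basil always appear together and so score close to 1, while basil and garlic co-occur exactly as often as chance would predict and score close to 0. A real pipeline would likely apply a minimum co-occurrence threshold before keeping an edge.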
- Food tech companies
- AI researchers in NLP
- Culinary R&D
- Personalized nutrition platforms
- Traditional recipe analysis methods
- Ingredient data silos
Food and beverage companies gain a tool for data-driven product development and optimization.
Mapping ingredient relationships across cultures could support new culinary fusions and cross-cultural food products.
Personalized dietary recommendations could become more sophisticated by combining individual health needs with global ingredient knowledge, with the aim of supporting disease prevention and well-being.
Read at arXiv cs.CL