XiaChuFang Recipe Corpus

About

The recipes come from 下厨房 (XiaChuFang), a popular Chinese recipe-sharing website. The corpus consists of recipes published before December 2020.

We provide both the finetuning recipe corpus, used to train models for our task, and the full recipe corpus, intended for further research on Chinese recipes.

Full Recipe Corpus

The full recipe corpus contains 1,520,327 Chinese recipes. Of these, 1,242,206 recipes are mapped to 30,060 dishes, an average of 41.3 recipes per dish.
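The per-dish average follows directly from the two counts (1,242,206 / 30,060 ≈ 41.3). As a minimal sketch of how these figures could be recomputed, assume each recipe is a dict with a `dish` field whose value is 'Unknown' when no dish is mapped, as in the example record in this README. The function name `dish_stats` is illustrative and not part of the released corpus.

```python
from collections import Counter


def dish_stats(recipes):
    """Return (mapped_recipes, distinct_dishes, mean_recipes_per_dish).

    Assumption: recipes without a mapped dish carry dish == 'Unknown'.
    """
    counts = Counter(
        r["dish"] for r in recipes if r.get("dish", "Unknown") != "Unknown"
    )
    mapped = sum(counts.values())
    dishes = len(counts)
    mean = mapped / dishes if dishes else 0.0
    return mapped, dishes, mean


# Sanity check of the reported average.
print(round(1_242_206 / 30_060, 1))  # 41.3
```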

The average length of a recipe is 224 characters. The maximum length is 62,722 characters, and the minimum length is 10 characters.

The recipes were contributed by 415,272 authors; the most productive author uploaded 5,394 recipes. Author information is provided in desensitized (anonymized) form.
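A small sketch of how per-author counts could be derived, assuming each recipe dict carries the desensitized identifier in an `author` field (for instance 'author_2894' in the example record). The helper name `top_authors` is hypothetical.

```python
from collections import Counter


def top_authors(recipes, n=1):
    """Return the n most productive (author, recipe_count) pairs."""
    return Counter(r["author"] for r in recipes).most_common(n)
```

Calling `len(set(r["author"] for r in recipes))` on the same data would give the number of distinct authors.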

Example🍓🍤

{
  'name': '酸甜鲜美,草莓虾仁',  # the recipe title written by the author
  'dish': '草莓虾仁',  # the dish mapped to the dish list provided by XiaChuFang (if none is mapped, the value is 'Unknown')
  'description': '草莓和虾仁放在一起,刚开始就为了好看,后来竟然发现味道也很棒~',  # the recipe description provided by the author
  'recipeIngredient':  # the ingredient list written by the author
    ['100克草莓', '300克对虾', '15克青椒', '1个鸡蛋', '8克盐', '3克胡椒粉', '10克料酒', '10克玉米淀粉', '10克生姜', '1000ml(消耗量25ml)色拉油'],
  'recipeInstructions': [  # the recipe steps
    '草莓一分为四切块', 
    '将对虾洗净去头去壳', 
    '虾仁开背去除虾线', 
    '虾仁用盐、味精、胡椒粉、鸡蛋清、湿淀粉腌制上浆入味', 
    '净锅倒入色拉油加热至三成热', 
    '虾仁入锅滑熟倒出沥油',
    '原锅留底油加适量清水,加入少许盐、味精、料酒,烧开后用少许湿淀粉勾玻璃芡汁,然后倒入虾仁、草莓翻拌均匀淋上少许明油即可出锅'
  ], 
  'author': 'author_2894',  # the desensitized author name
  'keywords':  # search keywords for the recipe, provided by XiaChuFang
    ['草莓虾仁的做法', '草莓虾仁的家常做法', '草莓虾仁的详细做法', '草莓虾仁怎么做', '草莓虾仁的最正宗做法', '健康', '养生', '季节', '家宴']
}
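The record above can be read programmatically. This README does not state the on-disk format, so the sketch below assumes one JSON object per line (JSON Lines). Both the helper `iter_recipes` and the file name in the usage comment are hypothetical. If the files instead store Python literals exactly as printed above, `ast.literal_eval` could replace `json.loads`.

```python
import json


def iter_recipes(lines):
    """Yield one recipe dict per non-empty line (assumes JSON Lines)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Usage with a real file (the path is a placeholder, not the actual name):
# with open("recipe_corpus_full.json", encoding="utf-8") as f:
#     for recipe in iter_recipes(f):
#         print(recipe["name"], len(recipe["recipeInstructions"]))
```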

Finetuning Recipe Corpus

The finetuning recipe corpus contains 1,479,764 recipes.

Because this corpus is used for finetuning in our task, it excludes recipes related to the dishes used in evaluation.
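If the finetuning corpus is a subset of the full corpus, the two sizes imply that 1,520,327 − 1,479,764 = 40,563 recipes were withheld. The exact criterion for "related" is not specified here. The sketch below is one plausible approximation, not the authors' procedure: it drops a recipe when its mapped `dish` is an evaluation dish or when its `name` contains an evaluation dish name. The function name and the `eval_dishes` argument are illustrative.

```python
def exclude_eval_dishes(recipes, eval_dishes):
    """Keep recipes whose mapped dish and title mention no evaluation dish.

    Assumption: "related" means an exact dish match or a substring match
    in the recipe title; the real filtering rule may differ.
    """
    eval_dishes = set(eval_dishes)
    kept = []
    for r in recipes:
        if r.get("dish") in eval_dishes:
            continue
        if any(d in r.get("name", "") for d in eval_dishes):
            continue
        kept.append(r)
    return kept
```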

The data format is the same as the full corpus.

Download the Finetuning Recipe Corpus