Building on my work in the last post, I am continuing my mission to build a reliable database for retreiving nutrition information for commonly used ingredients. I will use this data in order to add a calculator to estimate nutrition facts for recipes in my recipe management app Root Cellar (Github).
The ComprehensiveFoodDatabase I cleaned up in the last post is derived from the USDA FoodData dataset. While it has a huge variety of branded foods, and a decent number of raw ingredients (albeit with odd names and unintuitive units) I was having a lot of trouble getting good matches for even the common ingredients in my recipes.
I really needed a data source for common raw ingredients used in cooking. I came across the Canadian Nutrient File dataset, so I decided to give that a try next.
Exploring the Canadian Nutrient Files
I started by downloading the files as CSV, and then used DB Browser for SQLite to import the CSV files into snake_case named tables. You can find the resulting SQLite DB here.
CREATE VIEW canfood AS
SELECT
FoodID, description, serving_size,
avg(NutrientValue) filter(where NutrientName = "ENERGY (KILOCALORIES)") calories,
avg(NutrientValue) filter(where NutrientName = "FAT (TOTAL LIPIDS)") fat,
avg(NutrientValue) filter(where NutrientName = "CARBOHYDRATE, TOTAL (BY DIFFERENCE)") carbs,
avg(NutrientValue) filter(where NutrientName = "PROTEIN") protein,
avg(NutrientValue) filter(where NutrientName = "FIBRE, TOTAL DIETARY") fiber,
avg(NutrientValue) filter(where NutrientName = "SUGARS, TOTAL") sugar,
avg(NutrientValue) filter(where NutrientName = "SODIUM") sodium
FROM
(SELECT
food_name.FoodID, food_name.FoodDescription as description, serving_size,
nutrient_amount.NutrientName, nutrient_amount.NutrientValue
FROM (
SELECT * FROM food_name LEFT JOIN (
SELECT conversion_factor.FoodID, measure_name.MeasureDescription as serving_size FROM conversion_factor LEFT JOIN measure_name ON conversion_factor.MeasureID = measure_name.MeasureID
) as serving ON food_name.FoodID = serving.FoodID
) as food_name
INNER JOIN (
SELECT nutrient_amount.FoodID, nutrient_amount.NutrientValue, nutrient_name.NutrientName FROM nutrient_amount INNER JOIN nutrient_name ON nutrient_amount.NutrientID = nutrient_name.NutrientID
) as nutrient_amount
ON food_name.FoodID = nutrient_amount.FoodID) as nutr
GROUP BY FoodID
I acheive my final, clean table called canfood
with a series of joins:
1) Food Name <= Nutrient Amount <= Nutrient Name
2) Joined table from step 1 <= Conversion Factor <= Measure name (Serving size)
Then by grouping on food ID, I was able to aggregate the various macronutrient types into their own columns (Note: I only aggregate for a limited number of nutrition values, but many more are available in the database, including for many micronutrients).
This data is great!
Once the data was all assembled in the canfood
table, I found that the nutrtion data contained here was a lot more comprehensive for raw ingredients as compared to the dataset from the USDA I explored in my last post. It’s much more suitable for my needs in my recipe manager project.
It was definitely worth the time to migrate the data over! After some more SQL magic, I incorporated the canfood
table data into the food_search
table in the database I generated in my last post to finalize my comprehensive nutrition dataset. I’ve renamed some of the fields and eliminated some superfluous columns to yield a clean final product. You can access the final full-text search nutrition SQLite DB here.
CREATE VIRTUAL TABLE food_search USING fts5(
fdc_id,
description, brand_name, brand_owner,
serving_size, serving_size_unit,
calories, protein, carbs, fat, fiber, sodium, sugar
);
INSERT INTO food_search (
fdc_id,
description, brand_name, brand_owner,
serving_size, serving_size_unit,
calories, protein, carbs, fat, fiber, sodium, sugar
) SELECT
fdc_id,
description, brand_name, brand_owner,
serving_size, serving_size_unit,
energy_amount as calories, protein_amount as protein, carb_amount as carbs, fat_amount as fat, fiber_amount as fiber, sodiumna_amount as sodium, sugarstotalincludingnlea_amount as sugar
FROM usda_branded_column;
INSERT INTO food_search (
fdc_id,
description,
serving_size, serving_size_unit,
calories, protein, carbs, fat, fiber
) SELECT
fdc_id,
description,
serving_amount as serving_size, serving_text as serving_size_unit,
energy_amount as calories, protein_amount as protein, carb_amount as carbs, fat_amount as fat, fiber_amount as fiber
FROM usda_non_branded_column;
INSERT INTO food_search (
fdc_id,
description,
serving_size_unit,
calories, protein, carbs, fat, fiber, sodium, sugar
) SELECT
FoodID as fdc_id,
description,
serving_size as serving_size_unit,
calories, protein, carbs, fat, fiber, sodium, sugar
FROM canfood;
Building a Nutrition Calculator
I’m excited to report that using this data, I’ve got a functional nutrition calculator up and running on Root Cellar (Github). I’m now having much more success matching ingredients in my recipe app as I have much better coverage of raw ingredients using this database. I have not included the menustat
table in my searchable database, as I do not need data on fast-food nutrition but the process is the same as shown above if you wish to incorporate it for your own uses.
I hope this information/database can help somebody. Thanks again to the USDA, Health Canada, and ComprehensiveFoodDatabase teams for their work on this dataset.
References:
– Whalen, Lexington, Brie Turner-McGrievy, Matthew McGrievy, Andrew Hester, and Homayoun Valafar. “On Creating a Comprehensive Food Database.” In 2022 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 1610-1614. IEEE, 2022.
– U.S. Department of Agriculture, Agricultural Research Service. FoodData Central, 2019. fdc.nal.usda.gov.
– Canadian Nutrient File, Health Canada, 2015