mrtineu/calorie_prediction_model

Calorie Prediction Model

This project is a small machine learning exploration focused on predicting the energy value of meals from nutritional information. I created it as one of my high school ML projects for the home round of the Slovak AI Olympics 2025/26 competition.

The goal was not to build a production-ready nutrition system, but to understand the full pipeline: data preparation, feature engineering, model training, evaluation, and critical analysis of model results.

Project goal

The original idea was to:

  • estimate meal calories,
  • compare a simple linear model with a neural-network-based regressor,
  • and experiment with generating a healthier meal alternative.

What the project does

The notebook follows this workflow:

  1. downloads food data from USDA FoodData Central,
  2. extracts relevant nutrient information,
  3. builds a small synthetic meal dataset,
  4. engineers additional nutritional ratio features,
  5. trains two regression models,
  6. compares their prediction quality using standard regression metrics.

Dataset

The project uses the USDA FoodData Central foundation food dataset as the nutritional source and then generates a custom meal-level dataset.

Meal dataset construction

  • 100 synthetic meals were created from ingredient templates,
  • each meal is represented by a text input listing ingredient names and gram amounts,
  • nutrient totals are computed by matching ingredients to USDA entries,
  • the generated meal dataset is saved as meals_dataset.csv.
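The nutrient-total step can be sketched as follows. This is a minimal illustration, not the notebook's code: the `NUTRIENTS_PER_100G` table, its numbers, and the `meal_totals` helper are hypothetical placeholders, and it assumes nutrient values are stored per 100 g of food.

```python
# Hypothetical per-100 g nutrient values (made-up round numbers, not USDA data).
NUTRIENTS_PER_100G = {
    "chicken breast": {"protein": 22.5, "carbohydrates": 0.0, "total_fats": 2.6},
    "white rice": {"protein": 2.7, "carbohydrates": 28.2, "total_fats": 0.3},
}


def meal_totals(ingredients):
    """Sum nutrients over (name, grams) pairs, scaling per-100 g values by weight."""
    totals = {}
    for name, grams in ingredients:
        per_100g = NUTRIENTS_PER_100G[name]
        for nutrient, value in per_100g.items():
            totals[nutrient] = totals.get(nutrient, 0.0) + value * grams / 100.0
    return totals


# One synthetic meal: 150 g of chicken breast and 200 g of white rice.
meal = [("chicken breast", 150), ("white rice", 200)]
totals = meal_totals(meal)
print(totals)
```

In a real run, an ingredient name that has no matching entry would raise a `KeyError` here, so a matching or fallback step would be needed before this function.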

Features used for modeling

The model uses both direct nutrient totals and engineered ratios:

  • protein
  • carbohydrates
  • total_fats
  • saturated_fats
  • fiber
  • water
  • protein_per_calorie
  • fiber_per_calorie
  • carbohydrates_per_calories
  • saturated_fats_to_total_fats
  • water_per_calories
  • fiber_to_carbs
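A minimal sketch of how the ratio features could be derived from the nutrient totals. The feature names come from the list above; the exact definitions, the `calories` key, and the zero-denominator handling in `safe_ratio` are assumptions, since the notebook's formulas are not shown here. Under these assumed definitions, the per-calorie ratios are computed from the target value itself, which is worth keeping in mind when reading the results.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero (assumed policy)."""
    return numerator / denominator if denominator else 0.0


def add_ratio_features(meal):
    """Return a copy of the meal dict extended with the engineered ratio features."""
    calories = meal["calories"]
    features = dict(meal)
    features["protein_per_calorie"] = safe_ratio(meal["protein"], calories)
    features["fiber_per_calorie"] = safe_ratio(meal["fiber"], calories)
    features["carbohydrates_per_calories"] = safe_ratio(meal["carbohydrates"], calories)
    features["saturated_fats_to_total_fats"] = safe_ratio(meal["saturated_fats"], meal["total_fats"])
    features["water_per_calories"] = safe_ratio(meal["water"], calories)
    features["fiber_to_carbs"] = safe_ratio(meal["fiber"], meal["carbohydrates"])
    return features


# Made-up meal totals in grams, with calories as the target value.
example_meal = {
    "protein": 40.0, "carbohydrates": 50.0, "total_fats": 20.0,
    "saturated_fats": 5.0, "fiber": 10.0, "water": 200.0, "calories": 500.0,
}
features = add_ratio_features(example_meal)
print(features)
```

The returned dict still contains the `calories` target, which would be separated from the feature columns before training.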

Models

The notebook compares three reference points:

1. Baseline

A simple baseline that predicts the training-set mean for every test example.
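A mean baseline of this kind can be sketched in a few lines. The function name `fit_mean_baseline` and the sample values are illustrative, not taken from the notebook.

```python
from statistics import mean


def fit_mean_baseline(y_train):
    """Remember the training-set mean and return a function that always predicts it."""
    train_mean = mean(y_train)

    def predict(n_samples):
        return [train_mean] * n_samples

    return predict


# Made-up training targets (calories).
y_train = [300.0, 500.0, 700.0]
predict = fit_mean_baseline(y_train)
baseline_predictions = predict(2)
print(baseline_predictions)
```

scikit-learn offers the same behavior through `DummyRegressor(strategy="mean")`, which may be the more convenient option inside a notebook that already uses scikit-learn.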

2. Linear Regression

This model was chosen as a strong, interpretable baseline because calories tend to depend almost linearly on nutrient quantities.

3. Neural Network (MLPRegressor)

A multilayer perceptron pipeline with feature scaling was used as a more flexible nonlinear alternative.
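The two models could be set up roughly as below. This is a sketch under stated assumptions: the data is synthetic, and the hidden layer sizes, `max_iter`, and random seeds are placeholder choices rather than the notebook's actual settings. Only the 80/20 split and the use of feature scaling in front of `MLPRegressor` are taken from the description.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 meals with protein, carbohydrate, and fat grams.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(100, 3))
y = X @ np.array([4.0, 4.0, 9.0]) + rng.normal(0.0, 5.0, size=100)

# 80 training samples / 20 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Plain linear regression needs no scaling.
linear_model = LinearRegression().fit(X_train, y_train)

# The MLP is wrapped in a pipeline so scaling is fitted on training data only.
mlp_model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=42),
).fit(X_train, y_train)

linear_predictions = linear_model.predict(X_test)
mlp_predictions = mlp_model.predict(X_test)
```

Keeping the scaler inside the pipeline avoids fitting it on the test split, which would be a second, subtler form of leakage.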

Results

Train/test split: 80 training samples / 20 test samples

Model                    MAE       RMSE      R²       MAPE (%)
Baseline (train mean)    569.022   663.923   -0.005   87.656
Linear Regression        0.713     1.001     1.000    0.152
Neural Network (MLP)     76.318    100.899   0.977    11.512
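For reference, metrics such as MAE, RMSE, R², and MAPE can be computed as in the sketch below. It assumes the standard textbook definitions and that no true value is zero (MAPE is undefined otherwise); the notebook may rely on library functions instead, and the sample numbers are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R2, and MAPE (in percent) for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# A constant prediction that differs from the test mean yields a slightly negative R2.
constant_metrics = regression_metrics(y_true, [240.0] * 4)
print(constant_metrics["R2"])
```

The second call shows why a train-mean baseline can score an R² just below zero on the test split: R² is zero only when the constant prediction equals the mean of the test targets.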

Visualizations

Model performance (Real vs Predicted)

[Figure: Model Performance]

Shows how close predictions are to true calorie values for both models.

Error distribution (Residuals)

[Figure: Error Distribution]

Compares residual error spread between Linear Regression and the Neural Network.

Data distribution (Calories)

[Figure: Data Distribution]

Shows how calorie values are distributed across the synthetic meal dataset.

Important note: the extremely strong Linear Regression result is not realistic evidence of a near-perfect model. The dataset contains target-leaking information because calories can be reconstructed very closely from macronutrient totals such as protein, carbohydrates, and fats. In other words, the model is partially learning a nutritional identity rather than a genuinely difficult prediction task.

Interpretation

This was one of the most useful findings in the project.

At first, the Linear Regression metrics looked almost perfect, which would normally suggest an excellent model. After reviewing the feature set more carefully, I realized the result was heavily affected by data leakage.

In nutrition science, calories are typically calculated directly from macronutrients using the Atwater system (approximately 4 kcal per gram of protein, 4 kcal per gram of carbohydrates, and 9 kcal per gram of fat). Because Linear Regression fits exactly this kind of weighted sum, it essentially recovered the 4-4-9 formula. In other words, it decoded the rule used to calculate the calories in the dataset in the first place rather than finding hidden patterns, which made the task unrealistically easy. This is data leakage.
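The effect is easy to reproduce on toy data. The sketch below assumes calories follow the 4-4-9 rule exactly, which real USDA energy values may not, and uses three made-up meals. With three unknown coefficients, three linearly independent meals are already enough to recover the rule by solving a linear system, here with Cramer's rule and exact fractions.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule, returning exact fractions."""
    base = det3(matrix)
    solution = []
    for col in range(3):
        # Replace one column with the right-hand side, then take the determinant ratio.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(matrix)]
        solution.append(Fraction(det3(replaced), base))
    return solution


# Grams of protein, carbohydrates, and fat for three made-up meals.
meals = [[30, 50, 10], [20, 80, 15], [40, 20, 25]]

# Target generated with the assumed exact 4-4-9 rule.
calories = [4 * p + 4 * c + 9 * f for p, c, f in meals]

coefficients = solve3(meals, calories)
print(coefficients)  # the 4, 4, 9 weights come back exactly
```

A least-squares fit on 80 training meals does the same thing with more data, so near-zero error on the test split says little about how hard the underlying task is.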

So the most important conclusion is not that the model is perfect, but that feature selection matters just as much as model choice.

Healthier-alternative idea

The notebook also includes an experimental section that tries to propose a healthier alternative for a meal by:

  • lowering calories,
  • lowering carbohydrates,
  • and increasing protein share.

This part should be treated as a prototype rather than a finished feature. The main completed part of the project is the calorie prediction and the analysis of why the best-looking result was misleading.
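One possible way to express these three goals as a non-ML scoring rule is sketched below. It is a hypothetical illustration, not the notebook's algorithm: the function names, the equal default weights, and the candidate meals are all assumptions, and it requires the original meal to have nonzero calories and carbohydrates.

```python
def protein_share(meal):
    """Fraction of macronutrient grams that comes from protein."""
    total = meal["protein"] + meal["carbohydrates"] + meal["total_fats"]
    return meal["protein"] / total if total else 0.0


def improvement_score(original, candidate, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of relative calorie drop, relative carb drop, and protein-share gain."""
    w_cal, w_carb, w_prot = weights
    calorie_drop = (original["calories"] - candidate["calories"]) / original["calories"]
    carb_drop = (
        original["carbohydrates"] - candidate["carbohydrates"]
    ) / original["carbohydrates"]
    protein_gain = protein_share(candidate) - protein_share(original)
    return w_cal * calorie_drop + w_carb * carb_drop + w_prot * protein_gain


def pick_healthier(original, candidates):
    """Return the best-scoring candidate, or the original if nothing improves on it."""
    best = max(candidates, key=lambda c: improvement_score(original, c))
    return best if improvement_score(original, best) > 0 else original


# Made-up meals for illustration.
original = {"calories": 800.0, "protein": 30.0, "carbohydrates": 100.0, "total_fats": 30.0}
lighter = {"calories": 700.0, "protein": 40.0, "carbohydrates": 80.0, "total_fats": 25.0}
heavier = {"calories": 900.0, "protein": 30.0, "carbohydrates": 120.0, "total_fats": 35.0}

choice = pick_healthier(original, [lighter, heavier])
print(choice)
```

Changing the `weights` tuple shifts the trade-off between the three goals, for example favoring protein share over calorie reduction.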

Project structure

How to run

  1. Open energia_jedlo.ipynb.
  2. Run the notebook cells in order.

Main takeaways

  • I practiced data preparation and learned how important high-quality data is.
  • I compared a simple interpretable model with a neural network.
  • I learned that excellent metrics can be misleading when the feature set leaks target information.
  • I learned how to design a weighting algorithm and generate healthier alternatives without using machine learning.

Future improvements

If I continue this project, the next steps would be:

  • remove leakage-prone features,
  • predict calories from ingredient-level representations instead of direct nutrient totals,
  • use a larger and more realistic meal dataset,
  • finish and refine the healthier-alternative generator.

Final note

This project is best understood as a learning-focused ML case study. The strongest part is not the raw score itself, but the fact that the notebook identifies why that score is misleading and what should be improved next.

About

Predicting meal energy values in Python through data extraction, feature engineering, and model comparison (Linear Regression vs. Neural Network).
