During my internship in Thailand, I was tasked with recommending optimal storage conditions for local bee farmers to minimise bacterial growth in honey. To achieve this, I stored honey under different conditions (anaerobic/aerobic) at varying temperatures (4°C, 25°C, 37°C) for six weeks. While a lower bacterial count is generally desirable, honey contains beneficial bacteria like lactic acid bacteria (LAB), which contribute to its natural preservative properties against pathogens. This challenge inspired me to develop a data-driven approach to determine the best storage conditions for the farmers.
The dataset was organized by condition (anaerobic/aerobic) and temperature, tracking microbial counts (lactic acid bacteria, Bacillus, yeast, and mould) over the six-week period. Below is how the data is structured in Excel:
The best overall storage condition was 4°C under anaerobic conditions, which yielded the smallest mean bacterial count change (288.58 CFU/mL) from baseline (Day 0). This indicates minimal microbial growth across all bacteria types. For specific bacteria, the recommended conditions are:
- 4°C anaerobic (Yeast and Mould)
- 25°C anaerobic (LAB)
- 37°C aerobic (Bacillus)
Although the two-way ANOVA did not show statistically significant differences (likely due to microbial variability), the smallest average changes in bacterial counts provided a practical basis for the recommendations. Temperature and condition were treated as categorical variables because they represented fixed experimental settings.
Python libraries used in this project are:
- pandas
- matplotlib
- seaborn
- scipy
- statsmodels.api
To address zero values (indicating no detectable bacteria in culture, not necessarily absence), I replaced them with the average bacterial counts across the six weeks.
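The zero-handling step can be sketched as follows. This is a minimal illustration, not the original notebook code: the column names (`condition`, `temperature`, `day`, `lab`), the helper `replace_zeros_with_group_mean`, and all values are hypothetical. It also assumes the average is taken over the non-zero readings within each condition and temperature series, which the text above does not specify.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per condition, temperature and sampling day.
# All counts are made up for illustration only.
df = pd.DataFrame({
    "condition": ["aerobic"] * 4 + ["anaerobic"] * 4,
    "temperature": [4] * 8,
    "day": [0, 7, 14, 21] * 2,
    "lab": [100.0, 0.0, 200.0, 300.0, 50.0, 150.0, 0.0, 100.0],
})


def replace_zeros_with_group_mean(data, value_col, group_cols):
    """Replace zeros in value_col with the mean of the non-zero values in the same group."""
    out = data.copy()
    # Treat zeros as missing so they do not pull the group mean down (assumption).
    non_zero = out[value_col].replace(0, np.nan)
    # Mean of the remaining readings, broadcast back to every row of the group.
    group_mean = non_zero.groupby([out[c] for c in group_cols]).transform("mean")
    out[value_col] = non_zero.fillna(group_mean)
    return out


cleaned = replace_zeros_with_group_mean(df, "lab", ["condition", "temperature"])
print(cleaned)
```

If the average should instead include the zero readings, computing the group mean on the raw column before replacing the zeros would be the corresponding change.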
Initial analysis was conducted to understand data trends before further testing:
A two-way ANOVA tested for significant differences between storage condition combinations. No significant differences were found.
I plotted bacterial counts over six weeks using matplotlib to visualize temporal changes.
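A plot of this kind could be produced roughly as follows. The sampling days, column names and counts are hypothetical, and the non-interactive `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts for two storage settings over hypothetical weekly sampling days.
days = [0, 7, 14, 21, 28, 35]
df = pd.DataFrame({
    "condition": ["aerobic"] * 6 + ["anaerobic"] * 6,
    "temperature": [25] * 12,
    "day": days * 2,
    "lab": [100, 120, 180, 260, 300, 340, 100, 110, 130, 150, 170, 160],
})

fig, ax = plt.subplots(figsize=(8, 5))

# One line per temperature/condition combination.
for (condition, temperature), group in df.groupby(["condition", "temperature"]):
    ax.plot(group["day"], group["lab"], marker="o", label=f"{temperature}°C {condition}")

ax.set_xlabel("Day")
ax.set_ylabel("LAB count (CFU/mL)")
ax.set_title("LAB counts over storage time")
ax.legend()
fig.tight_layout()
```

Calling `fig.savefig(...)` or `plt.show()` afterwards writes or displays the figure.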
The lowest average change in bacterial counts guided the storage recommendations for each microbe and overall stability.
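One way to compute this ranking is sketched below, assuming "change" means the absolute difference between each later reading and the Day 0 baseline of the same storage setting. The column names and counts are illustrative only.

```python
import pandas as pd

# Made-up counts for two storage settings at three sampling days.
df = pd.DataFrame({
    "condition": ["aerobic"] * 3 + ["anaerobic"] * 3,
    "temperature": [4] * 6,
    "day": [0, 7, 14] * 2,
    "count": [100.0, 400.0, 700.0, 100.0, 150.0, 250.0],
})

keys = ["condition", "temperature"]

# Day 0 count for each storage setting, indexed by (condition, temperature).
baseline = df[df["day"] == 0].set_index(keys)["count"].rename("baseline")

# Attach the baseline to every later reading and take the absolute change (assumption).
later = df[df["day"] > 0].join(baseline, on=keys)
later["abs_change"] = (later["count"] - later["baseline"]).abs()

# Mean change per storage setting, smallest first.
mean_change = later.groupby(keys)["abs_change"].mean().sort_values()
best = mean_change.idxmin()

print(mean_change)
print("Smallest mean change:", best)
```

With real data, running this once per microbe column would give a per-microbe ranking, and averaging across microbes would give an overall one.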
Descriptive statistics give an overview of the data.
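A grouped summary along these lines can be obtained with pandas. The sketch below uses made-up values and hypothetical column names, and the particular set of statistics is an assumption.

```python
import pandas as pd

# Made-up LAB counts for two storage settings.
df = pd.DataFrame({
    "condition": ["aerobic"] * 3 + ["anaerobic"] * 3,
    "temperature": [25] * 6,
    "lab": [10.0, 20.0, 30.0, 40.0, 60.0, 80.0],
})

# Count, mean, sample standard deviation, minimum and maximum per storage setting.
summary = (
    df.groupby(["condition", "temperature"])["lab"]
    .agg(["count", "mean", "std", "min", "max"])
)
print(summary)
```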
As noted earlier, no statistically significant differences were observed between storage conditions. The null hypothesis was retained for all temperature and condition combinations.
- Yeast and Mold: At 25°C aerobic, changes were minimal up to Day 7, but by Day 35, 25°C anaerobic showed the smallest deviation from Day 0. The highest yeast/mold count occurred at 4°C aerobic on Day 35
- LAB: The largest deviation from Day 0 occurred at 37°C anaerobic on Day 21. Both 25°C conditions maintained the lowest LAB changes up to Day 28
- Bacillus: 4°C storage minimized growth across all weeks. Day 28 saw peak Bacillus counts at 37°C anaerobic, while 25°C aerobic had the smallest change from Day 0 after Day 21
- Optimal storage: 4°C anaerobic for Heterotrigona itama honey
- LAB preservation: 25°C aerobic for up to 4 weeks to prevent over-acidification
- Bacillus control: 4°C aerobic to extend shelf life
- Variability: Microbial data is inherently variable; larger sample sizes or longer storage durations could strengthen findings
- Extended modeling: Predictive models (e.g., ARIMA) could forecast long-term microbial trends beyond six weeks
- Additional factors: Future studies could integrate pH, water activity, and sugar content for more comprehensive recommendations

















