This is the Codabench challenge of Group 36
Team members (GitHub accounts):
- seigneurvador
- xiaomaju11
- Malcolm-ZHANG
- taiwei-wu
Links to the original data sources are listed below. The preprocessing script is in the folder Origin_and_merge_data, which also contains the original .csv files.
Analyzing educational outcomes in relation to socio-economic factors is essential for addressing inequalities in the education system. By predicting the rate of honors from students' social background and the educational level of the local population, we can identify patterns of segregation and highlight schools that provide quality education regardless of their students' socio-economic status. Beyond its academic interest, this challenge can help inform policy decisions aimed at promoting equity and excellence in education across regions.
- Understand how socio-economic factors influence high school outcomes in France.
- Identify schools that reduce inequalities effectively.
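- Build a predictive model for the rate of honors (Taux de mentions - Toutes séries) based on the diploma levels of the municipal population, the school's IPS (Indice de position sociale, an index of students' social background), and the municipality's median income.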
Figure: Bivariate & Univariate Distributions (panels: Rate of Honors - All Series; IPS Distribution; High School Graduation Rate ≥ Bac+5; Median Income Distribution)
Our dataset was created by merging multiple open data sources:
- Provider: INSEE
- Description: Distribution of the population by highest diploma level at the municipal scale.
- Purpose: Highlight unequal access to educational capital, strongly correlated with social and territorial determinants.
- Download: base-cc-diplomes-formation-2022.csv
- Provider: data.gouv.fr
- Description: Composite indicator reflecting students’ social background at the high school level.
- Purpose: Quantify school segregation and challenge the myth of equal opportunity.
- Download: fr-en-ips_lycees.csv
- Provider: data.gouv.fr
- Description: Measures school performance while accounting for students’ social and academic profiles.
- Purpose: Move beyond raw rankings and recognize institutions that actively reduce inequalities.
- Download: fr-en-indicateurs-de-resultat-des-lycees-gt_v2.csv
- Provider: Ministry of Higher Education and Research
- Description: National geographic reference for French primary and secondary schools (addresses, coordinates, and labels).
- Download: fr-en-adresse-et-geolocalisation-etablissements-premier-et-second-degre.csv
- Provider: INSEE
- Description: Official list of French municipalities with INSEE codes (2022 reference).
- Download: commune_2022.csv
- Provider: INSEE
- Description: Median income for each municipality in France.
- Download: revenu-des-francais-a-la-commune-1765372688826.csv
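The merge itself is done by the script in Origin_and_merge_data and is not reproduced here. The sketch below only illustrates the general idea under stated assumptions: school-level tables (results, IPS, geolocation) are joined on a school identifier, assumed here to be called "uai", and the geolocation table supplies an INSEE commune code, assumed to be called "code_commune", used to attach the municipal diploma and income data. All column names and values are hypothetical toy examples, not the real file contents.

```python
# Minimal, hypothetical sketch of the merge step. Column names ("uai",
# "code_commune", "honors_rate", "ips", "share_bac5", "median_income") are
# illustrative assumptions, not the names used in the real CSV files.


def index_by(rows, key):
    """Index a list of row dicts by the value of `key`."""
    return {row[key]: row for row in rows}


def merge_sources(results, ips, geoloc, diplomas, income):
    """Join school tables on "uai", then attach commune-level columns.

    Schools missing from any table are dropped (inner-join semantics).
    """
    ips_by_uai = index_by(ips, "uai")
    geo_by_uai = index_by(geoloc, "uai")
    diplomas_by_commune = index_by(diplomas, "code_commune")
    income_by_commune = index_by(income, "code_commune")

    merged = []
    for school in results:
        uai = school["uai"]
        if uai not in ips_by_uai or uai not in geo_by_uai:
            continue  # school absent from a school-level table
        commune = geo_by_uai[uai]["code_commune"]
        if commune not in diplomas_by_commune or commune not in income_by_commune:
            continue  # commune absent from a commune-level table
        merged.append({
            "uai": uai,
            "code_commune": commune,
            "honors_rate": school["honors_rate"],
            "ips": ips_by_uai[uai]["ips"],
            "share_bac5": diplomas_by_commune[commune]["share_bac5"],
            "median_income": income_by_commune[commune]["median_income"],
        })
    return merged


# Toy rows (made-up values, not real data); school "C3" has no IPS row.
results = [{"uai": "A1", "honors_rate": 62.0},
           {"uai": "B2", "honors_rate": 48.0},
           {"uai": "C3", "honors_rate": 55.0}]
ips = [{"uai": "A1", "ips": 115.0}, {"uai": "B2", "ips": 92.0}]
geoloc = [{"uai": "A1", "code_commune": "00001"},
          {"uai": "B2", "code_commune": "00002"},
          {"uai": "C3", "code_commune": "00003"}]
diplomas = [{"code_commune": "00001", "share_bac5": 30.0},
            {"code_commune": "00002", "share_bac5": 12.0}]
income = [{"code_commune": "00001", "median_income": 28000.0},
          {"code_commune": "00002", "median_income": 21000.0}]

merged = merge_sources(results, ips, geoloc, diplomas, income)
print([row["uai"] for row in merged])  # ['A1', 'B2']
```

Inner-join semantics are an assumption here: they drop any school that cannot be matched in every source, which keeps the feature table complete but may bias the sample if missing schools are not random. With pandas, the same logic would be a chain of `DataFrame.merge(..., how="inner")` calls.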
The goal of this data challenge is to build a model that predicts the Taux de mentions - Toutes séries (rate of honors, all series) of French high schools from the social background of their students (IPS), the educational level of the population in their municipality, and the municipality's median income.
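The challenge does not prescribe a model. As an illustration only, a linear baseline can be fitted by ordinary least squares (OLS), solving the normal equations (XᵀX)β = Xᵀy. The function names `fit_ols` and `predict`, and the feature names `ips` and `share_bac5`, are hypothetical; the toy data below is made up so that the true coefficients are known.

```python
# Hypothetical linear baseline: ordinary least squares fitted by solving the
# normal equations (X^T X) beta = X^T y with Gaussian elimination.
# OLS is an illustrative choice, not the challenge's reference model.


def fit_ols(features, targets):
    """Return [intercept, coef_1, ..., coef_k] minimising squared error."""
    X = [[1.0] + list(row) for row in features]  # prepend intercept column
    p = len(X[0])
    # Augmented normal-equation system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, targets))]
         for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; X^T X is singular")
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / A[i][i]
    return beta


def predict(beta, features):
    """Apply fitted coefficients (intercept first) to rows of features."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            for row in features]


# Toy data (made-up): columns are (ips, share_bac5); targets follow
# honors_rate = 10 + 0.5 * ips + 1.0 * share_bac5 exactly.
features = [(90.0, 10.0), (100.0, 20.0), (110.0, 15.0), (120.0, 30.0)]
targets = [10 + 0.5 * a + 1.0 * b for a, b in features]
beta = fit_ols(features, targets)
print([round(b, 6) for b in beta])  # [10.0, 0.5, 1.0]
```

In practice, a library such as scikit-learn (`sklearn.linear_model.LinearRegression`) handles this more robustly; the plain-Python version only makes the normal equations explicit.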
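This is a regression task: the target is a continuous variable. The evaluation metric is the Mean Squared Error (MSE):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $n$ is the number of high schools being evaluated, $y_i$ is the true rate of honors of school $i$, and $\hat{y}_i$ is the predicted rate. Lower values are better.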
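The metric can be computed directly from its definition. This is a minimal sketch; the function name `mean_squared_error` is chosen for illustration, and the official Codabench scoring program may differ in implementation details (for example input parsing). The numbers below are toy values.

```python
def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum over i of (y_i - y_hat_i)^2."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values: errors are 2, -2 and 0, so MSE = (4 + 4 + 0) / 3.
y_true = [62.0, 48.0, 55.0]
y_pred = [60.0, 50.0, 55.0]
print(mean_squared_error(y_true, y_pred))  # 8 / 3, about 2.667
```

Because errors are squared, the MSE is expressed in squared units of the target (squared percentage points if the rate of honors is given in percent) and penalizes large errors on individual schools more heavily than small ones.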