Project Idea: Predicting Titanic Survival #2

@mrivasperez

Description

Dataset: https://www.kaggle.com/c/titanic/overview

The Titanic survival prediction problem is another popular introductory machine learning dataset. The task is to predict whether a passenger on the Titanic survived, using features such as age, sex, ticket class, and family relations. This project will focus on feature engineering and data preprocessing, which are essential skills for real-world machine learning.

New Concepts

  • Extracting social status or gender-related information, for example from the titles (Mr, Mrs, Miss, Master) embedded in the "Name" column.
  • Creating age groups to capture non-linear relationships between age and survival.
  • Combining "SibSp" (number of siblings/spouses) and "Parch" (number of parents/children) to create a new feature representing family size.
  • Handling missing values (e.g., imputing missing ages) and converting categorical features (like the "Embarked" port) into numerical representations using one-hot encoding.
  • Exploring the data with visualizations (e.g., histograms, bar plots) before building the model, to understand the relationships between features and survival and to inform feature engineering decisions.
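
As a starting point, the feature engineering steps listed above could be sketched with pandas roughly as follows. This is a minimal sketch, not a settled design. The sample rows are made up for illustration, and the `engineer_features` helper, the age bin edges, the median/mode imputation, and counting the passenger in `FamilySize` (the `+ 1`) are all assumptions. Only the column names ("Name", "Age", "SibSp", "Parch", "Embarked") come from the Kaggle dataset.

```python
import pandas as pd

# Tiny hand-made sample using the Kaggle column names; the values are illustrative only.
df = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Jones, Mrs. Anna", "Brown, Miss. Ella", "Gray, Master. Tom"],
    "Age": [40.0, None, 19.0, 6.0],
    "SibSp": [1, 1, 0, 3],
    "Parch": [0, 2, 0, 2],
    "Embarked": ["S", "C", None, "S"],
})


def engineer_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: returns a copy of `raw` with engineered columns."""
    out = raw.copy()

    # Title sits between the comma and the first period: "Smith, Mr. John" -> "Mr".
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

    # Family size = siblings/spouses + parents/children + the passenger (assumed convention).
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # Simple imputation (assumption): median for Age, most frequent value for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # Age groups; the bin edges are arbitrary and worth revisiting after exploring the data.
    out["AgeGroup"] = pd.cut(
        out["Age"],
        bins=[0, 12, 18, 60, 120],
        labels=["child", "teen", "adult", "senior"],
    )

    # One-hot encode the categorical columns.
    return pd.get_dummies(out, columns=["Embarked", "AgeGroup"])


features = engineer_features(df)
print(features[["Title", "FamilySize", "Age"]])
```

On the real data, the same helper would be applied to the DataFrame loaded from the competition's training CSV. The exploration step (histograms and bar plots of survival by feature) would ideally come first, since it may suggest different bins or imputation strategies.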

Labels

enhancement: New feature or request
