🏠 House Price Prediction & Real-time Deployment System Table of Contents
This project presents an end-to-end machine learning solution for predicting house prices, built to demonstrate a complete MLOps workflow from data preprocessing and model training to real-time cloud deployment. It leverages a realistic dataset and features a user-friendly web interface for interactive predictions.
Features Accurate Price Prediction: Utilizes a robust machine learning model to provide highly accurate house price estimations. Advanced Feature Engineering: Incorporates carefully engineered features to capture complex relationships within the real estate market data. Real-time Inference API: A RESTful API built with Flask for instant price predictions based on user inputs. Interactive Web Frontend: A simple web interface for users to input house features and receive predictions in real-time. Cloud Deployment: The entire application is deployed and hosted on Amazon EC2, demonstrating practical cloud infrastructure management. Robust Server Management: Uses Nginx as a reverse proxy for efficient traffic handling, security, and load balancing. Architecture The system follows a standard web application architecture with a machine learning backend:
Data Processing & Model Training: Offline process involving cleaning, feature engineering, and training the ML model. The trained model is then serialized. Flask API: Serves as the core backend, loading the trained model and exposing an endpoint for predictions. Frontend: A basic web interface (HTML/CSS/JS) that interacts with the Flask API to send input data and display predictions. Nginx Reverse Proxy: Sits in front of the Flask application, handling incoming requests, serving static files, and passing dynamic requests to Flask. AWS EC2 Instance: The virtual server hosting the entire application, making it accessible over the internet.
+----------------+ +-------------------+ +-----------------+ | User Browser |------>| Nginx (EC2) |------>| Flask App (EC2) | | (Frontend) |<------| (Reverse Proxy) |<------| (ML Backend) | +----------------+ +-------------------+ +-----------------+ ^ ^ | | | | (Serves static files) +-----------+ Technical Stack Language: Python Machine Learning: scikit-learn, pandas, numpy Web Framework: Flask Frontend: HTML, CSS, JavaScript Cloud Platform: Amazon Web Services (AWS EC2) Server: Nginx Version Control: Git Dataset The model was trained on a realistic house price dataset (e.g., "Ames Housing Dataset" if you used that, or specify if it's proprietary). This dataset includes a comprehensive set of features such as square footage, number of bedrooms, location, year built, and various other attributes. Extensive data cleaning and feature engineering were performed to handle missing values, outliers, and create new, impactful features for better predictive accuracy.
Getting Started Follow these instructions to get a copy of the project up and running on your local machine or deploy it to AWS EC2.
Prerequisites Python 3.x pip (Python package installer) Git (For EC2 Deployment) AWS Account with EC2 instance access (For EC2 Deployment) SSH client Local Setup Clone the repository: Bash
git clone https://github.com/YourUsername/YourRepoName.git cd YourRepoName Create and activate a virtual environment: Bash
python -m venv venv
.\venv\Scripts\activate
source venv/bin/activate Install dependencies: Bash
pip install -r requirements.txt Train the model (if model.pkl is not provided): Navigate to the notebooks/ or scripts/ directory where your model training code resides. Run the script/notebook that generates the model.pkl file (or whatever your serialized model is named). Self-correction: You might want to add a step like python train_model.py if you have a dedicated script for it. Run the Flask application: Bash
python app.py The application should now be running on http://127.0.0.1:5000. Cloud Deployment (AWS EC2) (This section assumes you have a basic understanding of AWS EC2 and SSH.)
Launch an EC2 Instance: Launch an Ubuntu (or preferred OS) EC2 instance (e.g., t2.micro for testing). Configure security groups to allow inbound HTTP (port 80) and SSH (port 22) traffic. SSH into your instance: Bash
ssh -i /path/to/your-key.pem ubuntu@your-ec2-public-ip Update and install necessary packages: Bash
sudo apt update sudo apt install python3 python3-pip git nginx Clone your repository on the EC2 instance: Bash
git clone https://github.com/YourUsername/YourRepoName.git cd YourRepoName Install Python dependencies: Bash
pip3 install -r requirements.txt Configure Gunicorn (or another WSGI server) and Flask: You'll likely want a production-grade WSGI server like Gunicorn to run your Flask app. Install Gunicorn: pip3 install gunicorn Create a Gunicorn service file (e.g., /etc/systemd/system/house_price.service): Ini, TOML
[Unit] Description=Gunicorn instance to serve house_price app After=network.target
[Service] User=ubuntu Group=www-data WorkingDirectory=/home/ubuntu/YourRepoName ExecStart=/usr/bin/gunicorn --workers 3 --bind unix:house_price.sock -m 007 wsgi:app Restart=always
[Install] WantedBy=multi-user.target sudo systemctl start house_price sudo systemctl enable house_price Configure Nginx as a reverse proxy: Create a new Nginx configuration file (e.g., /etc/nginx/sites-available/house_price): Nginx
server { listen 80; server_name your_ec2_public_ip_or_domain;
location / {
include proxy_params;
proxy_pass http://unix:/home/ubuntu/YourRepoName/house_price.sock;
}
location /static/ {
alias /home/ubuntu/YourRepoName/static/;
}
} Create a symbolic link to enable the site: Bash
sudo ln -s /etc/nginx/sites-available/house_price /etc/nginx/sites-enabled/ Test Nginx configuration: sudo nginx -t Restart Nginx: sudo systemctl restart nginx Now, your application should be accessible via your EC2 instance's public IP address.
Model Performance (Optional, but highly recommended): Include key metrics here. Root Mean Squared Error (RMSE): X (e.g., $15,000) Mean Absolute Error (MAE): Y (e.g., $10,000) R-squared (R 2 ): Z (e.g., 0.85) Describe what these mean briefly: "The model typically predicts within X of the actual price, with an R 2 value indicating that it explains $Z% of the variance in house prices."
Future Enhancements Integrate a more robust database (e.g., PostgreSQL) for historical predictions or user management. Implement user authentication and personalized dashboards. Containerize the application using Docker for easier deployment and scaling. Add CI/CD pipeline for automated testing and deployment. Explore advanced ML models (e.g., deep learning for image-based features) and ensemble methods. Implement A/B testing for model versions. Improve the frontend UI/UX. Contact Feel free to reach out if you have any questions or feedback!