Pattern Recognition - Computational Assignment

University: University of Piraeus
Department: Informatics
Academic Year: 2022–2023
Semester: 5th
Course: Pattern Recognition
Assignment: Computational Course Assignment

📖 Introduction

This project was developed in Python, in Visual Studio Code, with the following libraries:

pandas
matplotlib
scikit-learn
keras
seaborn
numpy

These libraries were installed through the terminal using standard pip commands (see bibliography in the report).

🏗️ Data Preprocessing

Data Loading: We load the dataset (housing.csv) and check its structure by inspecting the first rows with .head().
Feature Separation: The target feature is median_house_value.
- X: all features except median_house_value
- z: only median_house_value
Numerical vs. Categorical Features: Using .info(), we identify data types:
- ocean_proximity: categorical
- All others: numerical
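The separation steps above can be sketched as follows. This is a minimal illustration on a few made-up rows rather than the real dataset; the column values and the use of `select_dtypes` to build the `numerical` and `categorical` lists are assumptions, since the README only says the types were read off `.info()`.

```python
import pandas as pd

# Toy rows standing in for the dataset (values are made up for illustration).
data = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6],
    "total_bedrooms": [129.0, 1106.0, 190.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR OCEAN"],
    "median_house_value": [452600.0, 358500.0, 352100.0],
})

# X: every feature except the target; z: the target column only.
X = data.drop("median_house_value", axis=1)
z = data["median_house_value"]

# Split the feature names by dtype, mirroring what .info() reports.
numerical = X.select_dtypes(include="number").columns.tolist()
categorical = X.select_dtypes(exclude="number").columns.tolist()

print(numerical)
print(categorical)
```

The two lists can then be passed to the scaler and the encoder without hard-coding column names.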

Scaling Data: We apply Min-Max Scaling to bring numerical features into the 0–1 range:

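import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# index=X.index keeps the rows aligned in the concat below
X_scaled = pd.DataFrame(scaler.fit_transform(X[numerical]), columns=numerical, index=X.index)
X_temp = X.drop(numerical, axis=1)
X = pd.concat([X_temp, X_scaled], axis=1)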

Similar scaling is applied to z.
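For reference, Min-Max scaling maps each value x to (x - min) / (max - min). The sketch below writes that formula out in NumPy for a target-like vector and also inverts it, which is useful when scaled predictions need to be read in the original units. The helper names `min_max_scale` and `min_max_invert` are hypothetical, and the sketch assumes the column is not constant (max > min).

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D array to [0, 1] and return the (min, max) used."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # Assumes hi > lo; a constant column would divide by zero here.
    return (values - lo) / (hi - lo), (lo, hi)

def min_max_invert(scaled, bounds):
    """Map scaled values back to the original units."""
    lo, hi = bounds
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Made-up target values for illustration.
z_demo = np.array([100000.0, 250000.0, 500000.0])
z_scaled, z_bounds = min_max_scale(z_demo)
z_back = min_max_invert(z_scaled, z_bounds)

print(z_scaled)
print(z_back)
```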

One-Hot Encoding: We one-hot encode the categorical feature:

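from sklearn.preprocessing import OneHotEncoder

# scikit-learn 1.2 renamed sparse to sparse_output; on older releases pass sparse=False
encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
# oc_prox is assumed to list the ocean_proximity category names, in encoder order
X_enc = pd.DataFrame(encoder.fit_transform(X[categorical]), columns=oc_prox, index=X.index)
X_temp = X.drop(categorical, axis=1)
X = pd.concat([X_temp, X_enc], axis=1)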
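To make the result of the encoding concrete, here is a small sketch on a made-up column. It leaves the encoder on its default sparse output and calls `.toarray()`, which sidesteps the argument rename between scikit-learn versions, and it takes the column labels from `encoder.categories_` instead of a hand-written list. Both choices are assumptions of this sketch, not necessarily what the project code does.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up values for illustration.
toy = pd.DataFrame({"ocean_proximity": ["INLAND", "NEAR OCEAN", "INLAND", "NEAR BAY"]})

encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(toy[["ocean_proximity"]])

# The default output is a SciPy sparse matrix; densify it for a DataFrame.
dense = encoded.toarray()

# categories_[0] holds the sorted categories of the first (only) column.
columns = list(encoder.categories_[0])
toy_enc = pd.DataFrame(dense, columns=columns, index=toy.index)

print(toy_enc)
```

Each row has exactly one 1.0, in the column named after its category.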

Handling Missing Values: Using SimpleImputer(strategy='median'), we fill missing values (e.g., in total_bedrooms) with the median.
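A minimal sketch of median imputation follows. The toy frame and its values are made up, and wrapping the result back into a DataFrame is a convenience of this sketch, not a step the README describes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up values with one gap per column.
toy = pd.DataFrame({
    "total_bedrooms": [100.0, np.nan, 300.0, 500.0],
    "households": [90.0, 120.0, np.nan, 400.0],
})

imputer = SimpleImputer(strategy="median")
# fit_transform returns a NumPy array, so restore the labels afterwards.
filled = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns, index=toy.index)

print(filled)
```

Each gap is replaced by the median of the observed values in its own column.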

📊 Data Visualization

Histograms: With Seaborn, we visualize distributions:

sns.histplot(data[column], bins=50, kde=True, lw=2)

Scatter Plots: Using Pandas and Seaborn:

dataset.plot(kind='scatter', x='longitude', y='median_house_value')
sns.scatterplot(x=data['median_income'], y=data['median_house_value'], hue=data['NEAR OCEAN'])

These plots reveal correlations (e.g., between median_income and median_house_value) and geographic patterns.
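A correlation seen in a scatter plot can also be checked numerically with the Pearson coefficient. The sketch below uses synthetic data generated on the spot, so the number it prints says nothing about the real dataset; the variable names and the linear relationship are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two columns: a linear trend plus Gaussian noise.
income = rng.uniform(0.5, 15.0, size=200)
value = 40000.0 * income + rng.normal(0.0, 50000.0, size=200)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(income, value)[0, 1]
print(round(float(r), 3))
```

With a pandas DataFrame, `DataFrame.corr()` gives the same coefficient for every pair of numerical columns at once.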

🔧 Regression Models

Least Squares Regression

We implemented two core functions:

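import numpy as np

def least_squares_train(X, y):
    # Normal equations: w = pinv(X^T X) X^T y
    mul1 = X.T.dot(X)
    # The pseudo-inverse keeps this well defined when X^T X is singular
    inv1 = np.linalg.pinv(mul1)
    mul2 = X.T.dot(y)
    weight = np.matmul(inv1, mul2)
    return weight

def least_squares_predict(X, w):
    # Predictions are the linear map X w
    return np.matmul(X, w)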
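As a quick sanity check, the two functions can be run on synthetic data whose true weights are known. The sketch below restates them compactly so it runs on its own; the data, `true_w`, and the noise-free setup are assumptions chosen so that the recovered weights can be compared exactly.

```python
import numpy as np

def least_squares_train(X, y):
    # w = pinv(X^T X) X^T y, the same computation as the README's function.
    return np.linalg.pinv(X.T.dot(X)).dot(X.T.dot(y))

def least_squares_predict(X, w):
    return np.matmul(X, w)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w  # noise-free targets, so the fit should be exact

w_hat = least_squares_train(X_demo, y_demo)
y_hat = least_squares_predict(X_demo, w_hat)

print(w_hat)
```

Note that the functions fit no separate intercept: any bias term has to come from the columns of X.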

Evaluation: We applied 10-fold cross-validation using scikit-learn’s KFold, calculating:
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
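The evaluation loop might look roughly like the sketch below, which runs 10-fold cross-validation on synthetic data. The `shuffle=True` and `random_state=0` settings, the data, and the averaging of fold scores are assumptions; the README states only that KFold, MSE, and MAE were used.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold

def least_squares_train(X, y):
    return np.linalg.pinv(X.T.dot(X)).dot(X.T.dot(y))

def least_squares_predict(X, w):
    return np.matmul(X, w)

# Synthetic regression data: linear signal plus small Gaussian noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, size=200)

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
mse_scores, mae_scores = [], []
for train_idx, test_idx in kfold.split(X_demo):
    # Fit on nine folds, score on the held-out fold.
    w = least_squares_train(X_demo[train_idx], y_demo[train_idx])
    pred = least_squares_predict(X_demo[test_idx], w)
    mse_scores.append(mean_squared_error(y_demo[test_idx], pred))
    mae_scores.append(mean_absolute_error(y_demo[test_idx], pred))

mean_mse = float(np.mean(mse_scores))
mean_mae = float(np.mean(mae_scores))
print(mean_mse, mean_mae)
```

With pandas inputs, the index arrays from `kfold.split` would be applied through `.iloc` instead of plain indexing.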

🤖 Multilayer Neural Network

We used Keras Sequential to build a neural network with:

Four dense layers
relu and softmax activations
Optimizer: adam
Loss: mean_squared_error
Metrics: mae

We applied K-Fold Cross-Validation for performance evaluation.
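For readers unfamiliar with the two activations named above, here is what they compute, written in NumPy. This is not the project's Keras model: the README does not say which layer uses which activation, and the helper functions below are illustrative only.

```python
import numpy as np

def relu(x):
    # Negative inputs become 0; positive inputs pass through unchanged.
    return np.maximum(0.0, x)

def softmax(x):
    # Normalizes along the last axis so the outputs are positive and sum to 1.
    x = np.asarray(x, dtype=float)
    shifted = x - x.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

hidden = relu(np.array([-2.0, 0.0, 3.0]))
probs = softmax(np.array([1.0, 2.0, 3.0]))

# Softmax over a single unit has nothing to normalize against.
single = softmax(np.array([[0.7], [-4.2]]))

print(hidden)
print(probs)
print(single)
```

Because softmax over one unit always returns 1, a layer that outputs a single regression value usually uses a linear activation, and softmax is reserved for layers with several units.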

📎 Project Files

Pattern_Recognition_Code.py: Main script for data processing and model execution
Pattern_Recognition_jupyter.ipynb: The code as a Jupyter notebook
Pattern_Recognition_documentation.pdf: Detailed report with screenshots, visualizations, and analysis
housing.csv: The dataset

✅ How to Run

Install dependencies:
```
pip install pandas matplotlib scikit-learn keras seaborn numpy
```
Run the main script:
```
python Pattern_Recognition_Code.py
```
