Here are all my Machine Learning/LLMs/Deep Learning Notebooks.
- Applied Machine Learning Techniques
- NanoGPT
- Company Brochure Generator
- Meeting Minutes Generator
- Python Code to C++ Code Converter
Applied Machine Learning Techniques:
- Implemented and fine-tuned machine learning algorithms: Random Forests, XGBoost (XGB), LightGBM (LGBM), Logistic Regression, Linear Regression, Decision Trees, and Ensemble Techniques.
- Worked with diverse datasets (thousands to millions of records) for classification, regression, and feature engineering.
- Achieved up to 95% predictive accuracy in various projects.
- Conducted model evaluation, cross-validation, and hyperparameter tuning for robust and optimized performance.
NanoGPT :
- Designed and implemented a small-scale GPT-inspired language model trained on Shakespeare’s complete works.
- Generated text mimicking Shakespeare’s language, tone, poetic style, and literary devices.
- Preprocessed and tokenized Shakespeare’s texts to create a high-quality training dataset.
- Utilized transformer-based architecture with attention mechanisms to capture complex syntax and metaphors.
- Fine-tuned the model using state-of-the-art techniques for coherent, stylistically accurate text generation.
- Demonstrated expertise in NLP, deep learning, and creative text generation by adapting modern techniques to classical literature.
Company Brochure Generator:
- Designed and implemented a web-based summarization tool using a custom-built web scraper to extract and summarize website content.
- Developed a scraper to handle diverse HTML structures and dynamically loaded elements for accurate data extraction.
- Applied advanced text preprocessing (e.g., cleaning HTML, removing redundancy) to prepare data for summarization.
- Utilized NLP techniques to identify key sentences and generate concise, context-preserving summaries.
- Enabled users to reduce lengthy content into digestible highlights while retaining key insights.
- Demonstrated expertise in web scraping, data processing, NLP, and building end-to-end solutions to address information overload.
Meeting Minutes Generator :
- Designed and developed a Meeting Minutes Generator to automate meeting summaries from audio recordings.
- Integrated OpenAI’s Whisper for accurate transcription of meeting discussions.
- Utilized Google’s Gemma for summarization to extract key insights (summary, discussion points, takeaways, action items).
- Automated the generation of structured meeting minutes in markdown format, including attendees, location, date, and assigned action items.
- Enhanced productivity by reducing manual effort and improving accessibility of meeting information.
- Built the tool to be scalable and adaptable for various meeting types and industries.
Python Code to C++ Code Converter :
- Developed a Python-to-C++ Code Generator using ChatGPT-4.0-mini and Claude Sonnet3.5 APIs for code generation and optimization.
- Built an interactive Gradio-based UI for users to input Python code, customize parameters, and view real-time C++ output.
- Enhanced usability and efficiency, enabling seamless Python-to-C++ translation with minimal developer effort.
- Integrated modern programming standards and best practices to ensure high-quality, clean C++ code generation.
- Delivered a practical tool for improving performance or transitioning Python prototypes to production-grade C++.
Predictive Modeling for Kaggle Competitions:
- Designed and implemented machine learning models for Kaggle competitions (House Prices Prediction, Titanic Disaster), achieving over 80% accuracy in both.
- Utilized feature engineering, exploratory data analysis (EDA), and data preprocessing to extract insights and enhance model performance.
- Applied ensemble methods like Random Forests and Gradient Boosting to address prediction challenges.
- Conducted hyperparameter tuning and model evaluation to optimize performance and ensure robustness.
- Demonstrated problem-solving skills by applying data-driven approaches to real-world predictive tasks.
- Overview of PyTorch and its libraries
- Implementing Linear Regression, CNN, and ANN efficiently
- Install and import necessary libraries
- Enable GPU acceleration if available
- Generate synthetic data using
torch.randn - Define model with
torch.nn.Linear - Train using MSE loss and SGD optimizer
- Plot regression results
- Load MNIST/CIFAR-10 with
torchvision.datasets - Normalize and batch data using DataLoader
- Define CNN with convolution, pooling, and fully connected layers
- Train using Cross-Entropy loss and Adam optimizer
- Evaluate accuracy on test data
- Prepare tabular data with
torch.utils.data.TensorDataset - Define MLP using
torch.nn.Sequential - Train using BCE/Cross-Entropy loss and Adam optimizer
- Evaluate performance
- Key insights from Linear Regression, CNN, and ANN
- Future exploration in deep learning