🕵️‍♂️ Analogy Explorer: Sherlock Edition

Analogy Explorer is a Natural Language Processing (NLP) engine that learns semantic relationships purely from context. Trained on The Adventures of Sherlock Holmes, this model can solve vector analogies (e.g., Holmes : Detective :: Watson : ?) and visualize word relationships in a 2D space.

Unlike general-purpose embeddings trained on large corpora such as Wikipedia, this project explores Data Sparsity and Narrative Bias by learning exclusively from a single book.

🚀 Features

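Vector Arithmetic: Solves analogies of the form A : B :: C : ? using the standard word-vector offset formula vec(B) - vec(A) + vec(C), returning the vocabulary word nearest to the result.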
Interactive CLI: A robust command-line interface with color-coded output and error handling.
Small-Data Tuning: Optimized hyperparameters (epochs=30, vector_size=100) to extract signal from a limited corpus (~100k words).
Bias Exploration: Demonstrates how AI reflects its training data (e.g., correlating "King" with "Bohemia" rather than generic royalty).
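
To make the vector arithmetic concrete, here is a minimal pure-Python sketch of the offset idea for holmes : detective :: watson : ?. The three-dimensional toy_vectors are invented for illustration and are not taken from the trained model. The helper names cosine and solve_analogy are hypothetical. A real library such as Gensim also normalizes vectors before combining them, which this sketch skips.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# The real model uses 100-dimensional vectors learned from the text.
toy_vectors = {
    "holmes":    [0.9, 0.1, 0.8],
    "detective": [0.9, 0.9, 0.8],
    "watson":    [0.1, 0.1, 0.7],
    "doctor":    [0.1, 0.9, 0.7],
    "street":    [0.5, 0.2, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(solve_analogy("holmes", "detective", "watson", toy_vectors))  # doctor
```

The input words are excluded from the candidates because the nearest neighbour of the target vector is otherwise often one of the query words themselves.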

🛠️ Tech Stack

Language: Python 3.x
Core Logic: Gensim (Word2Vec)
Preprocessing: NLTK (Tokenization, Stopword removal)
Visualization: Matplotlib & Scikit-Learn (PCA for dimensionality reduction)
Frontend: Streamlit (Optional Web Dashboard)

⚙️ Installation

1. Clone the Repository

git clone https://github.com/Adesh2204/Analogy-Explorer.git
cd Analogy-Explorer

2. Set Up Environment

To avoid "Dependency Hell" (specifically with scipy versions), install the exact dependencies:

pip install -r requirements.txt

If you don't have a requirements file yet, use this command:

pip install "scipy<1.13" gensim nltk scikit-learn matplotlib streamlit

🖥️ Usage

Option A: The CLI (Terminal)

Run the script to interact with the model directly in your terminal.

python demo.py

Sample Input:

holmes detective watson

Sample Output:

Analogy: holmes is to detective as watson is to... DOCTOR (Confidence: 0.65)
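
The following is a sketch of how a line of CLI input could be turned into that output with Gensim. It is not the actual contents of demo.py. The function name answer_line is hypothetical, and the sketch assumes that the reported "Confidence" is the cosine similarity returned by most_similar and that the model file was written with Gensim's save() so that Word2Vec.load() can read it.

```python
def answer_line(line, keyed_vectors):
    """Turn 'a b c' into an answer for 'a is to b as c is to ?'.

    keyed_vectors is expected to behave like gensim's KeyedVectors
    (for example model.wv on a loaded Word2Vec model).
    """
    words = line.lower().split()
    if len(words) != 3:
        return "Please enter exactly three words, e.g. holmes detective watson"
    a, b, c = words
    try:
        # positive terms are added and negative terms subtracted: b - a + c
        best, score = keyed_vectors.most_similar(
            positive=[b, c], negative=[a], topn=1
        )[0]
    except KeyError as err:
        # gensim raises KeyError for words missing from the vocabulary
        return f"Word not in vocabulary: {err}"
    return (f"Analogy: {a} is to {b} as {c} is to... "
            f"{best.upper()} (Confidence: {score:.2f})")


# Example wiring (requires gensim and the model file from this repository):
# from gensim.models import Word2Vec
# model = Word2Vec.load("sherlock_analogy.model")
# print(answer_line("holmes detective watson", model.wv))
```

Catching KeyError matters more than usual here, because a vocabulary built from one book with rare words filtered out will miss many everyday words.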

Option B: The Web App (Streamlit)

Launch the modern dashboard for a visual experience.

streamlit run app.py

🧠 Methodology & Engineering Decisions

The Challenge: Data Sparsity

Standard NLP models are trained on billions of words. This model was trained on one book. To limit overfitting and noise, three engineering decisions were made:

High Epochs (30): The model was forced to "re-read" the book 30 times to converge on stable vector representations.
Reduced Dimensions (100d): A standard 300d vector space would be too sparse for a single book. 100 dimensions gave a better balance of complexity and density.
Strict Filtering: Words appearing fewer than 5 times were discarded to prevent the model from learning "garbage" correlations.
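
The three decisions above map onto Gensim Word2Vec parameters roughly as sketched below. Only vector_size, epochs and the frequency threshold of 5 come from this README. Everything else is left at Gensim's defaults, which is an assumption, and the helper names surviving_vocabulary and train are hypothetical. The first helper is a pure-Python illustration of what the min_count filter keeps.

```python
from collections import Counter

# Values stated in this README. Window size, skip-gram vs. CBOW, and
# other settings are not documented, so gensim defaults are assumed.
PARAMS = {"vector_size": 100, "epochs": 30, "min_count": 5}


def surviving_vocabulary(sentences, min_count=PARAMS["min_count"]):
    """Words that appear at least min_count times across all sentences."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {word for word, n in counts.items() if n >= min_count}


def train(sentences):
    """Train Word2Vec on tokenized sentences (a list of lists of str).

    sentences must be re-iterable (a list, not a generator) because the
    corpus is read once per epoch.
    """
    from gensim.models import Word2Vec  # imported lazily; requires gensim
    return Word2Vec(sentences=sentences, **PARAMS)
```

Calling surviving_vocabulary on the tokenized corpus before training is a quick way to check how much of the book's vocabulary actually reaches the model.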

Interesting Results

The model reflects the world of Sherlock Holmes, not the real world:

✅ Grammar: see -> saw :: go -> went (Learned this verb-tense relationship).
✅ Context: holmes -> detective :: watson -> doctor (Learned professional roles).
⚠️ Bias: man -> king :: woman -> ? results in Bohemia (referring to the King of Bohemia), not Queen. This highlights how dataset bias shapes AI behavior.

📂 Project Structure

Analogy-Explorer/
├── demo.py                  # Main CLI script for testing analogies
├── app.py                   # (Optional) Streamlit Dashboard
├── sherlock_analogy.model   # The trained binary Word2Vec model
├── training_script.ipynb    # (Optional) The Colab notebook used for training
├── requirements.txt         # Dependencies
└── README.md                # Project documentation

🤝 Contributing

Contributions are welcome! If you want to train this on a different corpus (e.g., Harry Potter or Pride and Prejudice), feel free to fork the repo and submit a Pull Request.

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📜 License

Distributed under the MIT License. See LICENSE for more information.

Author

Adesh Kumar

GitHub: @Adesh2204
