My First Web Search Engine

This project is a basic web search engine created as part of the DSAI 301 - Introduction to Python Programming course at Bogazici University. It demonstrates key concepts in web crawling, indexing, and ranking, all implemented in Python.

Special thanks to Asst. Prof. Dr. Huseyin Oktay Altun for his teaching.

Project Overview

Features:

Web Crawling:
- Starts with a seed URL and recursively visits pages to build a list of links.
- Extracts and cleans the content of visited pages.
Indexing:
- Builds a searchable index of keywords from the crawled pages.
Page Ranking:
- Implements a simple ranking algorithm based on each page's in-links and out-links.
Search Functionality:
- Allows keyword searches with or without ranking.

How It Works

1. Web Crawling

The crawlWeb function begins with a seed URL and follows these steps:

- Fetches page content using getPage.
- Extracts links using get_all_links.
- Recursively visits the extracted links, skipping pages that have already been visited.
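
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the names fetch_page, extract_links, crawl, and the FAKE_WEB dictionary are hypothetical, pages come from an in-memory dictionary instead of the network so the example runs offline, and an explicit to-visit list is used in place of recursion.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a> <a href="c.html">C</a>',
    "c.html": "no links here",
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    # Stand-in for a real HTTP fetch; unknown URLs yield an empty page.
    return FAKE_WEB.get(url, "")


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def crawl(seed_url):
    to_visit = [seed_url]   # pages still waiting to be crawled
    visited = []            # pages already crawled, in crawl order
    graph = {}              # page -> list of pages it links to
    while to_visit:
        url = to_visit.pop()
        if url in visited:
            continue        # avoid crawling the same page twice
        links = extract_links(fetch_page(url))
        graph[url] = links
        visited.append(url)
        to_visit.extend(link for link in links if link not in visited)
    return visited, graph


visited, graph = crawl("a.html")
print(sorted(visited))
```

The graph returned here maps each page to its out-links, which is the shape a ranking step would need.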

2. Building an Index

- The content is cleaned using getclearpage.
- Keywords are added to the index using add_to_index and addPageToIndex.
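
As a rough illustration of this step, the sketch below strips tags from a page, splits the text into lowercase words, and records which URLs contain each word. The function names (clean_text, add_keyword, add_page) and the dictionary-of-lists index layout are assumptions made for the example; the notebook's getclearpage, add_to_index, and addPageToIndex may work differently.

```python
import re


def clean_text(html):
    """Remove tags and punctuation, returning lowercase text."""
    text = re.sub(r"<[^>]+>", " ", html)           # drop HTML tags
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # drop punctuation
    return text


def add_keyword(index, keyword, url):
    """Record that `url` contains `keyword`, without duplicates."""
    urls = index.setdefault(keyword, [])
    if url not in urls:
        urls.append(url)


def add_page(index, url, html):
    """Add every word of a cleaned page to the index."""
    for word in clean_text(html).split():
        add_keyword(index, word, url)


index = {}
add_page(index, "a.html", "<p>Python search engine</p>")
add_page(index, "b.html", "<h1>Python crawler!</h1>")
print(index["python"])
```

A plain dictionary keeps lookups fast: finding the pages for a keyword is a single key access.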

3. Ranking

- A graph of the links between pages is generated.
- Page ranks are computed iteratively using a damping factor.
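
One common way to compute such ranks is a simplified PageRank iteration, sketched below. The function name compute_ranks_sketch, the damping value of 0.8, and the 10 iterations are assumptions chosen for illustration; the notebook's computeRanks may use different values or a different formulation. Each page starts with an equal rank, and in every round a page receives a share of the rank of each page that links to it.

```python
def compute_ranks_sketch(graph, damping=0.8, iterations=10):
    """graph maps each page to the list of pages it links to."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}  # start with equal ranks
    for _ in range(iterations):
        new_ranks = {}
        for page in graph:
            # Base rank every page receives regardless of links.
            rank = (1 - damping) / n
            # Add a share from each page that links to this one.
            for other, out_links in graph.items():
                if page in out_links:
                    rank += damping * ranks[other] / len(out_links)
            new_ranks[page] = rank
        ranks = new_ranks
    return ranks


graph = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["a.html", "c.html"],
    "c.html": [],
}
ranks = compute_ranks_sketch(graph)
print(max(ranks, key=ranks.get))
```

In this small graph, c.html has two in-links while the other pages have one each, so it ends up with the highest rank.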

4. Searching

The lookup function searches the index for a keyword and, when ranking is enabled, orders the matching pages by the ranks produced by computeRanks.
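
A ranked lookup along these lines could be sketched as follows. The name lookup_sketch and its signature are hypothetical and simpler than the notebook's lookup: it takes an already computed ranks dictionary instead of a graph and a ranking function, and it assumes the index maps each lowercase keyword to a list of URLs.

```python
def lookup_sketch(index, keyword, ranks=None):
    """Return the URLs for `keyword`, best-ranked first if ranks are given."""
    urls = index.get(keyword.lower(), [])
    if ranks is None:
        return list(urls)  # unranked: keep index order
    # Ranked: highest rank first; pages without a rank sort last.
    return sorted(urls, key=lambda url: ranks.get(url, 0.0), reverse=True)


index = {"python": ["a.html", "c.html"], "crawler": ["b.html"]}
ranks = {"a.html": 0.15, "b.html": 0.15, "c.html": 0.23}

print(lookup_sketch(index, "python"))
print(lookup_sketch(index, "python", ranks))
```

Passing no ranks gives the plain search, which mirrors the option of searching with or without ranking.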

Running the Code

To run this project:

- Open the MyFirstWebSearchEngine.ipynb file in Google Colab.
- Set the seed_url variable to the URL you want to start crawling from.
- Run all cells in order to crawl, index, rank, and search.

Example:

seed_url = "https://example.com"
index, graph = crawlWeb(seed_url)
lookup(index, "keyword", graph, computeRanks)
