Falcony Search Engine

Features

The crawler's seed list consists mostly of football-related websites.

1. Text-Based Search

Falcony provides powerful text search capabilities with support for:

Standard keyword searching (e.g., "premier league standings")
Relevance-based result ranking
PageRank implementation for determining page importance
Search suggestions based on popular football queries

Screenshots: Landing Page, Suggestions, Text Search

2. Phrase Searching

Search for exact phrases using quotation marks:

Example: "Cristiano Ronaldo goals"
Results will include only pages containing the exact phrase
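
One way exact-phrase matching can work on top of a positional index is sketched below. This is a minimal illustration, not Falcony's actual code: the class name `PhraseMatcher`, the method `containsPhrase`, and the per-document `term -> positions` map are all assumptions made for the example. It also assumes positions count every token, so that consecutive words have consecutive positions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: checks one document for an exact phrase.
final class PhraseMatcher {
    private PhraseMatcher() {}

    /**
     * @param positions   term -> positions of that term inside ONE document
     * @param phraseTerms the phrase, already tokenized, in order
     * @return true if the terms occur at consecutive positions
     */
    static boolean containsPhrase(Map<String, List<Integer>> positions,
                                  List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return false;
        }
        List<Integer> starts = positions.get(phraseTerms.get(0));
        if (starts == null) {
            return false; // first term never occurs in this document
        }
        for (int start : starts) {
            boolean match = true;
            // The i-th term of the phrase must sit exactly i positions later.
            for (int i = 1; i < phraseTerms.size(); i++) {
                List<Integer> termPositions = positions.get(phraseTerms.get(i));
                if (termPositions == null || !termPositions.contains(start + i)) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

The linear `contains` lookup keeps the sketch short; a real index would more likely merge sorted position lists.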

Screenshot: Phrase Searching

3. Boolean Operators

Combine phrases with logical operators for advanced searching:

AND: "Lionel Messi" AND "Barcelona"
OR: "Premier League" OR "La Liga"
NOT: "Real Madrid" NOT "Champions League"
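
The three operators map naturally onto set operations over document IDs. The sketch below assumes each quoted phrase has already been resolved to the set of documents that contain it; `BooleanCombiner` and its method names are hypothetical and not taken from the Falcony source.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: combines the result sets of two phrase searches.
final class BooleanCombiner {
    private BooleanCombiner() {}

    /** Documents that match both phrases (set intersection). */
    static Set<String> and(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.retainAll(right);
        return result;
    }

    /** Documents that match at least one phrase (set union). */
    static Set<String> or(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.addAll(right);
        return result;
    }

    /** Documents that match the left phrase but not the right one (set difference). */
    static Set<String> not(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }
}
```

Under these assumptions, `"Real Madrid" NOT "Champions League"` becomes `not(docsFor("Real Madrid"), docsFor("Champions League"))`, where `docsFor` stands in for whatever phrase lookup the engine uses.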

Screenshot: Boolean Operators

4. Image Search

Search by uploading an image to find visually similar football images:

Uses DinoV2 ONNX model for feature extraction
Vector similarity search for efficient image matching
Supports various image formats

Screenshot: Image Searching

Architecture

Backend Components

Crawler

Collects web pages and images from the internet
Respects robots.txt rules
Normalizes URLs to avoid duplicates
Stores documents in MongoDB
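
As an illustration of URL normalization for duplicate avoidance, the sketch below uses only `java.net.URI`. The specific rules (lowercasing scheme and host, dropping default ports, fragments, and trailing slashes) are assumptions about what a crawler typically does; the README does not say which rules Falcony applies, and `UrlNormalizer` is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper; the rule set below is an assumption, not Falcony's.
final class UrlNormalizer {
    private UrlNormalizer() {}

    /** Returns a canonical form of the URL, or null if it cannot be parsed. */
    static String normalize(String rawUrl) {
        try {
            // URI.normalize() resolves "." and ".." path segments.
            URI uri = new URI(rawUrl.trim()).normalize();
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null;
            }
            String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
            String host = uri.getHost().toLowerCase(Locale.ROOT);

            // Drop the port when it is the default for the scheme.
            int port = uri.getPort();
            boolean defaultPort = (port == 80 && scheme.equals("http"))
                    || (port == 443 && scheme.equals("https"));
            if (defaultPort) {
                port = -1;
            }

            // Treat "/a/b/" and "/a/b" as the same page; keep a bare "/" as is.
            String path = uri.getPath();
            if (path == null || path.isEmpty()) {
                path = "/";
            } else if (path.length() > 1 && path.endsWith("/")) {
                path = path.substring(0, path.length() - 1);
            }

            // Rebuild without user info and without the fragment.
            // Note: getPath()/getQuery() return decoded text, so escaped
            // reserved characters such as %2F are not preserved by this sketch.
            return new URI(scheme, null, host, port, path, uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

A crawler could then keep a set of normalized URLs and skip any link whose normalized form is already present.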

Indexers

TextIndexer: Processes web page content, tokenizes text, removes stop words, and creates an inverted index
ImageIndexer: Extracts image features using DinoV2 model and stores vector representations

Query Processor

Handles user queries and routes each one to the appropriate ranker
Supports suggestion generation for autocomplete
Handles pagination of results

Rankers

TokenBasedRanker: Ranks results for keyword searches using TF-IDF and popularity
PhraseBasedRanker: Specialized ranking for phrase searches with boolean operators
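
To make "TF-IDF and popularity" concrete, here is a small sketch of one common formulation. The exact TF-IDF variant and the linear blend with a popularity score are assumptions; the README does not specify how `TokenBasedRanker` combines them, and `TfIdfScorer` is a hypothetical name.

```java
// Hypothetical scorer: one textbook TF-IDF variant plus a linear blend.
final class TfIdfScorer {
    private TfIdfScorer() {}

    /**
     * @param termCountInDoc occurrences of the term in the document
     * @param docLength      total number of tokens in the document
     * @param totalDocs      number of documents in the index
     * @param docsWithTerm   number of documents containing the term
     */
    static double tfIdf(int termCountInDoc, int docLength, int totalDocs, int docsWithTerm) {
        if (termCountInDoc == 0 || docLength == 0 || docsWithTerm == 0) {
            return 0.0;
        }
        double tf = (double) termCountInDoc / docLength;           // normalized term frequency
        double idf = Math.log((double) totalDocs / docsWithTerm);  // rarer terms weigh more
        return tf * idf;
    }

    /**
     * Blends relevance with popularity (for example a PageRank score).
     * popularityWeight is an assumed tuning knob in [0, 1].
     */
    static double combined(double tfIdf, double popularity, double popularityWeight) {
        return (1.0 - popularityWeight) * tfIdf + popularityWeight * popularity;
    }
}
```

For a multi-term query, a ranker would typically sum `tfIdf` over the query terms before blending in popularity.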

Database Management

Uses MongoDB for document and image storage
Separate collections for documents, tokens, images, and queries
Vector search capabilities using MongoDB Atlas

Frontend Components

React-based user interface
Real-time search suggestions
Responsive design for various devices
Support for both text and image search interfaces

Technology Stack

Backend: Java
Frontend: React, TailwindCSS
Database: MongoDB
Machine Learning: ONNX Runtime, DinoV2 model
Text Processing: OpenNLP TokenizerME, Porter Stemmer
Web Crawling: JSoup
Build Tool: Gradle

How It Works

Text Search Flow

User inputs a query like "Champions League final highlights"
Query processor analyzes the query to determine if it's a keyword search, phrase search, or boolean search
Tokenization and stemming are applied to the query
Candidate documents are retrieved from the inverted index
Results are ranked based on term frequency, document popularity (PageRank), and other relevance factors
Snippets are generated highlighting query terms in context
Results are returned to the user interface
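
The query-analysis step in the flow above could look roughly like the following. This is only a sketch under stated assumptions: it recognizes a single operator between two quoted phrases, mirroring the examples in the Features section, and the names `QueryClassifier` and `QueryType` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical classifier for the three query kinds described above.
final class QueryClassifier {
    enum QueryType { KEYWORD, PHRASE, BOOLEAN }

    // "phrase" AND|OR|NOT "phrase"
    private static final Pattern BOOLEAN_QUERY =
            Pattern.compile("\"[^\"]+\"\\s+(AND|OR|NOT)\\s+\"[^\"]+\"");
    // A single quoted phrase and nothing else.
    private static final Pattern PHRASE_QUERY = Pattern.compile("\"[^\"]+\"");

    private QueryClassifier() {}

    static QueryType classify(String query) {
        String trimmed = query.trim();
        if (BOOLEAN_QUERY.matcher(trimmed).matches()) {
            return QueryType.BOOLEAN;
        }
        if (PHRASE_QUERY.matcher(trimmed).matches()) {
            return QueryType.PHRASE;
        }
        // Anything else is treated as a plain keyword search.
        return QueryType.KEYWORD;
    }
}
```

The operators are matched in upper case only, so a keyword query that happens to contain the word "and" is still treated as a keyword search.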

Image Search Flow

User uploads an image of a football moment through the interface
Image features are extracted using the DinoV2 ONNX model
The feature vector is compared against the database of indexed images using vector similarity
Similar football images are ranked by cosine similarity
Results are returned to the user interface with source documents
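
The similarity measure itself is easy to show in isolation. The sketch below computes L2 normalization and cosine similarity in plain Java. It is illustrative only: in Falcony the comparison is delegated to MongoDB Atlas vector search, and `VectorMath` is a hypothetical helper.

```java
// Hypothetical helper showing the math behind the image comparison.
final class VectorMath {
    private VectorMath() {}

    /** Returns a unit-length copy of v (all zeros if v has zero length). */
    static float[] l2Normalize(float[] v) {
        double sumOfSquares = 0.0;
        for (float x : v) {
            sumOfSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        float[] out = new float[v.length];
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    /** Cosine of the angle between a and b, in [-1, 1]; 0 if either is a zero vector. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

When vectors are normalized at indexing time, cosine similarity reduces to a plain dot product, which is one reason to normalize before storing.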

Indexing Process

Crawler collects web pages and their images from seed URLs
TextIndexer processes textual content:
- Tokenization and stemming
- Removal of stop words
- Creation of inverted index with position information
ImageIndexer processes images:
- Feature extraction with DinoV2
- Vector normalization
- Storage in MongoDB with vector indexing
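
A toy version of the text-indexing step is sketched below to show what "inverted index with position information" means in practice. Several simplifications are assumptions of this example: tokens are split with a regular expression instead of OpenNLP, stemming is omitted, the stop-word list is a tiny sample, and everything lives in memory rather than in MongoDB. `TinyInvertedIndex` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory index: term -> (document ID -> positions).
final class TinyInvertedIndex {
    // Small sample list; a real indexer would use a fuller stop-word set.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "in", "and", "to", "is");

    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();

    void add(String docId, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        int position = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue;
            }
            // Positions count every token, including stop words, so the
            // distance between indexed terms matches the original text.
            int current = position++;
            if (STOP_WORDS.contains(token)) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(current);
        }
    }

    /** Returns document ID -> positions for the term, or an empty map. */
    Map<String, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
    }
}
```

Storing positions rather than just document IDs is what lets a phrase search verify that terms appear next to each other.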

PageRank Implementation

Graph representation of web pages and their links
Iterative calculation of importance scores
Integration of scores into the document ranking process
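
The iterative calculation can be sketched with the standard power-iteration form of PageRank. The adjacency-array representation, the fixed iteration count, and the handling of pages without outgoing links are assumptions of this example, not details confirmed by the README; `PageRankSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical implementation of textbook PageRank by power iteration.
final class PageRankSketch {
    private PageRankSketch() {}

    /**
     * @param outLinks   outLinks[p] lists the pages that page p links to
     * @param damping    damping factor, commonly 0.85
     * @param iterations number of update rounds
     * @return importance score per page; the scores sum to 1
     */
    static double[] compute(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int round = 0; round < iterations; round++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int page = 0; page < n; page++) {
                if (outLinks[page].length == 0) {
                    // Pages without links spread their rank over all pages.
                    danglingMass += rank[page];
                    continue;
                }
                double share = rank[page] / outLinks[page].length;
                for (int target : outLinks[page]) {
                    next[target] += share;
                }
            }
            for (int page = 0; page < n; page++) {
                next[page] = (1.0 - damping) / n
                        + damping * (next[page] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }
}
```

In practice the loop would usually stop once the scores change by less than a small threshold instead of running a fixed number of rounds.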

Repository: bedosaber77/Falcony-Search-engine
