- Project #1 – Use the NLTK library and other Python libraries such as Beautiful Soup to parse the Reuters collection (Reuters-21578 corpus) into documents, articles, tokens, and stems. Stop words are removed before the terms are indexed.
- Project #2 – Create a naive indexer, a single-term query processor, and a compressed index.
- Project #3 – Create an indexer using SPIMI. Implement single-term, AND, and OR query processors, then convert the indexer into a probabilistic search engine using the BM25 formula.
- Project #4 – Experiment with web crawling: scrape and index a set of web documents, cluster the documents using k-means, and use the AFINN sentiment analysis script to assign a sentiment score to each cluster.
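
As a rough illustration of the BM25 ranking mentioned in Project #3, here is a minimal sketch that builds a small in-memory inverted index and scores documents against a query. It is not the repository's code: the function names `build_index` and `bm25_scores`, the whitespace tokenizer, the `log(N / df)` form of IDF, and the defaults `k1=1.2` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency}; also return document lengths.

    Hypothetical helper: tokenization is plain lowercase whitespace splitting.
    """
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_scores(query, index, lengths, k1=1.2, b=0.75):
    """Rank documents for a query with one common BM25 variant.

    k1 and b are assumed defaults, not values taken from the project.
    """
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            # Length normalization: longer documents get a larger denominator.
            norm = k1 * ((1 - b) + b * lengths[doc_id] / avg_len) + tf
            scores[doc_id] += idf * (k1 + 1) * tf / norm
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample_docs = {
        1: "oil prices rise",
        2: "oil exports fall sharply",
        3: "wheat harvest",
    }
    index, lengths = build_index(sample_docs)
    for doc_id, score in bm25_scores("oil prices", index, lengths):
        print(doc_id, round(score, 3))
```

Documents that contain none of the query terms are simply absent from the result, which behaves like an OR query ranked by score.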
janeeyre912/information_retrieve