Python implementation of a web crawler that, starting from a set of seed URLs, retrieves the most similar pages.
python crawler spider web-crawler nltk scrapy stemming text-preprocessing index-construction page-rank similarity-criteria
Updated Aug 12, 2021 - Python
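
The description above does not say how similarity is computed or what the pages are compared against, so the following is only a minimal sketch under stated assumptions: pages are compared to a free-text query using cosine similarity over raw term counts, with no stemming and no PageRank. The names `tokenize`, `cosine`, `crawl_most_similar`, and the `fetch` callable are hypothetical and are not taken from the repository.

```python
import heapq
import math
import re
from collections import Counter, deque


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def crawl_most_similar(seeds, fetch, query, top_k=3, max_pages=50):
    """Breadth-first crawl from the seed URLs, scoring each page against the query.

    `fetch(url)` is a hypothetical callable returning (page_text, outgoing_links).
    Returns up to `top_k` (score, url) pairs, highest score first.
    """
    query_vec = Counter(tokenize(query))
    seen = set(seeds)
    queue = deque(seeds)
    scored = []
    while queue and len(scored) < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        scored.append((cosine(query_vec, Counter(tokenize(text))), url))
        for link in links:
            if link not in seen:  # visit each URL at most once
                seen.add(link)
                queue.append(link)
    return heapq.nlargest(top_k, scored)
```

For experimentation, `fetch` can be backed by an in-memory dictionary that maps each URL to its text and links, so the ranking logic can be exercised without network access. A real crawler would replace it with an HTTP client or a Scrapy spider.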