
Steps to run:

1. Install packages:

   ```
   pip install -r requirements.txt
   ```

2. Set values in the `.env` file:

   ```
   FIRECRAWL_API_KEY=""   # your API key here
   URL=""                 # URL to crawl
   LIMIT=175              # number of pages to crawl
   SOURCE_LIBRARY=""      # name of the library being crawled (optional)
   ```

3. Crawl and save the data:

   ```
   python crawl_and_save.py
   ```

4. Process the saved data to markdown:

   ```
   python process.py
   ```

5. The output is available inside the `markdown_docs` folder.
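As a rough illustration of step 2, here is a minimal sketch of how the `.env` values above could be parsed. This is a hypothetical stand-in, assuming simple `KEY = "value"` lines with optional `#` comments; the actual scripts may use a library such as python-dotenv instead.

```python
import os

# Hypothetical sketch: parse simple KEY = "value" lines from a .env file.
# The real crawl_and_save.py / process.py may load configuration differently.
def load_env(path=".env"):
    """Return a dict of settings, ignoring blank lines and # comments."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if not line or "=" not in line:
                continue
            key, _, raw = line.partition("=")
            values[key.strip()] = raw.strip().strip('"')
    return values

if __name__ == "__main__" and os.path.exists(".env"):
    cfg = load_env()
    limit = int(cfg.get("LIMIT", "100"))  # e.g. LIMIT=175 caps pages crawled
```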

Note

The workflow is split into two scripts, `crawl_and_save.py` and `process.py`. The first crawls the site and saves the raw data; the second converts that saved data to markdown. This way, if processing fails, you can re-run `process.py` without crawling again and spending unnecessary credits.
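The two-stage split described above can be sketched as follows. File names and the shape of the crawl results are assumptions for illustration only, not the actual interface of the repository's scripts.

```python
import json
from pathlib import Path

# Hypothetical two-stage pipeline mirroring the note above.
RAW_FILE = Path("raw_crawl_data.json")   # assumed intermediate file name
OUT_DIR = Path("markdown_docs")

def save_raw(pages):
    """Stage 1 (crawl_and_save.py): persist raw crawl results to disk once."""
    RAW_FILE.write_text(json.dumps(pages))

def process_to_markdown():
    """Stage 2 (process.py): safe to re-run on failure without re-crawling."""
    pages = json.loads(RAW_FILE.read_text())
    OUT_DIR.mkdir(exist_ok=True)
    for i, page in enumerate(pages):
        # Assumes each crawled page carries a "markdown" field.
        (OUT_DIR / f"page_{i}.md").write_text(page["markdown"])
```

Because stage 2 reads only from the saved file, a processing failure costs nothing extra: fix the bug and re-run `process.py`.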