spiders
README.md
__init__.py
items.py
pipelines.py
requirements.txt
scrapy.cfg
settings.py


QUICK NOTE

USAGE

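Install the dependencies: pip install -r requirements.txt
Install MongoDB and start it in the background: nohup mongod &
Run the crawl: scrapy crawl jv_most_wanted_item (a download delay, download_delayer, of roughly 1.2s is configured; adjust it as needed)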
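
The delay mentioned above would normally live in settings.py. The fragment below is a minimal sketch: DOWNLOAD_DELAY is Scrapy's standard setting for this (in seconds), but that the project configures the delay under exactly this name is an assumption, since the note above calls it download_delayer.

```python
# settings.py (sketch): wait between consecutive requests to the same site.
# DOWNLOAD_DELAY is Scrapy's built-in setting, measured in seconds; that this
# project uses this exact name is an assumption.
DOWNLOAD_DELAY = 1.2
```

Raising the value makes the crawl slower but gentler on the target site; lowering it does the opposite.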

Scraped data

actor - actors, type: list
title - title of the film, type: string
category - categories, type: list
slug - identification code, type: string
downloadurl - magnet download address, type: string (magnet link)
preview - cover image, type: string (image src)
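
To make the field list concrete, here is a rough sketch of one scraped record as a plain Python dict, along with a small type check. The names ScrapedRecord, is_valid_record, and example are hypothetical, and the sample values are placeholders; the project's items.py presumably declares these fields as Scrapy item fields instead, which is an assumption.

```python
from typing import List, TypedDict


class ScrapedRecord(TypedDict):
    """Shape of one record, following the field list above (hypothetical name)."""
    actor: List[str]
    title: str
    category: List[str]
    slug: str
    downloadurl: str
    preview: str


# Expected runtime type for each field.
EXPECTED_TYPES = {
    "actor": list,
    "title": str,
    "category": list,
    "slug": str,
    "downloadurl": str,
    "preview": str,
}


def is_valid_record(doc: dict) -> bool:
    """Return True if doc has exactly the listed fields with the listed types."""
    if set(doc) != set(EXPECTED_TYPES):
        return False
    return all(isinstance(doc[key], kind) for key, kind in EXPECTED_TYPES.items())


# Placeholder values for illustration only.
example: ScrapedRecord = {
    "actor": ["Example Actor"],
    "title": "Example Title",
    "category": ["example-category"],
    "slug": "ABC-123",
    "downloadurl": "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567",
    "preview": "http://example.com/cover.jpg",
}
```

A check like this could run in a pipeline before a record is written to MongoDB, so malformed items are caught early.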

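Customizing a spider

Copy one of the spiders in the spiders directory, then modify the copy
SgmlLinkExtractor - extracts the links to process (for example, detail pages)
process handler - the handler that does the actual data extraction (tip: try things out in scrapy shell); use hsx with XPath or regular expressions to match the data you need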
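
As an illustration of the regular-expression route mentioned above, the following standard-library sketch pulls magnet links out of a page's HTML. MAGNET_RE, extract_magnet_links, and sample_html are hypothetical names, and the pattern is an assumption about what the links look like (a btih info hash of 40 hex or 32 base32 characters); the real spiders may match differently.

```python
import re

# Matches "magnet:?xt=urn:btih:<hash>" plus any trailing query parameters.
# The hash is either 40 hex characters or 32 base32 characters.
MAGNET_RE = re.compile(
    r"magnet:\?xt=urn:btih:(?:[0-9a-fA-F]{40}|[A-Za-z2-7]{32})[^\s\"'<>]*"
)


def extract_magnet_links(html: str) -> list:
    """Return the magnet links found in html, in order, without duplicates."""
    seen = set()
    links = []
    for match in MAGNET_RE.finditer(html):
        link = match.group(0)
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links


# Placeholder markup for illustration only.
sample_html = (
    '<a href="magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567'
    '&dn=example">download</a>'
)
```

Inside a handler, the same function could be applied to the response body, with XPath used for the fields that have stable markup.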

About

a scrapy crawler for jav library
