
Scrapy--CrawlSpider

1. CrawlSpider

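Crawl rules. A Rule without a callback makes the spider follow matching URLs recursively (in Scrapy, follow defaults to True only when callback is None). A Rule with a callback passes each matching page to that method instead.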
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule

# Class attribute of a CrawlSpider subclass. Older Scrapy versions used the
# deprecated SgmlLinkExtractor in place of LinkExtractor.
rules = (
    # No callback: follow news list pages recursively.
    Rule(LinkExtractor(allow=(r'https://news\.cnblogs\.com/n/page/\d+',))),
    # Callback: pass each news article page to parse_content.
    Rule(LinkExtractor(allow=(r'https://news\.cnblogs\.com/n/\d+',)), callback='parse_content'),
)
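To make the difference between the two kinds of rules concrete, here is a simplified, standard-library-only model of how a CrawlSpider applies them. It is a sketch, not Scrapy's code. The toy SITE dictionary, the crawl() function and the article numbers are hypothetical. A real spider downloads pages, extracts links with LinkExtractor, and drops repeated requests through its duplicate filter. The model also assumes, as Scrapy does as far as I know, that a link already claimed by an earlier rule on the same page is skipped by later rules.

```python
import re
from collections import deque

# Simplified model of CrawlSpider rule handling (not Scrapy's actual code).
# Each rule: (compiled allow pattern, callback name or None).
RULES = [
    (re.compile(r"https://news\.cnblogs\.com/n/page/\d+"), None),
    (re.compile(r"https://news\.cnblogs\.com/n/\d+"), "parse_content"),
]

# Hypothetical toy site: page URL -> links found on that page.
SITE = {
    "https://news.cnblogs.com/n/page/1/": [
        "https://news.cnblogs.com/n/page/2/",
        "https://news.cnblogs.com/n/101/",
        "https://news.cnblogs.com/n/102/",
    ],
    "https://news.cnblogs.com/n/page/2/": [
        "https://news.cnblogs.com/n/page/1/",
        "https://news.cnblogs.com/n/103/",
    ],
    # Links on an article page are never extracted, because a rule with a
    # callback does not follow links by default.
    "https://news.cnblogs.com/n/101/": ["https://news.cnblogs.com/n/104/"],
}


def crawl(start_url, site, rules):
    """Return (pages whose links were followed, pages sent to a callback)."""
    followed, parsed = [], []
    visited = {start_url}                     # stands in for Scrapy's duplicate filter
    queue = deque([(start_url, None, True)])  # (url, callback, follow)
    while queue:
        url, callback, follow = queue.popleft()
        if callback is not None:
            parsed.append(url)                # parse_content(response) would run here
        if not follow:
            continue
        followed.append(url)
        seen = set()                          # links already claimed by an earlier rule
        for pattern, rule_callback in rules:
            for link in site.get(url, []):
                if link in seen or not pattern.search(link):
                    continue
                seen.add(link)
                if link in visited:
                    continue
                visited.add(link)
                # follow defaults to True only when the rule has no callback
                queue.append((link, rule_callback, rule_callback is None))
    return followed, parsed


followed, parsed = crawl("https://news.cnblogs.com/n/page/1/", SITE, RULES)
print(followed)
print(parsed)
```

Running it reports the two list pages as followed and articles 101 to 103 as parsed. Article 104 is never reached because it is linked only from an article page, and the callback rule does not follow links.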