How do I move to the next page in Scrapy?

Hello, I'm writing a news parser in Scrapy. It should start from the start URL, open every news item and extract its data, then move on to the next page and do the same. It only parses the first page and then refuses to go any further.

    class GuardianSpider(CrawlSpider):
        name = 'guardian'
        allowed_domains = ['theguardian.com']
        start_urls = ['https://www.theguardian.com/world/europe-news']
        rules = (
            Rule(LinkExtractor(restrict_xpaths=("//div[@class='u-cf index-page']",),
                               allow=('https://www.theguardian.com/\\w+/\\d+/\\w+/\\d+/\\w+',)),
                 callback='parser_items'),
            Rule(LinkExtractor(restrict_xpaths=("//div[@class='u-cf index-page']",),
                               allow=('https://www.theguardian.com/\\w+/\\w+?page=\\d+',)),
                 follow=True),
        )
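A side note on the second rule (my observation, assuming it is meant to match pagination URLs such as https://www.theguardian.com/world/europe-news?page=2): its allow pattern can never match them, because the unescaped `?` turns `\w+?` into a lazy quantifier instead of matching the literal `?` in the URL, and `\w` does not match the `-` in `europe-news`. A small `re` check illustrates this:

```python
import re

url = 'https://www.theguardian.com/world/europe-news?page=2'

# Pattern from the question: the unescaped '?' makes '\w+?' a lazy
# quantifier rather than matching the literal '?' in the URL.
broken = r'https://www.theguardian.com/\w+/\w+?page=\d+'
print(re.search(broken, url))  # None -> the follow rule never fires

# Escaping the '?' and allowing '-' in path segments fixes the match.
fixed = r'https://www\.theguardian\.com/[\w-]+/[\w-]+\?page=\d+'
print(re.search(fixed, url).group())
# -> https://www.theguardian.com/world/europe-news?page=2
```

If the allow pattern never matches, LinkExtractor discards the pagination links, which would explain why the spider stops after the first page.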

1 Answer

In general, I would use a plain `BaseSpider` rather than `CrawlSpider`, and write the selectors for the news links and the next page by hand.
Something like this:
    def parse(self, response):
        # Follow each news link and parse it with parser_items.
        news_css = 'div.fc-item__container > a::attr(href)'
        for news_link in response.css(news_css).extract():
            yield response.follow(news_link, callback=self.parser_items)

        # Follow the pagination links back into parse.
        next_page_css = 'div.pagination__list > a::attr(href)'
        for next_page_link in response.css(next_page_css).extract():
            yield response.follow(next_page_link, callback=self.parse)


P.S. The code is not tested, but I think the idea is clear. Spiders like this are usually easier to work with than broad crawls.
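One reason `response.follow` is convenient here (a side note of mine, not part of the answer): it accepts relative hrefs and resolves them against the current page URL, whereas a bare `scrapy.Request` needs an absolute URL. The resolution works like the standard library's `urljoin` (the example paths below are made up for illustration):

```python
from urllib.parse import urljoin

base = 'https://www.theguardian.com/world/europe-news'

# A query-only href, as a pagination link might appear in the page source.
print(urljoin(base, '?page=2'))
# -> https://www.theguardian.com/world/europe-news?page=2

# A root-relative article href (hypothetical path, for illustration only).
print(urljoin(base, '/world/2024/jan/01/some-story'))
# -> https://www.theguardian.com/world/2024/jan/01/some-story
```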
