How to crawl a site's sitemap information with Scrapy in Python
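This article gives an example of using Scrapy in Python to crawl a site's sitemap information. The spider downloads the sitemap, uses a regular expression to extract every URL inside a <loc> element, and requests each of those pages for parsing. It is shared here for your reference. The details are as follows: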
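import re

import scrapy


class SitemapSpider(scrapy.Spider):
    # scrapy.Spider replaces the BaseSpider/HtmlXPathSelector APIs removed from current Scrapy
    name = "SitemapSpider"
    # The sitemap URL was elided in the source; replace "/" with your site's full sitemap.xml URL
    start_urls = ["/"]

    def parse(self, response):
        nodename = 'loc'
        text = response.text  # decoded response body
        # Capture whatever sits between <loc> and </loc>; DOTALL lets it span line breaks
        r = re.compile(r"(<%s[\s>])(.*?)(</%s>)" % (nodename, nodename), re.DOTALL)
        for match in r.finditer(text):
            url = match.group(2).strip()  # drop whitespace around the URL
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        # Do all your page parsing and select the elements you want;
        # a plain dict stands in for the undefined mock Item (field name is hypothetical)
        yield {"div_text": response.xpath('//div/text()').get()}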
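To check the <loc> extraction step on its own without running a crawl, the sketch below applies the spider's regular expression to a small hand-written sitemap and compares the result with a namespace-aware parse using the standard library's xml.etree.ElementTree. The sample URLs and the helper names loc_urls_regex and loc_urls_xml are hypothetical, chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sitemap for illustration; the second <loc> spans several lines
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url>
    <loc>
      https://example.com/about
    </loc>
  </url>
</urlset>"""


def loc_urls_regex(text, nodename="loc"):
    """Extract <loc> contents with the same pattern the spider uses."""
    pattern = re.compile(r"(<%s[\s>])(.*?)(</%s>)" % (nodename, nodename), re.DOTALL)
    return [m.group(2).strip() for m in pattern.finditer(text)]


def loc_urls_xml(text):
    """Extract <loc> contents with a real XML parser, honoring the sitemap namespace."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]


if __name__ == "__main__":
    print(loc_urls_regex(SAMPLE_SITEMAP))
    print(loc_urls_xml(SAMPLE_SITEMAP))
```

Both helpers return the same two URLs for this sample. The regex is quick and tolerant of malformed markup, but it does not decode XML entities (a URL containing &amp; stays encoded), whereas the XML parser does. Scrapy also ships a built-in scrapy.spiders.SitemapSpider that reads sitemap_urls and follows the listed pages for you, which may be simpler than hand-rolled parsing.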
I hope this article helps you with your Python programming.