SoFunction
Updated on 2024-12-13

How to Crawl a Site's Sitemap with Python Scrapy

Updated April 08, 2015 09:46:40 by pythoner
This article shows how to use the Python framework Scrapy to read a site's sitemap and crawl every page it lists. It may serve as a useful reference for readers who need to do the same.

The example below uses Scrapy to fetch a site's sitemap, extract each URL from its <loc> tags, and request those pages for scraping. It is shared here for your reference:

import re

import scrapy


class PageItem(scrapy.Item):
    # Mock item with a single field for the scraped text
    div_text = scrapy.Field()


class SitemapSpider(scrapy.Spider):
    name = "SitemapSpider"
    # The sitemap URL is elided in the original; replace "/" with the full sitemap URL
    start_urls = ["/"]

    def parse(self, response):
        # Scan the sitemap body and request every URL found inside a <loc> tag
        nodename = "loc"
        text = response.text
        r = re.compile(r"(<%s[\s>])(.*?)(</%s>)" % (nodename, nodename), re.DOTALL)
        for match in r.finditer(text):
            # Group 2 is the text between <loc> and </loc>
            url = match.group(2).strip()
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        item = PageItem()
        # Do all your page parsing here and select the elements you want
        item["div_text"] = response.xpath("//div/text()").get()
        yield item
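
The core of parse() is the regular expression that pulls each <loc> value out of the sitemap. The stdlib-only sketch below runs that same pattern on a small sample sitemap (the example.com URLs are hypothetical) and, as an alternative, reads the same values with xml.etree.ElementTree, assuming the sitemap uses the standard sitemaps.org 0.9 namespace. Because it does not need Scrapy, it is a quick way to check the extraction before wiring it into the spider.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample sitemap; the second <loc> spans several lines on purpose
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url>
    <loc>
      https://example.com/about
    </loc>
  </url>
</urlset>"""


def locs_by_regex(text, nodename="loc"):
    """Same pattern as the spider's parse(): group 2 holds the tag body."""
    # re.DOTALL lets .*? cross newlines; strip() removes the surrounding whitespace
    r = re.compile(r"(<%s[\s>])(.*?)(</%s>)" % (nodename, nodename), re.DOTALL)
    return [m.group(2).strip() for m in r.finditer(text)]


def locs_by_xml(text):
    """Alternative: parse the XML and read <loc> in the sitemap namespace."""
    # Assumption: the sitemap declares the standard sitemaps.org 0.9 namespace
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(text)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", ns)]


if __name__ == "__main__":
    print(locs_by_regex(SAMPLE_SITEMAP))
    print(locs_by_xml(SAMPLE_SITEMAP))
```

Both functions return the same two URLs here. The regex tolerates malformed markup and ignores namespaces, while the XML parser rejects broken documents but will not pick up a <loc> string that appears inside an XML comment.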

I hope this article helps with your Python programming.