SoFunction
Updated on 2024-11-13

How to open, run and debug a Scrapy crawler project in PyCharm

1. First you need a Scrapy project. Here a new project named test1 is created on the Desktop: open a command line in the Desktop directory and run the command: scrapy startproject test1


The directory structure is as follows:




2. Open PyCharm and select Open.


3. Select the project folder and click OK.

4. When the following screen opens, press Alt + 1 to open the Project panel.


5. In the test1/spiders/ folder, create a new spider file and note its name="dmoz" attribute. This name will be used later.

6. In the top-level test1 directory (the project root created by startproject, which holds scrapy.cfg), create a new Python file. Any file name works, so pick one that is easy to recognize. Note that the spider name in the command, marked by arrow 2 in the screenshot, must be the same as the name='dmoz' from step 5.

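from scrapy import cmdline

# Same as typing "scrapy crawl dmoz" on the command line; "dmoz" is the spider name from step 5
cmdline.execute("scrapy crawl dmoz".split())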
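If the launcher should not depend on the IDE's working-directory setting, one possible variation is to change into the project directory before handing control to Scrapy. This is only a sketch: build_argv and run_spider are hypothetical helper names, not part of Scrapy, and project_dir is assumed to be the folder that holds scrapy.cfg.

```python
import os


def build_argv(spider_name):
    """Return the argument list for the command 'scrapy crawl <spider_name>'."""
    return ["scrapy", "crawl", spider_name]


def run_spider(spider_name, project_dir):
    """Change into project_dir, then start the crawl in the current process."""
    # Scrapy resolves the active project from the current working directory
    os.chdir(project_dir)
    # Imported here so build_argv can be used without importing Scrapy
    from scrapy import cmdline
    # cmdline.execute ends by calling sys.exit, so nothing after it will run
    cmdline.execute(build_argv(spider_name))
```

In the launcher file this could be called as run_spider("dmoz", os.path.dirname(os.path.abspath(__file__))), which uses the launcher's own folder as the project directory.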

7. With the files in place, it is time to configure PyCharm. Click Run -> Edit Configurations.

8. Create a new Python run configuration.

9. Name: change to spider; Script: choose the newly created file; Working directory: change to your own project directory.


10. The setup is now complete. Click the run button in the upper right corner, as shown in the image below, to run it.

Debugging

You can set breakpoints anywhere in the project code, for example in the spider, and start the same configuration in debug mode.



Problems encountered

1. Unknown command: crawl

During a debug run the breakpoint was never hit, and the console printed the following:

H:\Python\Python36\ "H:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.4\helpers\pydev\" --multiproc --client 127.0.0.1 --port 59810 --file H:/Python/Python36/Lib/site-packages/scrapy/ crawl quotes -o 
pydev debugger: process 4740 is connecting

Connected to pydev debugger (build 141.3058)
Scrapy 1.3.2 - no active project

Unknown command: crawl

Use "scrapy" to see available commands

Process finished with exit code 2

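The working directory was set incorrectly, so Scrapy found no active project (see the "no active project" line in the output) and did not recognize the crawl command, which is only available inside a project. Set the working directory to the project directory as described in step 9 and re-run; the problem is solved.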
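To check a working directory before running, a small diagnostic can help. It relies on the fact that Scrapy locates a project by looking for scrapy.cfg in the current directory and its parents. The sketch below imitates that lookup with the standard library only; find_project_root is a hypothetical helper name, not a Scrapy function.

```python
import os


def find_project_root(start_dir):
    """Walk upward from start_dir; return the first folder containing scrapy.cfg, or None."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "scrapy.cfg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding scrapy.cfg
            return None
        current = parent
```

Printing find_project_root(os.getcwd()) at the top of the launcher file shows which project, if any, the current working directory belongs to. A result of None suggests the working directory in the run configuration points outside the project.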
