1. First, you need a Scrapy project. Here I create a new Scrapy project called test1 on the Desktop: open a command line in the Desktop directory and run the command: scrapy startproject test1
The directory structure is as follows:
2. Open PyCharm and select Open.
3. Select the project and click OK.
4. When the project window opens, press Alt+1 to open the project panel.
5. In the test1/spiders/ folder, create a new spider and note its name attribute, name="dmoz". This name will be used later.
6. In the project root, at the same level as the test1 directory, create a new Python file (choose a name that makes its purpose clear) with the content below. Note that the spider name in the crawl command must be the same as the name='dmoz' from step 5.
from scrapy import cmdline
cmdline.execute("scrapy crawl dmoz".split())
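If you want to reuse the launcher for several spiders, the same idea can be wrapped in a small function. This is only a sketch: build_crawl_argv and run_spider are names I invented here, and the only Scrapy call involved is cmdline.execute.

```python
def build_crawl_argv(spider_name, *extra_args):
    """Return the argument list for "scrapy crawl <spider_name> ..."."""
    return ["scrapy", "crawl", spider_name, *extra_args]


def run_spider(spider_name, *extra_args):
    """Run the spider inside the current process, so PyCharm breakpoints are hit."""
    # Imported here so that build_crawl_argv can be used without Scrapy installed.
    from scrapy import cmdline

    # cmdline.execute parses the list like a real command line and exits when done.
    cmdline.execute(build_crawl_argv(spider_name, *extra_args))
```

In the launcher file you would then add a single call such as run_spider("dmoz") at the bottom; the name passed in must still match the spider's name attribute.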
7. With the files in place, it's time to configure PyCharm. Click Run -> Edit Configurations.
8. Create a new Python run configuration.
9. Name: change it to spider. Script: choose the newly created file. Working directory: change it to your own working directory (the project directory).
10. At this point, everything is set up. Click the Run button in the upper right corner to run it.
Debugging
You can set breakpoints in the rest of your code, such as the spider, and start the same configuration in debug mode.
Problems encountered
1. Unknown command: crawl
During a debug run the breakpoint was never hit, and the console printed the following:
H:\Python\Python36\ "H:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.4\helpers\pydev\" --multiproc --client 127.0.0.1 --port 59810 --file H:/Python/Python36/Lib/site-packages/scrapy/ crawl quotes -o
pydev debugger: process 4740 is connecting
Connected to pydev debugger (build 141.3058)
Scrapy 1.3.2 - no active project
Unknown command: crawl
Use "scrapy" to see available commands
Process finished with exit code 2
The working directory was set incorrectly, so Scrapy found no active project and did not recognize the crawl command. Set the working directory as described in step 9 and re-run it, and the problem is solved.
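A quick way to check a working directory before running: Scrapy treats a directory as a project when it, or one of its parent directories, contains a scrapy.cfg file. The helper below is a rough sketch of that lookup using only the standard library; find_scrapy_project_root is a name I made up, it is not part of Scrapy.

```python
from pathlib import Path


def find_scrapy_project_root(start="."):
    """Return the nearest directory at or above `start` that holds scrapy.cfg, else None."""
    current = Path(start).resolve()
    # Check the starting directory first, then walk up through its parents.
    for folder in (current, *current.parents):
        if (folder / "scrapy.cfg").is_file():
            return folder
    return None
```

If find_scrapy_project_root() returns None for the working directory configured in PyCharm, expect the "no active project" message shown above.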